CJK support in LaTeX

(Oh gosh. This is one post I should have written a loooong time ago.)

Najmi has previously posted about Jawi support in LaTeX. Well then, what about CJK (Chinese, Japanese, Korean) characters?

(The process may be simpler using XƎLaTeX, but I personally use LaTeX more, so this post won’t touch on XƎLaTeX.)

Short snippets

If you need only short CJK snippets, use the CJK package. (While you’re at it, you may as well grab cjk-fonts and wadalab for the fonts.) On Debian-based systems, just grab latex-cjk-all and you should be good. Or if you don’t want the whole package (it’s huge), grab only what you need from latex-cjk-chinese, latex-cjk-japanese or latex-cjk-korean (plus the relevant font packages).

Here’s a basic example for Chinese:

\usepackage{CJK}

%% if your file is saved as GB simplified encoding
… as we say in Chinese,
\begin{CJK}{GB}{gbsn}子曰:有朋自远方来,不亦乐乎?\end{CJK}

%% if your file is saved as Big5 traditional encoding
… as we say in Chinese,
\begin{CJK}{Bg5}{bsmi}子曰:有朋自遠方來,不亦樂乎?\end{CJK}

But if you’re saving as UTF-8, then you need CJKutf8.sty (included in the CJK package):

\usepackage{CJKutf8}

as we say in Chinese,
\begin{CJK}{UTF8}{gbsn}子曰:有朋自远方来,不亦乐乎?\end{CJK}
or \begin{CJK}{UTF8}{bsmi}子曰:有朋自遠方來,不亦樂乎?\end{CJK}

You have a few font choices (make sure you get the latex-cjk-chinese-arphic-* files!)

  • gbsn (简体宋体, simplified Chinese)
  • gkai (简体楷体, simplified Chinese)
  • bsmi (繁体细上海宋体, traditional Chinese)
  • bkai (繁体标楷体, traditional Chinese)
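Putting the pieces above together, a complete file might look like the sketch below. It assumes the file is saved as UTF-8, is compiled with plain LaTeX/pdfLaTeX, and that the Arphic gbsn and gkai fonts are installed; the article class and the sample sentence are only placeholders.

```latex
\documentclass{article}
\usepackage{CJKutf8}

\begin{document}

% Song typeface (gbsn)
As we say in Chinese,
\begin{CJK}{UTF8}{gbsn}子曰:有朋自远方来,不亦乐乎?\end{CJK}

% The same sentence in the Kai typeface (gkai)
\begin{CJK}{UTF8}{gkai}子曰:有朋自远方来,不亦乐乎?\end{CJK}

\end{document}
```

Swapping the last argument of the CJK environment is all it takes to change typeface; for traditional Chinese text you would use bsmi or bkai instead.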

Japanese and Korean text are typeset much the same way. If you save everything as UTF-8, then it’s just a matter of knowing what fonts to invoke:

\usepackage{CJKutf8}

%% Japanese
\begin{CJK}{UTF8}{min}
露の世は 露の世ながら さりながら
\end{CJK}

%% Korean
\begin{CJK}{UTF8}{mj}
편편황조 자웅상의 염아지독 수기여귀
\end{CJK}

The Japanese fonts are from the wadalab packages (latex-cjk-japanese-wadalab-*):

  • min (明朝 Mincho)
  • goth (ゴシック Gothic)
  • maru (丸ゴシック Maru Gothic)

As for Korean, well, I’ve only been able to get mj (明朝体 Myeongjo) working so far.
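If you want more than one typeface inside a single CJK environment, the CJK package’s \CJKfamily command should let you switch without closing the environment. A minimal sketch, assuming UTF-8 input and that the wadalab min and goth fonts are installed:

```latex
\documentclass{article}
\usepackage{CJKutf8}

\begin{document}

\begin{CJK}{UTF8}{min}
% starts in Mincho, as selected by the environment argument
露の世は 露の世ながら

% switch to Gothic for the rest of the environment
\CJKfamily{goth}
さりながら
\end{CJK}

\end{document}
```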

Entire Document in Chinese

On the other hand, if your entire document is going to be in Chinese, you might be better off using the ctexart document class (in the ctex package):

\documentclass[UTF8]{ctexart}

\begin{document}

\section{论语}
子曰:有朋自远方来,不亦乐乎?

\end{document}

There is a caveat, though: to use ctex properly, you’ll need to copy some Windows Chinese font files to your $localtexmf/fonts/truetype/… directory (don’t forget to run texhash!). The name each font goes by in CJK/ctexart is given in brackets. These are all for simplified Chinese characters:

  • simsun.ttc 宋体 (song, default)
  • simfang.ttf 仿宋 (fs)
  • simkai.ttf 楷书 (kai)
  • simhei.ttf 黑体 (hei)
  • simli.ttf 隶书 (li)
  • simyou.ttf 幼圆 (you)

In any case, for more help on the ctex package and ctexart.cls, you’d best ask at the CTEX forum. (The language there is predominantly Mandarin Chinese.) I’m not aware of similar classes for Japanese or Korean, though.
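Once the fonts are in place, switching typefaces inside a ctexart document can be sketched as below. This assumes your ctex version provides the \heiti and \kaishu font commands and that the matching font files (simhei.ttf, simkai.ttf) were installed as described above; check the ctex documentation for the commands your version actually supports.

```latex
\documentclass[UTF8]{ctexart}

\begin{document}

\section{论语}

% default Song typeface
子曰:有朋自远方来,不亦乐乎?

% Hei typeface, scoped with braces
{\heiti 子曰:有朋自远方来,不亦乐乎?}

% Kai typeface, scoped with braces
{\kaishu 子曰:有朋自远方来,不亦乐乎?}

\end{document}
```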

Pinyin and Ruby

Younger children learning Chinese characters (Hanzi/Kanji/Hanja) would often have the pronunciations annotated alongside/above/beneath the characters. For Chinese pinyin pronunciations, you would invoke

\usepackage{pinyin}
…\dian4 \deng1

to get diàn dēng.
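As a complete file, that might look like the following sketch. It assumes pinyin.sty (shipped with the CJK package) is installed; the article class and the surrounding sentence are only placeholders.

```latex
\documentclass{article}
\usepackage{pinyin}

\begin{document}

% each syllable is a macro; the digit after it gives the tone (1 to 4)
The Chinese word for ``electric lamp'' is \dian4 \deng1.

\end{document}
```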

To cite Martin Duerst:

“Ruby are small characters used for annotations of a text, at the right side for vertical text, and atop for horizontal text, to indicate the reading (pronunciation) of ideographic characters.”

And you can produce them with the ruby package:

\usepackage{CJKutf8,pinyin}
\usepackage[overlap,CJK]{ruby}

%% By convention, the pinyin would be *under* the Hanzi
%% so change the \rubysep to move it under

\begin{CJK}{UTF8}{gbsn}
\renewcommand\rubysep{-1.4em}
\ruby{电}{\dian4}\ruby{灯}{\deng1}
\end{CJK}

%% I find the default \rubysep (-0.5ex) too tight, so
%% let’s enlarge it a little.

\renewcommand\rubysep{-0.2ex}

%% Shonen manga readers would get the “written as
%% rival, pronounced as friend” reference

%% (CORRECTED June 22)
\begin{CJK}{UTF8}{min}
\ruby{素敵}{ともだち}
\end{CJK}

%% Disclaimer: I’m actually unsure where the
%% ruby should be placed for Korean Hanja

\begin{CJK}{UTF8}{mj}
\ruby{南}{남}\ruby{宮}{궁}
\end{CJK}

The output of which looks something like this: