TeX4ht Options

CV Radhakrishnan (CVR) has posted a list of TeX4ht options, the most comprehensive I have seen yet.

TeX4ht is a very powerful piece of software for converting LaTeX to other formats, such as (X)HTML and ODT. Unfortunately, the documentation was never truly complete, and the inner workings of the system can be hard to grasp. The original creator, Eitan Gurari, passed away unexpectedly in 2009. CVR and Karl Berry have since taken over the maintenance of TeX4ht.

As a side note, if you have an existing LaTeX document that you need to convert to other formats, TeX4ht is the most robust system that I have come across: it works with almost any LaTeX package used in your document. (See this link for other LaTeX-to-whatever conversion tools.)

On the other hand, if you’re just starting to write your document from scratch, with a view to exporting to different output formats later, you might be better off using DocBook or pandoc instead. I personally prefer the markdown syntax in pandoc and exporting to LaTeX later for further editing.

Creating an Online Academic Portfolio with LaTeX and TeX4ht

This was originally asked on TeX.SX, the requirements being:

Any one know of a good script to turn a bibtex file into a nice academic portfolio that:

  • links to electronic versions where known (from url or doi)
  • works with local files (e.g. with bibdesk’s format or otherwise)
  • automatically creates a thumbnail of the first page
  • and generally produces a polished web page suitable for showing off your work?

Well, I maintain my own online publication list by generating the HTML code from my BibTeX file, using BibLaTeX, Biber and TeX4ht. So my answer to the above question was a quick modification of my own workflow, adding Ghostscript to the mix to generate thumbnail images of the papers. The output looks like this: (The publication lists can be split according to their types.)

(BibLaTeX is a complete reimplementation of the bibliographic facilities provided by LaTeX in conjunction with BibTeX. It’s very flexible, and many find it easier to deal with than the BST language. Biber is the replacement for the BibTeX binary, for users of BibLaTeX.)

The source code can be downloaded here as a .zip file. Further elaboration follows.

The Bibliography File

Back to the task at hand. First we have the BibTeX file, the content of which is pretty much the norm, except that I used the custom BibLaTeX field usera to hold the local PDF file name. My publications.bib contains entries like:

@ARTICLE{Lim:Ranaivo:Tang:2011,
author = {Lim, Lian Tze and Ranaivo-Malan\c{c}on, Bali and Tang, Enya Kong},
title = {Low Cost Construction of a Multilingual Lexicon from Bilingual Lists},
journal = {Polibits},
year = {2011},
volume = {43},
pages = {45--51},
url = {http://polibits.gelbukh.com/2011_43/43-06.htm},
usera = {LLT-polibits.pdf}
}

The LaTeX Source File

Next is the portfolio.tex file, in which I set up a hook at every bibliography item to include the first page of the file pointed to by usera. I’ve also added a bibmacro called string+hyperlink, to make the publication title link to the url or doi field if these are available, as shown in this answer.

\documentclass{article}
\usepackage[backend=biber,bibstyle=authoryear,sorting=ydnt]{biblatex}
\usepackage{graphicx}
\addbibresource{publications.bib}
\usepackage{hyperref}

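% Suppress the separately printed DOI/URL; the title itself becomes the link.
\ExecuteBibliographyOptions{doi=false,url=false}
% Wrap the argument in a link to the url field, falling back to the doi field;
% print it unchanged when neither field is defined.
\newbibmacro{string+hyperlink}[1]{%
  \iffieldundef{url}{%
    \iffieldundef{doi}{#1}{\href{http://dx.doi.org/\thefield{doi}}{#1}}}%
  {\href{\thefield{url}}{#1}}}
\DeclareFieldFormat*{title}{\usebibmacro{string+hyperlink}{#1}}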

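% If the entry has a usera field (the local PDF name), include that file
% as a thumbnail, linked the same way as the title.
\newbibmacro{usera}{%
  \iffieldundef{usera}{}{%
    \savefield*{usera}{\filename}%
    \usebibmacro{string+hyperlink}{\includegraphics[width=100pt]{\filename}}\\}%
}
\AtEveryBibitem{\usebibmacro{usera}}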

\begin{document}
\section{My Academic Portfolio}
\nocite{*}
\printbibliography[title={Articles},type={article}]
\printbibliography[title={Conference Proceedings},type={inproceedings}]

\end{document}

TeX4ht Configuration File

I then set up a TeX4ht personal configuration file, called portfolio.cfg (included in the .zip file). It contains some simple CSS, and tells TeX4ht to convert the first page of the local PDFs into PNGs using Ghostscript. (So yes, you will need to have Ghostscript installed for this to work.)
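The real portfolio.cfg is in the .zip file and is not reproduced here. As a rough sketch only, a configuration that does the same two jobs might look like the following. It assumes the gs executable is on your PATH. The 30 dpi resolution, the "thumbnail" alt text and the CSS rule are illustrative choices of mine, not taken from the original file.

```latex
% portfolio.cfg: illustrative sketch, NOT the file shipped in the .zip
\Preamble{xhtml}

% Whenever \includegraphics is given a .pdf file, request a Ghostscript run
% that renders page 1 of that PDF to a PNG with the same base name,
% and reference the PNG in the HTML output instead of the PDF.
% (-r30 is an assumed resolution; adjust to taste.)
\Configure{graphics*}
  {pdf}
  {\Needs{"gs -q -dNOPAUSE -dBATCH -dFirstPage=1 -dLastPage=1 -sDEVICE=png16m -r30 -sOutputFile=\csname Gin@base\endcsname.png \csname Gin@base\endcsname.pdf"}%
   \Picture[thumbnail]{\csname Gin@base\endcsname.png}}

% Some simple CSS, written into the generated stylesheet (hypothetical rule).
\Css{img { border: 1px solid gray; margin: 0.5em; }}

\begin{document}
\EndPreamble
```

As far as I understand it, \Needs only records the quoted command during the TeX pass; the t4ht post-processing step at the end of the htlatex run is what actually executes it, so the PNG files should appear next to the PDFs once htlatex has finished.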

Generating the HTML

Right, now we can run the following commands (the second argument tells htlatex to load portfolio.cfg):

$ htlatex portfolio "portfolio"
$ biber portfolio
$ htlatex portfolio "portfolio"

You should then get portfolio.html, which you can further embellish with more CSS. Well, that was fun!