Pruning .bib files with JabRef

BibTeX files are quite amazing when you think about it: a .bib is a plain text file that acts as a format-independent database, holding information about various types of published (or unpublished) artifacts, to support the generation of citations in LaTeX documents.

BibTeX styles typically ignore fields that they don’t support, so you can add arbitrary field names to hold whatever additional information you want about each item, e.g. “annote”, “keywords”, “abstract”, etc. Some reference managers even embed an entire PDF file as a binary string in a field (e.g. “bdsk-file-1”) right inside the .bib file.

While this means you can use your .bib as a completely standalone reference library, it also means that the .bib’s file size can get quite bloated, even reaching many MBs. Bibliography processors like bibtex or biber then take much longer to read and parse the .bib file, even if they’re just going to discard these field values. And if these fields contain special characters, like %, then bibtex/biber will likely choke and fail to process the entire .bib file correctly.

It may therefore make sense to prune your .bib file to remove these fields, to export a leaner .bib. Several Python scripts have been written for such purposes (e.g. this one), but if you happen to have a copy of JabRef it’s pretty nifty too, especially if you prefer a GUI tool.

  1. Click on any entry in the .bib first, then press Ctrl-A (or Cmd-A on a Mac) to select all entries in the .bib.
  2. Click on the “Library” menu and choose “Manage field names & content”.
  3. Specify/choose “abstract” for the field name, then select “Clear fields” and “Overwrite existing field values”. Then click “OK”.
  4. Repeat step 3 for “keywords”, “annote”, “bdsk-file-1” and any other fields that you want to clear. Make sure that all entries are still selected, because the “set/clear/append/rename fields” operation only applies to selected entries.
  5. Remember to Save the .bib to commit the changes to file.


“Why is LaTeX doing all the APA citations wrong?”

Over the years I’ve received emails asking the above question, especially about thesis templates where the university requires the APA citation and referencing style, which I usually implement with the apacite package.
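
A minimal apacite setup might look like the sketch below. The file name refs.bib and the citation key are placeholders, and the natbibapa option is an assumption on my part: it is one way to get the \citep and \citet commands that appear in the examples further down.

```latex
\documentclass{article}
% natbibapa lets apacite work together with natbib (\citep, \citet)
\usepackage[natbibapa]{apacite}

\begin{document}
% The key below is a placeholder; it must exist in refs.bib
A parenthetical citation \citep{Smith:etal:1982} and a textual one,
as in \citet{Smith:etal:1982}.

\bibliographystyle{apacite}
\bibliography{refs}   % refs.bib is a placeholder file name
\end{document}
```

Compile with latex, bibtex, then latex twice more, as for any BibTeX-based document.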


Alternatively, biblatex (with the biblatex-apa style) can also be used.
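
A biblatex version could be sketched as follows. Again refs.bib and the citation key are placeholders. Note that recent releases of biblatex-apa follow the 7th edition of the APA manual; for the APA6 behaviour discussed in this post, the separate biblatex-apa6 package (style=apa6) may be the one you need, so treat the style name here as an assumption to check against your TeX distribution.

```latex
\documentclass{article}
\usepackage[american]{babel}
\usepackage{csquotes}
% style=apa is provided by biblatex-apa; biber is the backend it expects
\usepackage[style=apa, backend=biber]{biblatex}
\addbibresource{refs.bib}   % placeholder file name; the extension is required here

\begin{document}
% The key below is a placeholder; it must exist in refs.bib
A parenthetical citation \parencite{Smith:etal:1982} and a textual one,
as in \textcite{Smith:etal:1982}.

\printbibliography
\end{document}
```

Compile with latex, biber, then latex again.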


“If LaTeX is so great, why is it making all the APA citations wrong? It should always be (Author1, et al., 2012), but it keeps giving me (Author1, Author2, & Author3, 2012) when I cite this entry. Should I stop using LaTeX?”

Riiight. Is apacite really doing things wrongly? First let’s see what the APA6 guidelines say about citations:

the first in-text citation for a work with three to five authors/editors includes all of the names of the authors/editors, subsequent citations include only the first author’s/editor’s surname, followed by et al. and the year.

So the first time you cite a source with 3 ≤ # of authors ≤ 5, it should come out as (Author1, Author2, & Author3, 2012). It’ll only come out as (Author1 et al., 2012) if you cite it again later. The apacite and biblatex-apa packages both do exactly this.

Incidentally, if it does come out as (Author1 et al., 2012) the first time you cite it: are there 6 or more authors for this source? Then yes, this is correct; this is exactly what the APA6 guidelines say to do with such sources. But if this source has 3 ≤ # of authors ≤ 5 and the first citation in your thesis (it’s right there on the first page of the chapter!) is still the abbreviated version (Author1 et al., 2012), then the most likely reason is that the true “first citation” has already appeared somewhere in the Table of Contents, List of Figures, or List of Tables, via a \section etc. or a \caption!

In this case I’d recommend that you use an optional argument with your sectional heading or caption, which will be used in the table of contents and lists of figures/tables:

\section[The Old Approach]{The Old Approach \citep{Smith:etal:1982}}
\caption[Old Model]{Old Model \citep{Smith:etal:1982}}

That way, the list entries in the front matter will not carry citations, while the sectional headings and captions in the main text still do.

But there is another scenario: not-quite twins, i.e. sources written by different teams of authors even though the first author is the same person, or by the same group of authors but in a different order.

[H]ow to cite multiple articles by the same authors that were published in the same year so that everyone can easily tell them apart. […] [L]owercase letters are added after the year (2011a, 2011b, etc.), and the references are alphabetized by title to determine which is “a” and which is “b.” […]

However, be careful that your references are true identical twins. That is, the method described above applies only when all author names are the same and appear in the same order. If any of the names or the order is different, then the references are distinguished in a different way: by spelling out as many author names as necessary to tell them apart.

For example: the first source is by Adam Smith, Mark Jones, Paul Stark, and Someone Blah, 1982 (I ran out of ideas for names), and the second source is by Adam Smith, Foo Bar, Hiya Hill, and Mary Doe, 1982.

In cases like this, even on subsequent citations, they cannot both be shortened to (Smith et al., 1982a) and (Smith et al., 1982b), because that would be ambiguous, implying that both papers were written by the exact same team of authors in 1982. Instead, they would be cited as (Smith, Jones, et al., 1982) and (Smith, Bar, et al., 1982). Again, this is what apacite and biblatex-apa do.
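
If you want to see this for yourself, a small test document along the following lines could be used. The two entries are the made-up sources from the example above; the citation keys, titles, journal name and the file name twins.bib are placeholders invented for this sketch, and the natbibapa option is assumed so that \citep is available.

```latex
% Write the two "not-quite twin" entries to twins.bib (all details are made up)
\begin{filecontents}{twins.bib}
@article{Smith:Jones:1982,
  author  = {Smith, Adam and Jones, Mark and Stark, Paul and Blah, Someone},
  title   = {A Placeholder Title},
  journal = {Placeholder Journal},
  year    = {1982}
}
@article{Smith:Bar:1982,
  author  = {Smith, Adam and Bar, Foo and Hill, Hiya and Doe, Mary},
  title   = {Another Placeholder Title},
  journal = {Placeholder Journal},
  year    = {1982}
}
\end{filecontents}

\documentclass{article}
\usepackage[natbibapa]{apacite}

\begin{document}
% First citations: full author lists are expected for both sources
\citep{Smith:Jones:1982} and \citep{Smith:Bar:1982}.

% Subsequent citations: per the rule above, enough names should be
% spelled out to tell the two sources apart
\citep{Smith:Jones:1982} and \citep{Smith:Bar:1982}.

\bibliographystyle{apacite}
\bibliography{twins}
\end{document}
```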

“But the IPS/Graduate Office/my supervisor insist that all the citations must be shortened to (Author1, et al., 1982) everywhere, otherwise I am not allowed to submit my thesis!”

Yeah, that’s what’s most crucial, isn’t it… There is a way to get a “half-compliant” APA citation scheme. You can either use the \shortcite command provided by the apacite package (thanks to Stefan for reminding me about this in the comments!), or use the apalike bibliography style instead.
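
Both options could be sketched roughly as follows. This assumes plain apacite without the natbibapa option (where \shortcite is available), a placeholder refs.bib, and a placeholder citation key. The apalike alternative is shown as comments, paired with natbib, which is my assumption rather than a requirement.

```latex
\documentclass{article}

% Option 1: apacite, forcing the abbreviated author list with \shortcite
\usepackage{apacite}

% Option 2 (instead of Option 1): the apalike style, here paired with natbib
% \usepackage{natbib}

\begin{document}
% Abbreviated author list, even on the first citation
\shortcite{Smith:etal:1982}

\bibliographystyle{apacite}     % for Option 2, use apalike here instead
\bibliography{refs}             % refs.bib is a placeholder file name
\end{document}
```

With Option 2 you would cite with natbib’s \citep and \citet rather than \shortcite.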


But never say apacite is doing it wrong—it’s actually doing its job very nicely; but certain Graduate Offices and supervisors don’t want the full APA format!

Extracting Only Cited Bibliography Entries

Occasionally, while exchanging files with collaborators or submitting articles, I’d like to extract a smaller .bib file from my main, “hold-all” bibliography file, i.e. containing only the entries that I’d actually cited in my document.

Fortunately this can be done fairly automatically using the bibexport tool. Quick rundown:

  1. Compile your document, say myarticle.tex, the usual way, with your “big” allrefs.bib file.
  2. Run bibexport with the following:
    bibexport -o extracted.bib myarticle

    extracted.bib should now contain only the bibliography entries that were cited in myarticle.tex.

  3. Change \bibliography{allrefs} to \bibliography{extracted} in myarticle.tex (the .bib extension is normally left out of the \bibliography argument).
  4. Send myarticle.tex and extracted.bib to collaborator or editor!

Converting an EndNote Database to BibTeX

During a recent introductory LaTeX workshop, many participants said that they were very much looking forward to using LaTeX for their future writing, but mentioned that there didn’t seem to be an obvious way of porting their existing EndNote bibliography library into BibTeX format.

EndNote does have an “Export BibTeX” filter, but it doesn’t seem to generate satisfactory BibTeX files. After some googling, I found Bevan Weir’s customised export filter, which does a much better job than EndNote’s default. I modified his filter file a little bit more, and was able to convert an EndNote bibliography library to BibTeX with the following steps.

I tested this with EndNote X5 on the Mac, with JabRef 2.7, but the steps should also work with the Windows versions. %ENDNOTE% refers to the path where EndNote is installed on your system.

  1. Put BibTeX_Export_LLT.ens (download) in %ENDNOTE%/Styles/ .
  2. Start EndNote, and load your library.
  3. Make sure the new style is listed:
    Edit > Output Styles > Open Style Manager
    Make sure BibTeX_Export_LLT is checked.
  4. File > Export
    Make sure Save File as Type is set to Text Only, and Output Style is set to BibTeX_Export_LLT.
  5. Save your file and check that it has a .bib extension.
  6. Open the exported .bib in JabRef. There will be a whole bunch of errors about corrupted or empty BibTeX keys; don’t worry. Just click OK.
  7. Press Ctrl+A to select all the BibTeX entries, then choose Tools > Autogenerate BibTeX keys.
  8. Check through the BibTeX entries, especially those highlighted red, to check and correct any crucial information loss.

And hopefully the converted bibliography file is now usable enough.

Converting BibTeX to EndNote format

You might need to submit your research findings to a journal that insists on submissions in Microsoft Word format.
If your institution subscribes to EndNote, you might want to install it, perhaps under Wine on your Linux machine.
For easy conversion from a *.bib file to an EndNote-importable format (XML), use this software: bib2endnote.

Unexplained error

I have faced this kind of error before. What I mean by unexplained is that the location LyX reports for the error does not actually contain any error; sometimes it is just a single word without any LaTeX command. (For the benefit of newcomers: LyX highlights the location where it thinks the error is.) With this kind of error, the highlight lands at a seemingly random location, and you can’t tell what the actual error is.

In my experience, the unexplained error usually occurs once I include a citation. I use BibTeX to generate the reference list. The .bib file is linked to the LyX file, and once a citation is included, BibTeX runs to create the list. An error will occur if the .bib database itself contains an error, usually an illegal character.

Citation info can be exported directly from a publisher’s webpage or even from Google Scholar, and this auto-export function is the culprit. Some article titles, or even author names, can contain illegal characters. Most of them are characters like % and &, which are special characters in LaTeX. So the errors can be overcome by making sure that the .bib database doesn’t contain any of these illegal characters, or that they are escaped as \% and \&. That’s it for now. See you soon!
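
To illustrate, here is a small LaTeX test file with a made-up entry whose title contains both characters, escaped so that they survive the trip through BibTeX. The entry, its key and the file name demo.bib are all hypothetical; in a LyX workflow you would make the same \% and \& changes directly in your own .bib file.

```latex
% A made-up entry, written to demo.bib; note \% and \& in the title
\begin{filecontents}{demo.bib}
@article{Doe:2010,
  author  = {Doe, Jane},
  title   = {A 5\% Rise in R\&D Spending},
  journal = {Placeholder Journal},
  year    = {2010}
}
\end{filecontents}

\documentclass{article}
\begin{document}
A citation of the hypothetical entry \cite{Doe:2010}.

\bibliographystyle{plain}
\bibliography{demo}
\end{document}
```

With an unescaped % in the title, the rest of that line in the generated .bbl file would be treated as a comment, which is why the reported error location can look unrelated to the real cause.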


TeX and LaTeX Flowchart

Previously, when I saw the screenshots shown by bro Root, I didn’t understand why anyone would go to the trouble of using LaTeX, when word processors such as MS Word and OpenOffice are easier to use.

If you ask why I use LaTeX:

1- Automatic numbering for headings, sections and subsections, figures, diagrams, tables and so on. So there is no headache keeping track of which number a diagram has, or re-checking everything whenever a new diagram is added. I’m not sure whether word processors support this feature.

2- For citations, too, we can easily switch formats, say from IEEE to APA and so on, with the support of BibTeX. As I mentioned in another post, for IEEE for example, the citation numbering is done automatically, just as it is by EndNote, a citation management program (commercial, but some universities buy it; for example, USM students can download USM’s copy of EndNote from the university’s website).

3- Attractive layout and fonts (by default!)

4- Support for Arabic/Jawi text using ArabTeX (developed by a professor in Germany)

To understand how a LaTeX file (with the *.tex extension) is processed into PostScript format (*.ps) and finally PDF (*.pdf), refer to the following flowchart: