Proceedings Generator

I am making available the script I wrote to put together the ISMIR 2007 proceedings.

Input: multiple PDF files (one per paper), desired order, session names and mapping of papers to sessions.

Output: one big PDF file with page numbers, table of contents and author index.
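
To make the input and output concrete, here is a minimal Python sketch (not taken from the Perl script) of how an ordered session-to-paper mapping plus per-paper page counts determine the page range of every paper. The function name page_ranges, the session names, and the paper indices are hypothetical, and the sketch assumes papers follow each other with no blank pages in between.

```python
from typing import Dict, List, Tuple

def page_ranges(
    sessions: List[Tuple[str, List[str]]],
    page_counts: Dict[str, int],
    first_page: int = 1,
) -> List[Tuple[str, str, int, int]]:
    """Return (session, paper_id, first_page, last_page) rows in print order.

    `sessions` is an ordered list of (session name, ordered paper ids);
    `page_counts` maps each paper id to its number of pages.
    """
    rows = []
    page = first_page
    for session, paper_ids in sessions:
        for paper_id in paper_ids:
            n = page_counts[paper_id]
            rows.append((session, paper_id, page, page + n - 1))
            page += n  # assumption: no blank pages between papers
    return rows

# Hypothetical example data: two sessions, three papers.
sessions = [("Session A", ["001", "002"]), ("Posters", ["003"])]
counts = {"001": 6, "002": 4, "003": 2}
for row in page_ranges(sessions, counts):
    print(row)
```

Keeping the order in one list makes reordering papers a matter of editing that list; the page numbers follow automatically.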

This is ad-hoc, quick-and-dirty software! It won't be useful for creating other proceedings without reading and adapting the source code! But this could still be less work than starting from scratch.

What it does

Basic ideas:

  • All papers should be submitted as PDF files. This makes it possible to support LaTeX, Word, even OpenOffice (although more variety here will doubtless cause additional problems).
  • My script converts the PDF files into plain text and attempts to extract metadata (authors, title). This will be imperfect and require manual corrections, but that's probably still better than doing it entirely by hand.
  • The script writes a LaTeX source file that embeds all PDF files as images, with each page as a separate image.
  • Translation and scaling parameters can be adjusted for each paper to compensate for authors ignoring the formatting instructions and paper size (adjusting these is usually less work than contacting the authors and having them correct their paper).
  • A table of contents and an author index are created automatically.
  • The order of papers can be freely configured.
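
The idea of embedding each page as a separate image can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the LaTeX that the Perl script actually emits: it assumes the graphicx package (whose \includegraphics accepts a page key under pdflatex) and an index package for \index, and the Paper class, its field names, and latex_for_paper are hypothetical. Titles and author names are inserted without escaping LaTeX special characters, and the placement via \hspace* and \raisebox would need tuning for a real layout.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Paper:
    pdf_path: str
    title: str
    authors: List[str]
    num_pages: int
    scale: float = 1.0   # per-paper scaling factor
    dx_mm: float = 0.0   # horizontal shift in millimetres
    dy_mm: float = 0.0   # vertical shift in millimetres

def latex_for_paper(paper: Paper) -> str:
    """Emit one \\includegraphics per page, shifted and scaled per paper."""
    lines = []
    for page in range(1, paper.num_pages + 1):
        lines.append(r"\clearpage")
        if page == 1:
            # Table-of-contents and author-index entries go on the first page.
            lines.append(r"\addcontentsline{toc}{section}{%s}" % paper.title)
            for author in paper.authors:
                lines.append(r"\index{%s}" % author)
        lines.append(
            r"\noindent\hspace*{%.1fmm}\raisebox{%.1fmm}[0pt][0pt]{"
            r"\includegraphics[page=%d,scale=%.3f]{%s}}"
            % (paper.dx_mm, paper.dy_mm, page, paper.scale, paper.pdf_path)
        )
    return "\n".join(lines)

# Hypothetical paper that needs to be shrunk slightly and moved 2 mm left.
example = Paper("CR4/ISMIR2007_001.pdf", "A Title", ["A. Author", "B. Author"],
                num_pages=2, scale=0.95, dx_mm=-2.0)
print(latex_for_paper(example))
```

Because the pages are placed by the surrounding LaTeX document, page numbers and running headers come from that document rather than from the submitted PDFs.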


How to use it

  • Delete all existing metadata from the beginning of the script.
  • Individual papers are submitted as PDF files. Save them in a folder or a set of folders. Make sure papers with different numbers of pages are in different folders. In its current form, the script expects the directories to be called "CR2", "CR3", "CR4", "CR5", containing 2-page posters, 6-page papers, 4-page papers, and 4-page posters, respectively. This will probably need to be adjusted.
  • Currently, each file must be named ISMIRYYYY_III.pdf, where YYYY is a year and III is a unique three-digit paper index. This might need to be adjusted as well.
  • You must have pdftotext installed in order to automatically extract metadata from each camera-ready PDF.
  • Run the script:
    perl (list|mflist) > ismirYYYYproc.tex
    where list and mflist give you plaintext lists of the proceedings.
    Copy the automatically extracted metadata from the output to the beginning of the script and correct any mistakes.
  • Run the script again with all metadata in place.
  • Run pdflatex on the output.
  • (Read the source code to see how to get a list of papers with page ranges and session names).
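
The file naming and metadata extraction steps above can be sketched in Python as follows. This is a rough illustration, not the Perl script's own logic: the function names are hypothetical, and the heuristic "first non-empty line is the title, second is the author list" is an assumption that will fail on many papers, which is why manual correction is still needed. The pdftotext options used (-l for the last page to convert, and "-" to write to standard output) are standard options of that tool.

```python
import re
import subprocess
from typing import List, Optional, Tuple

FILENAME_RE = re.compile(r"^ISMIR(\d{4})_(\d{3})\.pdf$")

def parse_paper_filename(name: str) -> Optional[Tuple[int, int]]:
    """Return (year, paper index) for ISMIRYYYY_III.pdf, else None."""
    m = FILENAME_RE.match(name)
    return (int(m.group(1)), int(m.group(2))) if m else None

def pdftotext_command(pdf_path: str) -> List[str]:
    """Command line that prints the text of the first page to stdout."""
    return ["pdftotext", "-l", "1", pdf_path, "-"]

def extract_first_page_text(pdf_path: str) -> str:
    """Run pdftotext (must be installed) and return its output."""
    result = subprocess.run(pdftotext_command(pdf_path),
                            capture_output=True, text=True, check=True)
    return result.stdout

def guess_title_and_authors(text: str) -> Tuple[str, List[str]]:
    """Naive guess: first non-empty line is the title, second the authors."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    title = lines[0] if lines else ""
    author_line = lines[1] if len(lines) > 1 else ""
    # Split the author line on commas and on the word "and".
    parts = re.split(r"\s*(?:,|\band\b)\s*", author_line)
    return title, [p for p in parts if p]

# Hypothetical first-page text, as pdftotext might return it.
sample = "\nSOME TITLE\nAnn Lee, Bob Ray and Cy Poe\nSome University\n"
print(parse_paper_filename("ISMIR2007_042.pdf"))
print(guess_title_and_authors(sample))
```

Printing the guesses in a form that can be pasted back and edited by hand mirrors the two-pass workflow described in the steps above.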

Source Code + Templates


Although Perl runs on many platforms, Linux will probably work best since some Linux commands are used in this script.


I grant you the non-exclusive right to use this program for non-commercial purposes. No other rights are granted. I do not guarantee anything and am not liable for any damages resulting from using this software.