Cameron Laird's personal notes on PDF toolkits

An index to PDF toolkits

Java
- iText [explain how crucial it is]
- PJ
Perl
- PDF::API2
PHP
Python
- pyPdf
- PyPDF2
- ReportLab
Tcl

Details on individual toolkits

See remarks on PDF::API2 in this book review of Perl Graphics Programming.

Etymon™'s PJ class library, coded in Java, includes a command-line utility, pjscript. Jens Vonderheide was also enthusiastic about it. As of early 2012, Etymon appears to be off-line.

PyPDF2 is Phaseit's fork of pyPdf. Both pyPdf and PyPDF2 are open-source pure-Python libraries which concentrate on manipulation of existing PDF instances.

ReportLab

ReportLab is an ambitious, industrial-strength library largely focused on precise creation of PDF. Understand clearly that it has an open-source, no-charge base, but also a for-fee "ReportLab PLUS" extension of that base. ReportLab PLUS involves a relatively large cost and a relatively large extension of capabilities and services.

ReportLab programming

Along with the references above and in related pages, "Yes You Can" and "PDF for the server" (but see important import_HTML note below) touch on ReportLab programming. Readers asked for example usages. Here are a few:

copyPages

Is copyPages still not in the standard ReportLab documentation? In which public release did it first appear? As October 2002 begins, it looks as though it's only in the for-fee library, but that's not true ... [collect details, explain.] In any case, here's how you can append one PDF source to another, while preserving "bookmarks":

   from pageCatcher import copyPages
   from reportlab.pdfgen import canvas

   def makeAppendedResult(result, first_source, second_source):
      c = canvas.Canvas(result)
      copyPages(first_source, c)
      copyPages(second_source, c)
      c.showOutline()
      c.save()

import_HTML

Ugh. My apologies, folks; in the article titled "PDF for the Server" I identified import_HTML as part of ReportLab's library. This is simply false, and I'll make a point of correcting it in a future column.

The import_HTML I use is this:

# In response to a correspondent's comment, I replied:
#   "Bleah; ignore the Python.  I'll comment it to make this
#    clear:  the point is just that HTML->PDF is achieved as
#    HTML->PS->PDF, the second step is canonical, and the
#    first is done with a specific command-line tool."

# Copyright Kyler Laird 2001.
# Freely redistributable.
#

# Import from HTML.
def import_HTML(self, html, color=0, style=None, landscape=0, number=0):
    infile = self._write_string_to_tmpfile(html, ext='HTML')
    self.outfile = self._mktemp('ps')

    options = []

    if number:
        options.append('--number')
        options.append('--startno %d' % number)

    if landscape:
        options.append('--landscape')

    if color:
        options.append('--colour')

    if style:
        stylefile = self._write_string_to_tmpfile(style, ext='style')
        # options.append('--style "%s"' % (style))
        options.append('-f "%s"' % (stylefile))

    # Join the collected options into one shell command line.
    command_string = "html2ps %s -o %s %s" % (
        ' '.join(options), self.outfile, infile)
    self._run(command_string)
    return

There are several ways to render HTML as PS.
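For readers who want the two-step route spelled out, here is a minimal, standalone sketch of HTML->PS->PDF. It is not Kyler's code: the function names are mine, the html2ps flags are only the ones his method above already passes, and I'm assuming Ghostscript's ps2pdf for the second step, which the notes above call canonical without naming a tool. Commands are built as argument lists so nothing depends on shell quoting.

```python
import subprocess


def build_html2ps_command(infile, outfile, color=False, landscape=False,
                          number=0, stylefile=None):
    """Return the html2ps invocation as an argument list.

    Hypothetical helper; it mirrors the options import_HTML assembles.
    """
    cmd = ["html2ps"]
    if number:
        # Number the pages, starting from `number`.
        cmd += ["--number", "--startno", str(number)]
    if landscape:
        cmd.append("--landscape")
    if color:
        cmd.append("--colour")
    if stylefile:
        # Path to an html2ps configuration (style) file.
        cmd += ["-f", stylefile]
    cmd += ["-o", outfile, infile]
    return cmd


def build_ps2pdf_command(psfile, pdffile):
    """Assumption: Ghostscript's ps2pdf performs the PS -> PDF step."""
    return ["ps2pdf", psfile, pdffile]


def html_file_to_pdf(infile, psfile, pdffile, **options):
    """Run both steps; raises CalledProcessError if either tool fails."""
    subprocess.run(build_html2ps_command(infile, psfile, **options), check=True)
    subprocess.run(build_ps2pdf_command(psfile, pdffile), check=True)
```

Something like html_file_to_pdf("report.html", "report.ps", "report.pdf", landscape=True) would then run both tools in sequence, provided html2ps and ps2pdf are both on the PATH.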

More on PDF

I also (episodically) maintain pages on PDF in general, PDF "converters", PDF generation, ...

In December 2001, I published a breezy introduction to no-cost PDF resources for my "Open Sources" column. I also wrote "Yes You Can" (August 2002), "Low-cost PDF" (April 2003), "PDF for C and C++ Developers" (October 2003), and ... For more information on the products described there, start with the home pages of PDFlib, PJ, and ReportLab. I'll probably write more on ReportLab programming and business strategy throughout 2002, perhaps beginning with a piece on PDF security; write me if there's a particular aspect you want me to cover. Note that the Ohio Department of Transportation's open-source JavaPDF is another product worth considering along with PDFlib, PJ, ReportLab, and all of CPAN's PDF directory.

I recommend reading "Kyler Laird's PDF utilities" both for the usefulness of the tools and hyperlinks available there, and also for the correct engineering commentary. Dave Touretzky maintains a "Gallery of Adobe Remedies" with more comprehensive information on PDF security, including a pointer to a Perl script which decrypts PDF.

Cameron Laird's personal notes on PDF toolkits/claird@phaseit.net