In December 2001, I published a breezy introduction to no-cost PDF resources for my "Open Sources" column. I also wrote "Yes You Can" (August 2002), "Low-cost PDF" (April 2003), "PDF for C and C++ Developers" (October 2003), and ... For more information on the products described there, start with the home pages of PDFlib, PJ, and ReportLab. I'll probably write more on ReportLab programming and business strategy throughout 2002, perhaps beginning with a piece on PDF security; write me if there's a particular aspect you want me to cover. Note that the Ohio Department of Transportation's open-source JavaPDF is another product worth considering, along with PDFlib, PJ, ReportLab, and all of CPAN's PDF directory.
I recommend reading "Kyler Laird's PDF utilities" both for the usefulness of the tools and hyperlinks available there and for its correct engineering commentary. Dave Touretzky maintains a "Gallery of Adobe Remedies" with more comprehensive information on PDF security, including a pointer to a Perl script that decrypts PDF.
Etymon's PJ class library, coded in Java, includes a command-line utility, pjscript.
Acquaintances tell me good things about Xpdf's ability to extract (plain)text content from PDF sources. Xpdf is an X-oriented PDF viewer.
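For readers who want to try that extraction from a script, here is a minimal sketch. It assumes the `pdftotext` command-line program that ships with Xpdf is installed and on the PATH; the function names `pdftotext_command` and `extract_text` are my own, chosen for illustration, and are not part of Xpdf.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, layout=False):
    """Build the argument list for Xpdf's pdftotext, sending text to stdout."""
    command = ["pdftotext"]
    if layout:
        # -layout asks pdftotext to preserve the physical page layout.
        command.append("-layout")
    # A lone "-" as the output file name means "write to standard output".
    command.extend([pdf_path, "-"])
    return command


def extract_text(pdf_path, layout=False):
    """Return the text of pdf_path, or None if pdftotext is not installed."""
    if shutil.which("pdftotext") is None:
        return None
    completed = subprocess.run(pdftotext_command(pdf_path, layout),
                               capture_output=True, check=True)
    # Undecodable bytes are replaced rather than raising an error.
    return completed.stdout.decode("utf-8", errors="replace")
```

Calling `extract_text("report.pdf")` with a file name of your own returns the document's text as a string, or None when the tool is missing.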
Addison-Wesley published the PDF Reference on dead trees. It's also available online, as are the draft specification for PDF 1.5 and ... a different specification.
[Describe other tools.]
[Editorialize on PDF role.]
"PDF: Unfit for Human Consumption" is Jakob Nielsen's hysterical--that is, effectively publicized--mid-July 2003 attack on the "usability" of the format. [explain errors, obscurity of correct observations]
Freeware GhostWord plugs into Word, PowerPoint, and Excel, and automates production of .ps, and, from there, .pdf. Thanks to Dr. Gregory Guthrie for tipping me off to GhostWord.
from pageCatcher import copyPages
from reportlab.pdfgen import canvas

def makeAppendedResult(result, first_source, second_source):
    # Write a new PDF, result, that holds the pages of first_source
    # followed by the pages of second_source.
    c = canvas.Canvas(result)
    copyPages(first_source, c)
    copyPages(second_source, c)
    c.showOutline()
    c.save()
The import_HTML I use is this:
# In response to a correspondent's comment, I replied:
# "Bleah; ignore the Python. I'll comment it to make this
# clear: the point is just that HTML->PDF is achieved as
# HTML->PS->PDF, the second step is canonical, and the
# first is done with a specific command-line tool."
# Copyright Kyler Laird 2001.
# Freely redistributable.
#
# Import from HTML.
def import_HTML(self, html, color=0, style=None, landscape=0, number=0):
    infile = self._write_string_to_tmpfile(html, ext='HTML')
    self.outfile = self._mktemp('ps')
    options = []
    if number:
        options.append('--number')
        options.append('--startno %d' % number)
    if landscape:
        options.append('--landscape')
    if color:
        options.append('--colour')
    if style:
        stylefile = self._write_string_to_tmpfile(style,
                                                  ext='style')
        # options.append('--style "%s"' % (style))
        options.append('-f "%s"' % (stylefile))
    command_string = "html2ps %s -o %s %s" % \
        (' '.join(options), self.outfile, infile)
    self._run(command_string)
    return
There are several ways to render HTML as PS.
[Explain significance.] [Compare to htmldoc.]
my_html_source = """
<HTML>
<HEAD><TITLE>%s</TITLE></HEAD>
<BODY><H1>%s</H1>
%s
</BODY>
</HTML>""" % (title, title, content)
my_document.import_HTML(my_html_source)
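To make the HTML->PS->PDF chain concrete outside of that class, here is a minimal stand-alone sketch. It assumes that both `html2ps` and Ghostscript's `ps2pdf` are installed and on the PATH; the function names `html_to_pdf_commands` and `html_to_pdf` are hypothetical, invented for this illustration, and the file names are placeholders.

```python
import shutil
import subprocess


def html_to_pdf_commands(html_path, ps_path, pdf_path):
    """Return the two command lines for the HTML->PS->PDF chain."""
    return [
        # Step 1: render HTML as PostScript with html2ps.
        ["html2ps", "-o", ps_path, html_path],
        # Step 2: the canonical PS->PDF conversion, through Ghostscript's ps2pdf.
        ["ps2pdf", ps_path, pdf_path],
    ]


def html_to_pdf(html_path, ps_path, pdf_path):
    """Run both steps in order, failing early if either tool is missing."""
    commands = html_to_pdf_commands(html_path, ps_path, pdf_path)
    for command in commands:
        if shutil.which(command[0]) is None:
            raise RuntimeError("%s is not installed" % command[0])
    for command in commands:
        subprocess.run(command, check=True)
    return pdf_path
```

Passing each command as a list of arguments, rather than as one formatted string, avoids shell-quoting trouble when file names contain spaces.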
Cameron Laird's personal notes on PDF / claird@phaseit.net