
Generating PDFs for Fun and Profit with Flying Saucer and iText



Tue, 2007-06-26




PDFs are one of the most common and most significant document formats on the internet. Typically, developers must use expensive tools from Adobe or cumbersome APIs to generate PDFs. In this article, you will learn how to programmatically generate PDFs easily with plain XHTML and CSS using two open source Java libraries: Flying Saucer and iText.

The Problem with PDFs

PDFs are a great technology. When Adobe created the PDF format, they had a vision for a portable document format (hence the name) that could be viewed on any computer and printed to any printer. Unlike web pages, PDFs will look exactly the same on every device, thanks to the rigorous PDF specification. And the best thing about PDFs is that the specification is open so you can generate them on the fly, using readily available open source libraries.

There is one big problem with PDFs, however: the spec is complicated and the APIs for generating PDFs tend to be cumbersome, requiring a lot of low-level coding of paragraphs and headers. More importantly, you have to use code to generate PDFs. But to make good-looking PDFs, you need a graphic designer to create the layout. Even if graphic designers are up to the task of programming, they still must convert their layout from some other format to code, which can be cumbersome, buggy, and time-consuming. Fortunately, there is a better way.

The way to make good-looking PDFs is to let the programmers do what they are good at, writing code that manipulates data, and let the graphic designers do what they are good at, making attractive graphic designs. Flying Saucer and iText make this split possible. They let you render CSS stylesheets and XHTML, either static or generated, directly to PDFs.

An Introduction to Flying Saucer and iText

Flying Saucer, which is the common name for the xhtmlrenderer project on java.net, is an LGPLed Java library originally created by me and continually developed by the java.net community. Download it from the project page, or use the copy included with this article's sample code (see Resources). Flying Saucer's primary purpose is to render spec-compliant XHTML and CSS 2.1 to the screen as a Swing component. Though it was originally intended for embedding markup into desktop applications (things like the iTunes Music Store), Flying Saucer has been extended to work with iText as well. This makes it very easy to render XHTML to PDFs, as well as to images and to the screen. Flying Saucer requires Java 1.4 or higher.

iText is a PDF generation library created by Bruno Lowagie and Paulo Soares, licensed under the LGPL and the Mozilla Public License. You can download iText from its home page or use the copy in the download bundle at the end of this article (see Resources). Using the iText API, you can produce paragraphs, headers, or any other PDF feature. Since the PDF imaging model is fairly similar to Java2D's model, Flying Saucer and iText can easily work together to produce PDFs. In fact, the PDF version of the Flying Saucer user manual was itself produced using Flying Saucer and iText.

Generating a Simple PDF

To get started, I'm going to show you how to render a very simple HTML document as a PDF file. You can see in the samples/firstdoc.xhtml file below that it's a plain XHTML document (note the XHTML DTD in the header) and contains only a single formatting rule: b { color: green; }. This means the default HTML formatting for paragraphs and text will apply, with the exception that all b elements will be green.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>My First Document</title>
        <style type="text/css"> b { color: green; } </style>
    </head>
    <body>
        <p>
            <b>Greetings Earthlings!</b>
            We've come for your Java.
        </p>
    </body>
</html>

Now that we have a document, we need code to produce the PDF. The FirstDoc.java file below is the simplest possible way to render a PDF document.

package flyingsaucerpdf;

import java.io.*;
import com.lowagie.text.DocumentException;
import org.xhtmlrenderer.pdf.ITextRenderer;

public class FirstDoc {
    
    public static void main(String[] args) 
            throws IOException, DocumentException {
        String inputFile = "samples/firstdoc.xhtml";
        String url = new File(inputFile).toURI().toURL().toString();
        String outputFile = "firstdoc.pdf";
        OutputStream os = new FileOutputStream(outputFile);
        
        ITextRenderer renderer = new ITextRenderer();
        renderer.setDocument(url);
        renderer.layout();
        renderer.createPDF(os);
        
        os.close();
    }
}

There are two main parts to the code. First it prepares the input and output files. Since Flying Saucer deals with input URLs, the code above converts a local file string into a file:// URL using the File class. The output document is just a FileOutputStream that writes to the firstdoc.pdf file in the current working directory.

The second part of the code creates a new ITextRenderer object. This is the Flying Saucer class that knows how to render PDFs using iText. You must first set the document property of the renderer using the setDocument(String) method. There are other methods for setting the document using URLs and W3C DOM objects. Once the document is installed you must call layout() to perform the actual layout of the document and then createPDF() to draw the document into a PDF file on disk.
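The file-to-URL conversion from the first part of FirstDoc can be tried on its own with nothing but the JDK. The sketch below is only an illustration: the class name FileUrlDemo and the helper toFileUrl are made up for this example, and the second path (with a space in it) is a hypothetical one chosen to show the encoding.

```java
import java.io.File;
import java.net.MalformedURLException;

final class FileUrlDemo {

    // Convert a local path (relative or absolute) into a file: URL string,
    // using the same File -> URI -> URL chain that FirstDoc uses.
    static String toFileUrl(String path) {
        try {
            return new File(path).toURI().toURL().toString();
        } catch (MalformedURLException ex) {
            // Not expected for file: URIs; rethrown unchecked to keep callers simple.
            throw new IllegalArgumentException("Cannot convert to URL: " + path, ex);
        }
    }

    public static void main(String[] args) {
        // A relative path is resolved against the current working directory.
        System.out.println(toFileUrl("samples/firstdoc.xhtml"));
        // Characters that are not legal in a URL, such as spaces, are percent-encoded.
        System.out.println(toFileUrl("my samples/firstdoc.xhtml"));
    }
}
```

Going through toURI() first matters: it is the step that escapes characters such as spaces, which is why the article's code does not simply prepend "file://" to the path.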

To compile and run this code you need the Flying Saucer .jar, core-renderer.jar. For this article I am using a recent development build (R7 HEAD). R7 final should be out in a few weeks, perhaps by the time you read this. I chose to use a recent R7 build instead of the year-old R6 because R7 has a rewritten CSS parser, better table support, and of course, many, many bugfixes. You will also need the iText .jar itext_paulo-155.jar (this is actually an early access copy of iText from its SourceForge project page). All of these .jars are included in the standard Flying Saucer R6 download, and also in the examples.zip file in this article's Resources section. Once you put these .jars in your classpath everything will compile and run. The finished PDF looks like Figure 1:

Figure 1. Screenshot of firstdoc.pdf (click to download full PDF document)

Generating Content on the Fly

Producing a PDF from static documents is useful, but it would be more interesting if you could generate the markup programmatically. Then you could produce documents that contain more interesting content than simple static text.

Below is the code for a simple program that generates the lyrics to the song "99 Bottles of Beer on the Wall." This song has a repeated structure, so we can easily produce the lyrics with a simple loop. This document also uses some extra CSS styles like color, text transformation, and modified padding.

In the first part of the OneHundredBottles.java code, all of the style and markup is appended to a StringBuffer. Note that the style rule for h3 includes the text-transform property. This will capitalize the first letter of every word in the title. The body of the document is produced by a loop that counts down from 99 to 1. Notice that there is an image, 100bottles.jpg, included at the top of the document. iText will embed the image in the resulting PDF, meaning the user will not need to load any other images once they receive the PDF. This is an advantage of PDFs over HTML, where images must be stored separately.

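import java.io.*;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xhtmlrenderer.pdf.ITextRenderer;

public class OneHundredBottles {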

public static void main(String[] args) throws Exception {
    
    StringBuffer buf = new StringBuffer();
    buf.append("<html>");
    
    // put in some style
    buf.append("<head><style type='text/css'>");
    buf.append("h3 { border: 1px solid #aaaaff; background: #ccccff; ");
    buf.append("padding: 1em; text-transform: capitalize; font-family: sans-serif; font-weight: normal;}");
    buf.append("p { margin: 1em 1em 4em 3em; } p:first-letter { color: red; font-size: 150%; }");
    buf.append("h2 { background: #5555ff; color: white; border: 10px solid black; padding: 3em; font-size: 200%; }");
    buf.append("</style></head>");
    
    // generate the body
    buf.append("<body>");
    buf.append("<p><img src='100bottles.jpg'/></p>");
    for(int i=99; i>0; i--) {
        buf.append("<h3>"+i+" bottles of beer on the wall, "
                + i + " bottles of beer!</h3>");
        buf.append("<p>Take one down and pass it around, "
                + (i-1) + " bottles of beer on the wall</p>\n");
    }
    buf.append("<h2>No more bottles of beer on the wall, no more bottles of beer. ");
    buf.append("Go to the store and buy some more, 99 bottles of beer on the wall.</h2>");
    buf.append("</body>");
    buf.append("</html>");

The second part of the code parses the StringBuffer into a DOM document using the standard Java XML APIs and then sets that as the document on the ITextRenderer object. The renderer needs a base URL to load resources like images and external CSS files. If you pass a URL for the document to the renderer, then it will infer the base URL. For example, the document URL http://myserver.com/pdf/mydoc.xhtml would result in a base URL of http://myserver.com/pdf/. However, if you pass in a pre-parsed Document object instead of a URL, then the renderer will have no idea what the base URL is. You can manually set the base URL using the second argument to the setDocument() method. In this case I have used a value of null, since I am not referencing any external resources.

    // parse the markup into an xml Document
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
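    // encode explicitly as UTF-8; StringBufferInputStream is deprecated and
    // keeps only the low eight bits of each character
    Document doc = builder.parse(new ByteArrayInputStream(buf.toString().getBytes("UTF-8")));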

    ITextRenderer renderer = new ITextRenderer();
    renderer.setDocument(doc, null);

    String outputFile = "100bottles.pdf";
    OutputStream os = new FileOutputStream(outputFile);
    renderer.layout();
    renderer.createPDF(os);
    os.close();
}
}
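To make the base URL rule concrete, here is a small sketch that uses only java.net.URI from the JDK. It shows generic URL resolution, which is what the rule described above amounts to; it is not Flying Saucer's own code, and the names BaseUrlDemo, baseOf, and resolve are made up for illustration.

```java
import java.net.URI;

final class BaseUrlDemo {

    // The base URL is everything up to and including the last '/' of the document URL.
    static String baseOf(String documentUrl) {
        return documentUrl.substring(0, documentUrl.lastIndexOf('/') + 1);
    }

    // Resolve a relative reference (for example an img src) against the document URL.
    static String resolve(String documentUrl, String relative) {
        return URI.create(documentUrl).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String doc = "http://myserver.com/pdf/mydoc.xhtml";
        System.out.println(baseOf(doc));                    // http://myserver.com/pdf/
        System.out.println(resolve(doc, "100bottles.jpg")); // http://myserver.com/pdf/100bottles.jpg
    }
}
```

When a document is built in memory there is no document URL to start from, so a relative reference like 100bottles.jpg has nothing to resolve against unless a base URL is supplied as the second argument to setDocument().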

The final document looks like Figure 2:

Figure 2. Screenshot of 100bottles.pdf (click to download full PDF document)

Page-Specific Features

So far the documents we have rendered are basically just web pages in PDF form. They don't have any features that take advantage of pages. Paged media like printed documents or slideshows have certain features specific to pages. In particular, pages have specific sizes and margins. Text laid out for an 8 1/2 by 11 inch piece of paper will look very different than text for a paperback book, or a CD cover. In short, pages matter, and Flying Saucer gives you some control over pages using page-specific features in CSS.

This next example will print the first chapter of Lewis Carroll's Alice in Wonderland in a paperback format. The markup is pretty straightforward, just a bunch of headers and paragraphs. Below are the first few paragraphs of the document (see the download for the entire chapter). There are two things to notice in this document. First, all of the style is included in the alice.css file linked in the header with a link element. The media="print" attribute must be included, or the style will not be loaded. The other important thing to notice is the pair of divs at the top: header and footer. The footer has two special spans in it, with the IDs pagenumber and pagecount, which are used to generate the page numbers. These divs and the page number spans will not be rendered at the top of the page. Instead, we will use some special CSS to make these divs repeat on every page and generate the proper page numbers at runtime.

<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>Alice's Adventures in Wonderland -- Chapter I</title>
        <link rel="stylesheet" type="text/css" href="alice.css" media="print"/>
    </head>
    
    <body>
        <div id="header" style="">Alice's Adventures in Wonderland</div>
        <div id="footer" style="">  Page <span id="pagenumber"/> of <span id="pagecount"/> </div>
                
        <h1>CHAPTER I</h1>
        
        <h2>Down the Rabbit-Hole</h2>
        
        <p class="dropcap-holder">
            <div class="dropcap">A</div>
            lice was beginning to get very tired of sitting by her sister
            on the bank, and of having nothing to do: once or twice she had
            peeped into the book her sister was reading, but it had no pictures
            or conversations in it, `and what is the use of a book,' thought
            Alice `without pictures or conversation?'
        </p>
        
        <p>So she was considering in her own mind (as well as she could,
            for the hot day made her feel very sleepy and stupid), whether the
            pleasure of making a daisy-chain would be worth the trouble of
            getting up and picking the daisies, when suddenly a White Rabbit
        with pink eyes ran close by her. </p>
        
        <p class="figure">
            <img src="alice2.gif" width="200px" height="300px"/>
            <br/>
            <b>White Rabbit checking watch</b>
        </p>
        ... the rest of the chapter

Most of the alice.css file contains normal CSS rules that can apply to any kind of XHTML document, printed or not. There are a few, however, that are page-specific extensions:

@page { 
size: 4.18in 6.88in;
margin: 0.25in; 
-fs-flow-top: "header";
-fs-flow-bottom: "footer";
-fs-flow-left: "left";
-fs-flow-right: "right";
border: thin solid black;
padding: 1em;
}

#header {
font-weight: bold; font-family: serif;
position: absolute; top: 0; left: 0; 
-fs-move-to-flow: "header";
}

#footer {
font-size: 90%; font-style: italic; 
position: absolute; top: 0; left: 0;
-fs-move-to-flow: "footer";
}


#pagenumber:before {
content: counter(page); 
}

#pagecount:before {
content: counter(pages);  
}


The first thing you'll notice in the CSS above is the @page rule. This is a rule that is attached to the page itself rather than to any particular elements within the document. Within this @page rule, you can set the size of the page as well as page margins using the size and margin properties. Note that I have set the size to 4.18in 6.88in, which is the size of a standard mass-market paperback book in the U.S. (according to CafePress). Also in the @page rule are four special properties beginning with -fs-flow-. These are Flying Saucer-specific properties that tell the renderer to move content marked with the specified names: header, footer, left, and right to every page in the top, bottom, left, and right positions.

In the rules for the header and footer divs, you can see another Flying Saucer-specific property called -fs-move-to-flow, which will take the div out of the normal document flow and put it in the special place marked by "footer" or "header". This property works in conjunction with the -fs-flow-* properties in the @page rule to make repeated content work. These custom properties are needed because CSS 2.1 does not define any way to have repeated headers and footers. CSS 3 does define a way to have repeated content, and Flying Saucer will support the new standard mechanism in the future.

After the @page, header, and footer rules, you'll find two more rules for the pagenumber and pagecount elements. These are the empty spans in the footer, selected by their IDs, that will have counters added to their content through the :before pseudo-element. Since those two elements are empty, you will only see the counters themselves. Since the pagenumber and pagecount elements were defined in the footer, the final page numbers will also appear in the footer. Again, these page number elements will be replaced with their proper CSS 3 equivalents in the future.

The final rendered alice.xhtml is shown in Figure 3:

Figure 3. Screenshot of two pages of pagination.pdf (click to download full PDF document)

A quick note on debugging: CSS can be tricky sometimes, and it is very easy to misspell a keyword or forget some punctuation. Flying Saucer R7 has a brand new CSS parser with very robust error reporting. When developing your application, I recommend turning on the built-in logging. The in-depth details of Flying Saucer configuration are available in the FAQ. I have found the most useful setting is to set the logging level to INFO by adding this to your Java command line:

-Dxr.util-logging.java.util.logging.ConsoleHandler.level=INFO

This setting will print lots of debugging information, including places where the CSS or markup may be broken.
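A -D flag on the command line simply sets a Java system property, so the same value can also be set from code when changing the launch command is inconvenient. The sketch below uses only the JDK. It rests on an assumption I have not verified against the Flying Saucer source: that the library reads this property when its configuration is first loaded, which would mean the call has to run before any Flying Saucer class is touched. The class and method names are made up for the example.

```java
final class EnableFlyingSaucerLogging {

    // Property name copied from the command-line flag shown above.
    static final String LEVEL_KEY =
            "xr.util-logging.java.util.logging.ConsoleHandler.level";

    // Equivalent to passing -D<LEVEL_KEY>=INFO on the java command line,
    // provided it runs early enough (assumption: before Flying Saucer loads its config).
    static void enableInfoLogging() {
        System.setProperty(LEVEL_KEY, "INFO");
    }

    public static void main(String[] args) {
        enableInfoLogging();
        System.out.println(LEVEL_KEY + "=" + System.getProperty(LEVEL_KEY));
    }
}
```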

Rendering Generic XML Instead of XHTML

Every example so far has used XHTML, meaning the XHTML dialect of XML defined by the W3C. Many documents rendered into PDF are in fact XHTML documents, but Flying Saucer can actually handle any well-formed XML file. In fact, Flying Saucer does very little that is XHTML-specific. XHTML documents are just XML documents with a default stylesheet. If you define your own stylesheet, then you can render any XML document you want. This could be particularly useful when working with the output of databases or web services, since that output is probably in XML already.

Below is a very simple custom XML document, weather.xml, that describes the weather at multiple locations. It does not use standard XHTML elements at all; every element is custom. Notice the second line contains a reference to the stylesheet.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href='weather.css' type='text/css'?>
<weather>
    <station>
        <location>Springfield, NT</location>
        <description>Sunny</description>
        <tempf>85</tempf>
    </station>
    <station>
        <location>Arlen, TX</location>
        <description>Super Sunny</description>
        <tempf>99</tempf>
    </station>
    <station>
        <location>South Park, CO</location>
        <description>Snowing</description>
        <tempf>18</tempf>
    </station>
</weather>

Here is the DirectXML.java code that renders the document. Notice that the code does nothing special. As far as Flying Saucer is concerned, the only difference between XHTML and XML is the file extension.


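import java.io.*;
import com.lowagie.text.DocumentException;
import org.xhtmlrenderer.pdf.ITextRenderer;

public class DirectXML {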
    public static void main(String[] args) throws IOException, DocumentException {
        String inputFile = "samples/weather.xml";
        String outputFile = "weather.pdf";
        
        OutputStream os = new FileOutputStream(outputFile);
        ITextRenderer renderer = new ITextRenderer();
        renderer.setDocument(new File(inputFile));
        renderer.layout();
        renderer.createPDF(os);
        os.close();
    }
}
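When the data comes from a database or web service rather than a file on disk, a document with the same shape as weather.xml can be assembled in memory with the standard DOM API. The following is a minimal sketch using only the JDK; WeatherDocBuilder, build, and textElement are hypothetical names, and the rows passed in main are sample values borrowed from weather.xml.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class WeatherDocBuilder {

    // Build a <weather> document from rows of {location, description, tempf}.
    static Document build(String[][] stations) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException("No usable XML parser", ex);
        }
        // Same stylesheet reference as the second line of weather.xml.
        doc.appendChild(doc.createProcessingInstruction(
                "xml-stylesheet", "href='weather.css' type='text/css'"));
        Element weather = doc.createElement("weather");
        doc.appendChild(weather);
        for (String[] row : stations) {
            Element station = doc.createElement("station");
            station.appendChild(textElement(doc, "location", row[0]));
            station.appendChild(textElement(doc, "description", row[1]));
            station.appendChild(textElement(doc, "tempf", row[2]));
            weather.appendChild(station);
        }
        return doc;
    }

    // Create <name>text</name>; DOM text nodes need no manual escaping.
    static Element textElement(Document doc, String name, String text) {
        Element e = doc.createElement(name);
        e.setTextContent(text);
        return e;
    }

    public static void main(String[] args) {
        Document doc = build(new String[][] {
            {"Springfield, NT", "Sunny", "85"},
            {"South Park, CO", "Snowing", "18"},
        });
        System.out.println(doc.getElementsByTagName("station").getLength() + " stations");
    }
}
```

The resulting Document could then be handed to the setDocument(Document, String) method used earlier, with a base URL that lets weather.css be found. I have not verified that the stylesheet processing instruction is honored on that path, so treat that part as something to test.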

Here's the weather.css CSS that will style the XML.


* { display: block; margin: 0; padding: 0; border: 0;}

station { 
    clear: both; 
    width: 3in; height: 3in;
    padding: 0.5em; margin: 1em;
    border: 3px solid black; background-color: green;
    font-size: 30pt;
    page-break-inside: avoid;
}

tempf {
    border: 1px solid white;
    background-color: blue; color: white;
    width: 1.5in; height: 1.5in;
    margin: 5pt;
    padding: 8pt;
    font: bold 300% sans-serif;
}

location { color: white; }
description { color: yellow; }


The CSS stylesheet contains all of the magic in this example. Since this is all XML, there are no default rules to show how any element is drawn. That's why the first rule is a * rule to affect all elements: they should all be blocks with no border, margins, or padding. Then I have defined a rule for each of the four content elements. The elements take the standard CSS properties that you could apply to HTML elements. Note that the station element has a page-break-inside: avoid property. This is a CSS 2.1 paged media property that tells the renderer that you don't want the station element split by a page break. This is useful when you have content sections that must be printed whole. For example, you might be printing to label paper for stickers on a map display. In that case, you definitely would not want any boxes to be split across pages.

Note that I've set the size of the station block using inches. When coding for the Web you usually want to avoid absolute units like inches, centimeters, or points. Instead, you should use relative units like ems or percentages, since these work well when a user resizes the document or changes their font size. But then again, PDFs aren't for the Web. They are paged media for printing. That means absolute units are perfectly fine, and in fact encouraged, since their use ensures the user will get a document that looks exactly like you wanted.
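Absolute units also convert cleanly between each other, which helps when a printer or designer quotes sizes in points. One inch is 72 points in CSS, and the default unit of PDF page coordinates is likewise 1/72 of an inch. The small sketch below, with the made-up names PageUnits and inchesToPoints, does the arithmetic for the paperback page size and margin used in the alice.css @page rule.

```java
import java.util.Locale;

final class PageUnits {

    static final double POINTS_PER_INCH = 72.0;

    // Convert a length in inches to points (1in = 72pt).
    static double inchesToPoints(double inches) {
        return inches * POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        // Page size and margin from the @page rule: 4.18in x 6.88in, 0.25in margin.
        System.out.println(String.format(Locale.ROOT, "page: %.2fpt x %.2fpt",
                inchesToPoints(4.18), inchesToPoints(6.88)));
        System.out.println(String.format(Locale.ROOT, "margin: %.2fpt",
                inchesToPoints(0.25)));
    }
}
```

So the 4.18in by 6.88in paperback page is 300.96pt by 495.36pt, with an 18pt margin on each side.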

The final document looks like Figure 4:

Figure 4. Screenshot of weather.pdf (click to download full PDF document)

Generating PDFs in a Server-Side Application

All of the examples in this article have been small command-line programs that write PDF files. However, you can easily use this technology to produce PDFs in a web application using a servlet. The only difference is that you will be writing to a ServletOutputStream instead of a FileOutputStream. Below is a portion of the code for a PDF generation servlet that produces a tabular report of sales for a particular user:


public class PDFServlet extends HttpServlet {
    
    protected void processRequest(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("application/pdf");
        
        StringBuffer buf = new StringBuffer();
        buf.append("<html>");
        
        String css = getServletContext().getRealPath("/PDFservlet.css");
        // put in some style
        buf.append("<head><link rel='stylesheet' type='text/css' "+
                "href='"+css+"' media='print'/></head>");
        
        ... //generate the rest of the HTML
        
        // parse our markup into an xml Document
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
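            // encode explicitly as UTF-8; StringBufferInputStream is deprecated and
            // keeps only the low eight bits of each character
            Document doc = builder.parse(new ByteArrayInputStream(buf.toString().getBytes("UTF-8")));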
            ITextRenderer renderer = new ITextRenderer();
            renderer.setDocument(doc, null);
            renderer.layout();
            OutputStream os = response.getOutputStream();
            renderer.createPDF(os);
            os.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

The code above looks pretty much like the previous examples. There are two special things to notice, though. First, you must set the content type to application/pdf. This will make the user's web browser pass the PDF on to their PDF reader or plugin instead of showing it as garbled text. Second, the CSS is stored in a separate file in the main webapp directory (where the JSPs and HTML would go). In order for Flying Saucer to find it, you must use the getServletContext().getRealPath() method to convert PDFservlet.css into an absolute file system path and put it in the link tag at the top of the generated markup. Once you have your HTML properly generated, you can just parse it into a Document and render the PDF to the output stream returned by response.getOutputStream().
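One thing the servlet above glosses over is what happens when the generated markup is not well-formed: the parser throws, the stack trace is printed, and the browser most likely receives an empty response labeled application/pdf. A minimal sketch of one way to check the markup first, using only the JDK, is below. MarkupParser and parseOrNull are hypothetical names, and returning null on failure is just one possible convention.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class MarkupParser {

    // Parse a markup string into a DOM Document.
    // Returns null when the markup is not well-formed XML or no parser is available.
    static Document parseOrNull(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Encode explicitly as UTF-8 so non-ASCII characters survive the round trip.
            byte[] bytes = markup.getBytes(StandardCharsets.UTF_8);
            return builder.parse(new ByteArrayInputStream(bytes));
        } catch (Exception ex) {
            return null;
        }
    }

    public static void main(String[] args) {
        String good = "<html><body><p>Sales report</p></body></html>";
        String bad = "<html><body><p>Sales report</body></html>"; // unclosed <p>
        System.out.println("good parses: " + (parseOrNull(good) != null));
        System.out.println("bad parses: " + (parseOrNull(bad) != null));
    }
}
```

A servlet could run this check before setting the content type, and send an error status instead of a PDF when the result is null.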

The final document looks like Figure 5:

Figure 5. Screenshot of servlet.pdf (click to download full PDF document)

Conclusion

PDFs are a great format for maps, receipts, reports, and printable labels. Flying Saucer and iText let you produce PDF files programmatically without having to use expensive tools or cumbersome APIs. By using plain XHTML and CSS, your graphic designer can use their existing web tools like Dreamweaver to produce great-looking CSS templates that you or your developers plug into your applications. By splitting the work, you can save both time and money.

If you use Flying Saucer to produce PDFs for your company or project, please post a link in the comments of this article or email me. The Flying Saucer team would love to have more examples of cool things people are doing with Flying Saucer and iText.

Resources

Joshua Marinacci first tried Java in 1995 at the request of his favorite TA and has never looked back.