[FOP-2886] FOP 2.3 Generates Truncated/Corrupted PDF with Mathematical Unicode Characters - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 2.3
Fix Version/s: None
Component/s: renderer/pdf
Labels:
None
Environment:
Reproduced on Ubuntu 18.04.3 LTS

Description

Overview:

We use FOP 2.3 to generate PDFs from HTML. When the input contains a large number of certain Mathematical Unicode characters (such as https://www.compart.com/en/unicode/U+1D538 ), FOP creates the PDF without error, but the resulting file cannot be opened by any PDF viewer.

We also use PdfReader (iText, formerly Lowagie) to validate that the PDFs we generate are well-formed. The PdfReader threw the following exception:
com.itextpdf.text.exceptions.InvalidPdfException: Rebuild failed: trailer not found.; Original message: PDF startxref not found.

Manual inspection confirmed that the trailer is indeed missing. We've seen that this issue can occur when the input and output streams are not closed or flushed properly. In our case, we use the Java try-with-resources pattern to invoke close() automatically, so I don't believe this is the cause. I have also tried, in vain, closing our streams manually and changing the order in which close() is called.
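The manual trailer inspection described above can be approximated in code. The following is a minimal sketch, not part of FOP or iText: the class name PdfTrailerCheck, the method hasTrailerMarkers, and the 1024-byte window are all assumptions made for illustration. It only checks that the "startxref" keyword and the "%%EOF" marker appear, in that order, near the end of the generated bytes, which is enough to flag a truncated file of the kind reported here but is no substitute for a full PDF parser.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not a FOP or iText API): flags PDFs whose tail lacks
// the "startxref" keyword and "%%EOF" marker that close a complete PDF file.
final class PdfTrailerCheck {

    // Assumption: scanning the last 1024 bytes is sufficient for this check.
    private static final int TAIL_WINDOW = 1024;

    private PdfTrailerCheck() {
    }

    static boolean hasTrailerMarkers(final byte[] pdf) {
        if (pdf == null || pdf.length == 0) {
            return false;
        }
        final int from = Math.max(0, pdf.length - TAIL_WINDOW);
        // ISO-8859-1 maps every byte to exactly one char, so binary content
        // in the tail cannot cause decoding errors.
        final String tail = new String(pdf, from, pdf.length - from, StandardCharsets.ISO_8859_1);
        final int startxref = tail.lastIndexOf("startxref");
        final int eof = tail.lastIndexOf("%%EOF");
        // "startxref" must be present and must precede the final "%%EOF".
        return startxref >= 0 && eof > startxref;
    }

    public static void main(final String[] args) {
        final byte[] complete = "%PDF-1.4\n1 0 obj\nendobj\nstartxref\n9\n%%EOF\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        final byte[] truncated = "%PDF-1.4\n1 0 obj\nendobj\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(hasTrailerMarkers(complete));   // prints true
        System.out.println(hasTrailerMarkers(truncated));  // prints false
    }
}
```

In a setup like ours, such a check could be run on the bytes of the ByteArrayOutputStream after the try-with-resources block has completed, so that any buffered output has already been flushed.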

Steps to Reproduce:

Unfortunately, I have not been able to reproduce this outside of our software, but I've included the HTML that causes the problem (reproHtml.txt) and the .xsl files we use. This is the code snippet that we use to convert the input HTML into a ByteArrayOutputStream:

public void generatePdfWithCssToXslFo(
        final String htmlString,
        final OutputStream outputStream
) throws CSSToXSLFOException, SAXException, IOException {
    try (final Reader htmlReader = new StringReader(htmlString)) {
        final InputSource source = new InputSource(htmlReader);
        final boolean isValidatingParser = false;
        final boolean cssToXslFoDebugEnabled = System.getProperty("be.re.css.debug") != null;

        // Set up FOP to turn the XSL-FO into a PDF
        final Fop fop;
        final FOUserAgent userAgent;

        FopFactoryBuilder builder = new FopFactoryBuilder(URI.create(resourceLoader.getResource(resourceBasePath).getURI().toString()), new ClasspathResolverURIAdapter());
        builder.setConfiguration(configuration);

        FopFactory factory = builder.build();
        userAgent = factory.newFOUserAgent();
        userAgent.setAuthor("Indeed");
        userAgent.setCreator("Indeed Resume");
        userAgent.setTitle("Indeed Resume");
        userAgent.setKeywords("Indeed Resume");

        fop = factory.newFop(MimeConstants.MIME_PDF, userAgent, outputStream);

        // Set up CSSToXSLFO to transform the XHTML into XSL-FO
        final URL baseUrl = resourceLoader.getResource(resourceBasePath).getURL();
        Loggers.debug(LOGGER, "Parsing HTML response using base URL '%s'", baseUrl);
        final XMLReader xmlParser = Util.getParser(null, isValidatingParser);
        final ProtectEventHandlerFilter eventHandlerFilter = new ProtectEventHandlerFilter(true, true, xmlParser);

        final XMLReader filter =
                new CSSToXSLFOFilter(
                        baseUrl,
                        null,
                        Collections.EMPTY_MAP,
                        eventHandlerFilter,
                        cssToXslFoDebugEnabled);

        filter.setEntityResolver(classPathEntityResolver);
        filter.setContentHandler(fop.getDefaultHandler());
        filter.parse(source);
    }
}

Actual Results:

The attached PDF (ActualResult.pdf) is created; it cannot be opened.

Expected Results:

An intact PDF can be created. For example, I've attached ApproximateExpectedResult.pdf where I've replaced the first letter with my name, which allows the PDF to render.

Build Date & Hardware:

FOP version 2.3, Build 2014-07-15 on Ubuntu 18.04.3 LTS

Additional Builds and Platforms:

(Unable to test on other platforms.)

Additional Information:

As you can see in ApproximateExpectedResult.pdf, there is a mix of these Mathematical characters and normal Latin letters. Adding Latin characters or removing any of the Mathematical characters can sometimes allow the PDF to render, but the outcome is hard to predict; I was not able to link it to any particular character or word.
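To help narrow down which characters are involved, the input HTML can be scanned for code points outside the Basic Multilingual Plane. U+1D538 is such a supplementary code point, which Java strings store as a surrogate pair. The following is a minimal sketch using only the JDK; the class name SupplementaryCodePointScan and the method countSupplementary are hypothetical, and whether supplementary code points are actually the trigger for this bug is an assumption, not something established above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper: counts each supplementary code point
// (above U+FFFF) in a string, in order of first appearance.
final class SupplementaryCodePointScan {

    private SupplementaryCodePointScan() {
    }

    static Map<Integer, Integer> countSupplementary(final String text) {
        final Map<Integer, Integer> counts = new LinkedHashMap<>();
        // codePoints() joins valid surrogate pairs into single code points.
        text.codePoints()
                .filter(Character::isSupplementaryCodePoint)
                .forEach(cp -> counts.merge(cp, 1, Integer::sum));
        return counts;
    }

    public static void main(final String[] args) {
        // U+1D538 built from its code point to avoid hand-written surrogates.
        final String doubleStruckA = new String(Character.toChars(0x1D538));
        final String sample = doubleStruckA + "lice and " + doubleStruckA + "lan";
        countSupplementary(sample).forEach(
                (cp, n) -> System.out.printf("U+%X x%d%n", cp, n)); // prints U+1D538 x2
    }
}
```

Running this over reproHtml.txt and over a variant that renders correctly would show whether the two inputs differ in the set or count of supplementary characters.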

Attachments

reproHtml.txt                  29/Oct/19 23:29  12 kB  Lawrence Thibodeaux
name2fo.xsl                    29/Oct/19 23:30  13 kB  Lawrence Thibodeaux
fo_setup.xsl                   29/Oct/19 23:30  13 kB  Lawrence Thibodeaux
xhtml2fo.xsl                   29/Oct/19 23:30  62 kB  Lawrence Thibodeaux
ActualResult.pdf               30/Oct/19 00:04  29 kB  Lawrence Thibodeaux
ApproximateExpectedResult.pdf  30/Oct/19 00:07  31 kB  Lawrence Thibodeaux
