Reproduced on Ubuntu 18.04.3 LTS
We use FOP 2.3 to generate PDFs based on HTML. We have found that when the input includes a large number of certain Mathematical Unicode characters (such as https://www.compart.com/en/unicode/U+1D538 ), the PDF is created without error, but the generated PDF cannot be opened by any PDF viewer.
We also use Lowagie PdfReader to validate that the PDF we generate is well-formed. The PdfReader threw the following Exception:
com.itextpdf.text.exceptions.InvalidPdfException: Rebuild failed: trailer not found.; Original message: PDF startxref not found.
Manual inspection confirmed that the trailer is indeed missing. We've seen that this issue can occur when the input and output streams are not closed or flushed properly. In our case, we use the Java try-with-resources pattern to invoke close() automatically, so I don't believe this is the cause. I have also tried closing our streams manually, as well as switching the order in which the close() calls happen, without success.
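For reference, the manual trailer inspection can be approximated with a small check like the one below. This is only a sketch and not part of our production code: the class and method names (PdfTrailerCheck, hasTrailerMarkers) and the 1024-byte tail window are assumptions chosen for illustration. It only looks for the "startxref" and "%%EOF" markers near the end of the byte array and does not validate the PDF in any other way.

```java
import java.nio.charset.StandardCharsets;

public class PdfTrailerCheck {

    /**
     * Returns true if "startxref" followed by "%%EOF" appears within the
     * last tailSize bytes of the given PDF bytes. (Hypothetical helper.)
     */
    public static boolean hasTrailerMarkers(byte[] pdf, int tailSize) {
        int from = Math.max(0, pdf.length - tailSize);
        // ISO-8859-1 maps every byte to one char, so binary content is safe to scan.
        String tail = new String(pdf, from, pdf.length - from, StandardCharsets.ISO_8859_1);
        int startxref = tail.lastIndexOf("startxref");
        int eof = tail.lastIndexOf("%%EOF");
        return startxref >= 0 && eof > startxref;
    }

    public static void main(String[] args) {
        // Tiny stand-ins for real PDF output, used only to exercise the check.
        byte[] complete = "%PDF-1.4\ntrailer\n<< /Size 1 >>\nstartxref\n9\n%%EOF\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] truncated = "%PDF-1.4\n1 0 obj\n<< >>\nendobj\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("complete:  " + hasTrailerMarkers(complete, 1024));
        System.out.println("truncated: " + hasTrailerMarkers(truncated, 1024));
    }
}
```

Running the same check on the bytes of the ByteArrayOutputStream right after conversion would show whether the trailer is already missing before anything is written to disk.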
Steps to Reproduce:
I have not been able to reproduce outside of our software, unfortunately, but I've included the HTML that causes the problem (reproHtml.txt) and the .xsl files we use. This is the code snippet that we use to convert the input HTML into a ByteArrayOutputStream:
Actual Results:
The attached PDF is created (ActualResult.pdf)
Expected Results:
An intact PDF should be created. For example, I've attached ApproximateExpectedResult.pdf, where I've replaced the first letter with my name, which allows the PDF to render.
Build Date & Hardware:
FOP version 2.3, Build 2014-07-15 on Ubuntu 18.04.3 LTS
Additional Builds and Platforms:
(Unable to test on other platforms.)
As you can see in ApproximateExpectedResult.pdf, the content mixes these Mathematical characters with ordinary Latin letters. Adding Latin characters or removing any of the Mathematical characters can sometimes allow the PDF to render, but the effect is hard to predict. I was not able to link it to any particular character or word.
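In case it helps with narrowing this down, here is a sketch of how the characters in question could be tallied in the input. U+1D538 lies outside the Basic Multilingual Plane, so Java stores it as a surrogate pair; the helper below counts each such supplementary code point in a string. The class and method names (SupplementaryCodePoints, count) are made up for illustration, and I am assuming that the supplementary characters are the relevant ones, which I have not confirmed.

```java
import java.util.Map;
import java.util.TreeMap;

public class SupplementaryCodePoints {

    /**
     * Counts occurrences of each code point above U+FFFF in the text,
     * keyed by code point in ascending order. (Hypothetical helper.)
     */
    public static Map<Integer, Integer> count(String text) {
        Map<Integer, Integer> counts = new TreeMap<>();
        text.codePoints()
            .filter(Character::isSupplementaryCodePoint)
            .forEach(cp -> counts.merge(cp, 1, Integer::sum));
        return counts;
    }

    public static void main(String[] args) {
        // Sample input: Latin letters mixed with U+1D538 (built from its code point).
        String doubleStruckA = new String(Character.toChars(0x1D538));
        String sample = "A" + doubleStruckA + "bc" + doubleStruckA;
        count(sample).forEach((cp, n) -> System.out.printf("U+%04X x%d%n", cp, n));
    }
}
```

Comparing these tallies between inputs that produce a broken PDF and inputs that render might show whether the failure tracks the number or the set of such characters.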