Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- Affects Version/s: 3.0.1 PDFBox
- None
Description
I am trying to import a page into a PDF document and copy its font resources. With PDFBox 2.0 the code worked as expected: the result document included the required embedded fonts.
Essentially, I'm doing these steps in the code. The first document is a PDF/A with one empty page, and the second document, also a PDF/A, contains the Roboto font. All fonts are embedded.
PDDocument targetDoc = Loader.loadPDF(targetDocBytes);
PDPage sourcePage = Loader.loadPDF(data).getPage(0);
final var copiedPage = targetDoc.importPage(sourcePage);
copiedPage.setResources(sourcePage.getResources());
In PDFBox 3.0 it doesn't seem to work any more: the document is corrupted when opened in Adobe Acrobat.
It also shows a lot of errors when checked with the PDFBox PreflightParser.
Here are the error messages from the preflight parser:
1.4 Trailer Syntax error, /XRef cross reference streams are not allowed
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.1.3 Invalid Font definition, BCDFEE+Roboto-Regular: FontFile entry is missing from FontDescriptor
3.1.14 Invalid Font definition, Unknown font type: XML
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.1.8 Invalid Font definition
3.1.2 Invalid Font definition, BCDGEE+TimesNewRomanPS-BoldMT: some mandatory fields are missing from the FontDescriptor: Type, ItalicAngle, FontBBox, Ascent, FontName, StemV, Flags, CapHeight, Descent.
3.1.3 Invalid Font definition, null: FontFile entry is missing from FontDescriptor
3.3.2 Glyph error, invalid font dictionary ==>
Here is the complete test case. I used PDFBox 3.0.1 and the latest snapshot version from 15.01.2024.
@Test
void importPageWithFonts_validateFontInfo() throws IOException {
    // given
    final var targetDocBytes = IOUtils.toByteArray(PdfUtilitiesTest.class.getClassLoader().getResourceAsStream("empty.pdf"));
    String[] additionalFiles = new String[]{
        "roboto-14.pdf",
    };
    PDDocument targetDoc = Loader.loadPDF(targetDocBytes);

    // when
    for (String fileName : Arrays.asList(additionalFiles)) {
        byte[] data = IOUtils.toByteArray(PdfUtilitiesTest.class.getClassLoader().getResourceAsStream(fileName));
        // verify source is valid
        PDPage sourcePage = Loader.loadPDF(data).getPage(0);
        final var copiedPage = targetDoc.importPage(sourcePage);
        copiedPage.setResources(sourcePage.getResources());
        targetDoc.save(Files.createTempFile("merged-fonts", ".pdf").toFile());
    }
    Path tmpFile = Files.createTempFile("fscd-merged", ".pdf");
    targetDoc.save(tmpFile.toFile(), CompressParameters.DEFAULT_COMPRESSION);

    // then
    // font errors, e.g. Invalid Font definition, BCDFEE+Roboto-Regular: FontFile entry is missing from FontDescriptor
    assertFontsAreValid(tmpFile);
}

private static void assertFontsAreValid(Path tmpFile) throws IOException {
    PreflightParser parser = new PreflightParser(tmpFile.toFile());
    final var documentToVerify = (PreflightDocument) parser.parse();
    // Get validation result
    final var result = documentToVerify.validate();
    final var resultString = result.getErrorsList().stream()
        .filter(err -> !err.getErrorCode()
            .matches("7\\.11\\.2|3\\.1\\.11|2\\.1\\.2|2\\.2\\.1|2\\.4\\.3")) // filter findings from the source documents
        .map(err -> err.getErrorCode() + " " + err.getDetails())
        .collect(Collectors.joining("\n"));
    assertTrue(resultString.isBlank(), resultString);
}
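For readers unsure how the error-code filter in assertFontsAreValid behaves, here is a minimal standalone sketch using only the JDK. The class name ErrorCodeFilter and the method remaining are hypothetical and not part of the test case; only the pattern string is taken from it. The point it illustrates is that String.matches requires the whole string to match, so "3.1.11" is filtered out while "3.1.1" and "3.1.3" are kept.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only.
class ErrorCodeFilter {

    // Same pattern as in the test case: findings attributed to the source documents.
    static final String KNOWN_SOURCE_FINDINGS =
            "7\\.11\\.2|3\\.1\\.11|2\\.1\\.2|2\\.2\\.1|2\\.4\\.3";

    // Returns the error codes that are NOT covered by the known-findings pattern.
    static List<String> remaining(List<String> errorCodes) {
        return errorCodes.stream()
                // String.matches only succeeds if the entire code matches one alternative
                .filter(code -> !code.matches(KNOWN_SOURCE_FINDINGS))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> codes = List.of("3.1.11", "3.1.1", "3.1.3", "2.4.3", "1.4");
        System.out.println(remaining(codes)); // prints [3.1.1, 3.1.3, 1.4]
    }
}
```

So the font findings reported in this issue (3.1.2, 3.1.3, 3.1.8, 3.1.14, 3.3.1, 3.3.2) and the 1.4 trailer error all pass through the filter and make the assertion fail.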
The problem is still present with the snapshot version 3.0.2-2024-0115.083906-63.
Here is the preflight parser output for the snapshot version:
1.4 Trailer Syntax error, /XRef cross reference streams are not allowed
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
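Because the same message repeats many times in the listings above, it may be easier to compare runs after collapsing identical lines into counts. The following is a small sketch using only the JDK; the class PreflightSummary and the method countMessages are hypothetical names introduced here, and the sample input is the snapshot output quoted above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
class PreflightSummary {

    // Counts identical messages while keeping the order in which they first appear.
    static Map<String, Integer> countMessages(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String message = line.strip();
            if (message.isEmpty()) {
                continue; // ignore blank lines
            }
            counts.merge(message, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String trailer = "1.4 Trailer Syntax error, /XRef cross reference streams are not allowed";
        String glyph = "3.3.1 Glyph error, The character code 0 in the font program "
                + "\"BCDEEE+Calibri\" is missing from the Character Encoding";
        // The snapshot output: one trailer error followed by five identical glyph errors.
        List<String> lines = List.of(trailer, glyph, glyph, glyph, glyph, glyph);
        countMessages(lines).forEach((message, n) -> System.out.println(n + "x " + message));
    }
}
```

For the snapshot output this prints one line prefixed with 1x for the trailer error and one line prefixed with 5x for the Calibri glyph error.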
The input displays correctly:
The output file doesn't display the font correctly:
Attachments
Issue Links
- fixes: PDFBOX-5775 importPage destroys annotations (Closed)