PDFBox / PDFBOX-5752

Font errors after copying a page to another document

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.0.1 PDFBox
    • Fix Version/s: 3.0.2 PDFBox, 4.0.0
    • Component/s: Writing
    • Labels: None

    Description

      I am trying to import a page into a PDF document and copy its font resources. With PDFBox 2.0 the code worked as expected: the result document included the required embedded fonts.

      Essentially, I perform the following steps in the code. The first (target) document is a PDF/A with one empty page; the second (source) document, also a PDF/A, contains the Roboto font. All fonts are embedded.

       

      PDDocument targetDoc = Loader.loadPDF(targetDocBytes);
      PDPage sourcePage = Loader.loadPDF(data).getPage(0);
      final var copiedPage = targetDoc.importPage(sourcePage);
      copiedPage.setResources(sourcePage.getResources());

      In PDFBox 3.0 this no longer seems to work: the resulting document is corrupted when opened in Adobe Acrobat.

      It also shows a lot of errors when validated with the PDFBox PreflightParser.

      Here are the error messages from the preflight parser:

      1.4 Trailer Syntax error, /XRef cross reference streams are not allowed
      3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
      3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
      3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
      3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
      3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
      3.1.3 Invalid Font definition, BCDFEE+Roboto-Regular: FontFile entry is missing from FontDescriptor
      3.1.14 Invalid Font definition, Unknown font type: XML
      3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
      3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
      3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
      3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
      3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
      3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
      3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
      3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
      3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
      3.1.8 Invalid Font definition
      3.1.2 Invalid Font definition, BCDGEE+TimesNewRomanPS-BoldMT: some mandatory fields are missing from the FontDescriptor: Type, ItalicAngle, FontBBox, Ascent, FontName, StemV, Flags, CapHeight, Descent.
      3.1.3 Invalid Font definition, null: FontFile entry is missing from FontDescriptor
      3.3.2 Glyph error, invalid font dictionary ==> 
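
The repeated 3.3.1 lines make the output above hard to scan. The following is a minimal sketch, not part of the original report, that groups such output by error code. It assumes the preflight messages are available as plain text lines that start with the error code followed by a space; the class name `PreflightSummary` and the method `countByCode` are hypothetical and are not PDFBox API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of PDFBox: summarises preflight output lines.
final class PreflightSummary {

    // Counts how often each error code occurs, keeping first-seen order.
    // Assumption: each non-blank line starts with the code, then a space, then the details.
    static Map<String, Integer> countByCode(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            int space = trimmed.indexOf(' ');
            String code = space < 0 ? trimmed : trimmed.substring(0, space);
            counts.merge(code, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String glyphError = "3.3.1 Glyph error, The character code 0 in the font program "
            + "\"BCDEEE+Calibri\" is missing from the Character Encoding";
        List<String> lines = List.of(
            "1.4 Trailer Syntax error, /XRef cross reference streams are not allowed",
            glyphError,
            glyphError,
            "3.1.3 Invalid Font definition, BCDFEE+Roboto-Regular: FontFile entry is missing from FontDescriptor");
        // Print one line per distinct error code with its number of occurrences
        countByCode(lines).forEach((code, n) -> System.out.println(code + " x" + n));
    }
}
```

Grouping by code keeps the distinct findings (1.4, 3.1.x, 3.3.x) visible even when a single glyph error is reported many times.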

      Here is the complete test case. I used PDFBox 3.0.1 and the newest snapshot version from 15.01.2024.

       

          @Test
          void importPageWithFonts_validateFontInfo() throws IOException {
              // given
              final var targetDocBytes = IOUtils.toByteArray(PdfUtilitiesTest.class.getClassLoader().getResourceAsStream("empty.pdf"));
              String[] additionalFiles = new String[]{
                  "roboto-14.pdf",
              };
              PDDocument targetDoc = Loader.loadPDF(targetDocBytes);
      
              // when
              for (String fileName : Arrays.asList(additionalFiles)) {
                  byte[] data = IOUtils.toByteArray(PdfUtilitiesTest.class.getClassLoader().getResourceAsStream(fileName));
                  // verify source is valid
                  PDPage sourcePage = Loader.loadPDF(data).getPage(0);
                  final var copiedPage = targetDoc.importPage(sourcePage);
                  copiedPage.setResources(sourcePage.getResources());
                  targetDoc.save(Files.createTempFile("merged-fonts", ".pdf").toFile());
              }
              Path tmpFile = Files.createTempFile("fscd-merged", ".pdf");
              targetDoc.save(tmpFile.toFile(), CompressParameters.DEFAULT_COMPRESSION);
      
              // then
              // font errors, e.g. Invalid Font definition, BCDFEE+Roboto-Regular: FontFile entry is missing from FontDescriptor
              assertFontsAreValid(tmpFile);
          }
          private static void assertFontsAreValid(Path tmpFile) throws IOException {
              PreflightParser parser = new PreflightParser(tmpFile.toFile());
              final var documentToVerify = (PreflightDocument) parser.parse();
              // Get validation result
              final var result = documentToVerify.validate();
              final var resultString = result.getErrorsList().stream()
                  .filter(err -> !err.getErrorCode()
                      .matches("7\\.11\\.2|3\\.1\\.11|2\\.1\\.2|2\\.2\\.1|2\\.4\\.3")) // filter findings from the source documents
                  .map(err -> err.getErrorCode() + " " + err.getDetails()).collect(Collectors.joining("\n"));
              assertTrue(resultString.isBlank(), resultString);
          }
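
The filter in `assertFontsAreValid` relies on `String.matches`, which only succeeds when the whole error code matches one of the alternatives. The sketch below isolates that behaviour using only the JDK, so it can be tried without PDFBox. The class name `ErrorCodeFilter` and its methods are hypothetical; only the regular expression is taken from the test case.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper that mirrors the error-code filter from the test case.
final class ErrorCodeFilter {

    // Same alternation as in the test case: codes already reported for the source documents
    private static final Pattern SOURCE_FINDINGS =
        Pattern.compile("7\\.11\\.2|3\\.1\\.11|2\\.1\\.2|2\\.2\\.1|2\\.4\\.3");

    // True only if the complete code equals one of the listed codes
    static boolean isSourceFinding(String errorCode) {
        return SOURCE_FINDINGS.matcher(errorCode).matches();
    }

    // Returns the codes that are NOT filtered out, in their original order
    static List<String> remaining(List<String> errorCodes) {
        return errorCodes.stream()
            .filter(code -> !isSourceFinding(code))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "3.1.11" is filtered, while "3.1.1" and "3.1.3" are kept
        System.out.println(remaining(List.of("3.1.11", "3.1.1", "3.1.3", "2.4.3", "3.3.1")));
    }
}
```

Because the match is anchored to the whole string, a code such as 3.1.1 is not hidden by the 3.1.11 alternative, so the font errors listed in this report would still fail the assertion.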
      

       

      The problem is still present with the snapshot version 3.0.2-2024-0115.083906-63.

       

      Here is the preflight parser output for the snapshot version:

      1.4 Trailer Syntax error, /XRef cross reference streams are not allowed
      3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
      3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
      3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
      3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding
      3.3.1 Glyph error, The character code 0 in the font program "BCDEEE+Calibri" is missing from the Character Encoding

       

      The input displays correctly:

      The output file doesn't display the font correctly:

      Attachments

        1. roboto-14.pdf
          16 kB
          Christian Haegele
        2. empty.pdf
          29 kB
          Christian Haegele
        3. image-2024-01-16-07-41-16-462.png
          36 kB
          Christian Haegele
        4. target-merged882552058302116763.pdf
          35 kB
          Christian Haegele
        5. image-2024-01-16-07-46-04-195.png
          35 kB
          Christian Haegele
        6. image-2024-01-16-07-47-05-883.png
          25 kB
          Christian Haegele

            People

              Assignee: lehmi Andreas Lehmkühler
              Reporter: karma-works Christian Haegele