PDFBox / PDFBOX-5033

CFF FontParser exits with illegal offset in font

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.8.16, 2.0.20, 2.0.21
    • Fix Version/s: 1.8.17, 2.0.22, 3.0.0 PDFBox
    • Component/s: FontBox
    • Labels:
      None

      Description

      Dear Devs,

      We've encountered an issue with versions 2.0.20 and 2.0.21 of PDFBox when trying to parse a PDF for text extraction. The issue seems to have existed before; see FOP-2751.

      I reproduced this issue with the pdfbox-app and the FuturaStd-Book.pdf of FOP-2751:

      Console output
      java -jar pdfbox-app-2.0.21.jar ExtractText FuturaStd-Book.pdf 
      Dez 04, 2020 11:06:00 AM org.apache.pdfbox.pdmodel.font.PDType1CFont <init>
      SCHWERWIEGEND: Can't read the embedded Type1C font FuturaStd-Book
      java.io.IOException: illegal offset value 2949166 in CFF font
              at org.apache.fontbox.cff.CFFParser.readIndexDataOffsets(CFFParser.java:192)
              at org.apache.fontbox.cff.CFFParser.readIndexData(CFFParser.java:201)
              at org.apache.fontbox.cff.CFFParser.parseFont(CFFParser.java:484)
              at org.apache.fontbox.cff.CFFParser.parse(CFFParser.java:122)
              at org.apache.fontbox.cff.CFFParser.parse(CFFParser.java:75)
              at org.apache.pdfbox.pdmodel.font.PDType1CFont.<init>(PDType1CFont.java:102)
              at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:74)
              at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:146)
              at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:66)
              at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:933)
              at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:515)
              at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:489)
              at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:156)
              at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:144)
              at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:397)
              at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:325)
              at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:272)
              at org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:377)
              at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:274)
              at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:97)
              at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60)
      
      Dez 04, 2020 11:06:00 AM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider loadDiskCache
      WARNUNG: New fonts found, font cache will be re-built
      Dez 04, 2020 11:06:00 AM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider <init>
      WARNUNG: Building on-disk font cache, this may take a while
      Dez 04, 2020 11:06:02 AM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider <init>
      WARNUNG: Finished building on-disk font cache, found 550 fonts
      Dez 04, 2020 11:06:02 AM org.apache.pdfbox.pdmodel.font.PDType1CFont <init>
      WARNUNG: Using fallback font Courier for FuturaStd-Book
      

      Other example fonts causing this issue are:

      • Can't read the embedded Type1C font COGXUZ+MetaPlusNormal-Caps
      • Can't read the embedded Type1C font DJTRFS+MetaPlusBold-CapsItalic
      • Can't read the embedded Type1C font EAFTRP+MetaPlusNormal-Caps
      • Can't read the embedded Type1C font GQHJVM+MetaPlusNormal-CapsItalic
      • Can't read the embedded Type1C font GUEVYR+MetaPlusBold-CapsItalic
      • Can't read the embedded Type1C font HYTBMP+MetaPlusNormal-CapsItalic
      • Can't read the embedded Type1C font IJCQXI+MetaPlusMedium-Caps
      • Can't read the embedded Type1C font JRIYJF+MetaPlusNormal-Caps
      • Can't read the embedded Type1C font JSQSJF+NeuzeitGro-Reg
      • Can't read the embedded Type1C font KUZTXD+MetaPlusBook-Roman
      • Can't read the embedded Type1C font LWIPLB+1496148105355.00001Arial.000-1
      • Can't read the embedded Type1C font MCDJBA+MetaSerif-BoldIta
      • Can't read the embedded Type1C font UNLUJK+Barmeno-Medium

      I couldn't find another issue about this. Is this already known?
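
For context, the exception in the stack trace is raised while reading the offset array of a CFF INDEX. The following is a minimal sketch of that kind of bounds check, based on the INDEX layout in the CFF specification (count, offSize, count + 1 offsets, then data). It is not PDFBox's actual `CFFParser` code. The class name `CffIndexOffsets` and the method `readOffsets` are hypothetical, and treating the byte array as one complete INDEX is a simplifying assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper, not part of PDFBox or FontBox.
final class CffIndexOffsets {

    /**
     * Reads the offset array of a CFF INDEX that starts at index[0].
     * Layout: count (2 bytes), offSize (1 byte), count + 1 offsets of
     * offSize bytes each, followed by the object data.
     */
    public static int[] readOffsets(byte[] index) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(index));
        int count = in.readUnsignedShort();
        if (count == 0) {
            return new int[0]; // an empty INDEX consists of the count field only
        }
        int offSize = in.readUnsignedByte();
        if (offSize < 1 || offSize > 4) {
            throw new IOException("illegal offSize " + offSize + " in CFF font");
        }
        // Bytes left for object data after the 3-byte header and the offset array.
        long dataLength = index.length - 3L - (long) (count + 1) * offSize;
        int[] offsets = new int[count + 1];
        for (int i = 0; i <= count; i++) {
            long offset = 0;
            for (int b = 0; b < offSize; b++) {
                offset = (offset << 8) | in.readUnsignedByte(); // big-endian
            }
            // Offsets are 1-based, relative to the byte preceding the data,
            // so the largest legal value is dataLength + 1.
            if (offset < 1 || offset > dataLength + 1) {
                throw new IOException("illegal offset value " + offset + " in CFF font");
            }
            offsets[i] = (int) offset;
        }
        return offsets;
    }
}
```

A stricter reader would also require the offsets to be non-decreasing. This sketch only checks the range, which is enough to reject a value such as 2949166 when the INDEX data is far smaller than that.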

              People

              • Assignee: Tilman Hausherr (tilman)
              • Reporter: Marius Heinzmann (nuwanda)
              • Votes: 0
              • Watchers: 4