  1. PDFBox
  2. PDFBOX-5155

Error extracting text from PDF - Can't read the embedded Type1 font FDFBJU+NewsGothic

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.22, 2.0.23
    • Fix Version/s: 2.0.24, 3.0.0 PDFBox
    • Component/s: Text extraction
    • Labels:
      None
    • Environment:
      Java 11

      Description

      When I try to extract text from the command line using PDFBox versions 2.0.22 and 2.0.23, I get the following error. The PDF is customer-specific, so I can't share it here. Is this error caused by this particular font not being supported by PDFBox?

      Apr 07, 2021 1:55:06 PM org.apache.pdfbox.pdmodel.font.PDFont loadUnicodeCmap
      WARNING: Invalid ToUnicode CMap in font FDFBJU+NewsGothic
      Apr 07, 2021 1:55:06 PM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
      SEVERE: Can't read the embedded Type1 font FDFBJU+NewsGothic
      java.io.IOException: Expected INTEGER or REAL but got NAME
          at org.apache.fontbox.type1.Type1Parser.arrayToNumbers(Type1Parser.java:256)
          at org.apache.fontbox.type1.Type1Parser.readSimpleValue(Type1Parser.java:168)
          at org.apache.fontbox.type1.Type1Parser.parseASCII(Type1Parser.java:139)
          at org.apache.fontbox.type1.Type1Parser.parse(Type1Parser.java:61)
          at org.apache.fontbox.type1.Type1Font.createWithSegments(Type1Font.java:85)
          at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:263)
          at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:76)
          at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:146)
          at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:66)
          at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:933)
          at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:515)
          at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:489)
          at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:156)
          at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:144)
          at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:394)
          at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:322)
          at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:269)
          at org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:377)
          at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:274)
          at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:97)
          at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60)
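
      The exception message suggests that a numeric array in the embedded font's ASCII segment contained a token that is not a number. The following is a minimal sketch of that failure mode only. It is not FontBox's actual code: the class `ArrayToNumbersSketch`, the `Kind` enum, the `classify` helper, and the sample input strings are all hypothetical and chosen for illustration. What the real font contains cannot be determined from the log alone.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; NOT the FontBox Type1Parser implementation.
class ArrayToNumbersSketch {
    // Token kinds named after the words used in the exception message.
    enum Kind { INTEGER, REAL, NAME }

    // Classify a whitespace-delimited token (simplified rules, assumed for this sketch).
    static Kind classify(String token) {
        if (token.matches("[+-]?\\d+")) {
            return Kind.INTEGER;
        }
        if (token.matches("[+-]?(\\d+\\.\\d*|\\.\\d+)")) {
            return Kind.REAL;
        }
        return Kind.NAME;
    }

    // Strict conversion: the first non-numeric token aborts the whole array.
    static List<Number> arrayToNumbers(String arrayBody) throws IOException {
        List<Number> numbers = new ArrayList<>();
        for (String token : arrayBody.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty array body
            }
            Kind kind = classify(token);
            if (kind == Kind.INTEGER) {
                numbers.add(Integer.parseInt(token));
            } else if (kind == Kind.REAL) {
                numbers.add(Float.parseFloat(token));
            } else {
                throw new IOException("Expected INTEGER or REAL but got " + kind);
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        try {
            // A well-formed numeric array parses without error.
            System.out.println(arrayToNumbers("0 -250 1000 750"));
            // A stray name inside the array triggers the exception.
            System.out.println(arrayToNumbers("0 -250 foo 750"));
        } catch (IOException e) {
            System.out.println("IOException: " + e.getMessage());
        }
    }
}
```

      In this sketch a single unexpected token makes the whole array unreadable, which would be consistent with the entire embedded font being rejected in the log above. Whether that is what happens with this particular font is an assumption.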
      

        Attachments

        1. image-2021-04-07-17-11-10-048.png
          53 kB
          nithin nambiar
        2. FDFBJU+NewsGothic-0034.pfa
          15 kB
          nithin nambiar
        3. FDFBJU+NewsGothic-Bold-0050.pfa
          12 kB
          nithin nambiar
        4. FDFBJU+NewsGothic-Bold-0050.pfa
          12 kB
          nithin nambiar
        5. image-2021-04-30-13-22-09-187.png
          59 kB
          Tilman Hausherr
        6. Screenshot 2021-04-30 at 12.34.20.png
          128 kB
          nithin nambiar
        7. image-2021-05-01-09-49-26-222.png
          99 kB
          Tilman Hausherr
        8. image-2021-05-01-12-54-26-202.png
          54 kB
          nithin nambiar
        9. image-2021-05-01-18-07-38-406.png
          52 kB
          nithin nambiar
        10. image-2021-05-04-09-45-53-271.png
          72 kB
          nithin nambiar
        11. image-2021-05-04-09-47-17-536.png
          152 kB
          nithin nambiar
        12. image-2021-05-04-09-47-46-988.png
          49 kB
          nithin nambiar
        13. image-2021-05-04-17-39-26-079.png
          121 kB
          nithin nambiar
        14. image-2021-05-04-17-41-37-186.png
          145 kB
          nithin nambiar
        15. Screenshot 2021-05-04 at 19.37.12.png
          77 kB
          nithin nambiar
        16. Screenshot 2021-05-04 at 19.37.43.png
          64 kB
          nithin nambiar
        17. Screenshot 2021-05-04 at 19.38.05.png
          65 kB
          nithin nambiar
        18. Screenshot 2021-05-04 at 20.49.24.png
          662 kB
          nithin nambiar
        19. Screenshot 2021-05-06 at 14.57.06.png
          170 kB
          nithin nambiar
        20. PDFBOX-679-toobig.pdf
          247 kB
          Tilman Hausherr
        21. QN563JY3FFTF2HHOCOHU3Z72RKCMQH3P-p2-reduced.pdf
          4 kB
          Tilman Hausherr

          Activity

            People

            • Assignee:
              tilman Tilman Hausherr
              Reporter:
              nnambiar nithin nambiar
            • Votes:
              0
              Watchers:
              5
