PDFBox / PDFBOX-1535

Extracting text from a PDF causes a NullPointerException in the PDFStreamEngine.processEncodedText method

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.7.1
    • Fix Version/s: 1.8.0
    • Component/s: Text extraction
    • Labels:
      None
    • Environment:
      jdk 1.7_17

      Description

      pdftotext.exe from xpdfbin-win-3.03 works fine with this PDF file.

      I also tried PDFBox version 1.2.1, but got the same error.

      [org.apache.pdfbox.util.PDFStreamEngine] java.lang.NullPointerException
      java.lang.NullPointerException
      at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:357)
      at org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:62)
      at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
      at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
      at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
      at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:217)
      at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:448)
      at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:372)
      at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:328)

        Attachments

        1. 1.pdf
          41 kB
          Alex
        2. PDFBOX1535-1.txt
          2 kB
          Andreas Lehmkühler

            People

            • Assignee:
              lehmi Andreas Lehmkühler
            • Reporter:
              acoalex Alex
            • Votes:
              0
            • Watchers:
              2
