PDFBox / PDFBOX-1535

Extracting text from PDF causes NullPointerException in PDFStreamEngine.processEncodedText method


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.7.1
    • Fix Version/s: 1.8.0
    • Component/s: Text extraction
    • Labels: None
    • Environment: JDK 1.7_17

    Description

      pdftotext.exe from xpdfbin-win-3.03 works fine with this PDF file.

      I also tried PDFBox version 1.2.1 and got the same error.

      [org.apache.pdfbox.util.PDFStreamEngine] java.lang.NullPointerException
      java.lang.NullPointerException
      at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:357)
      at org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:62)
      at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
      at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
      at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
      at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:217)
      at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:448)
      at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:372)
      at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:328)

    Attachments

      1. PDFBOX1535-1.txt (2 kB), Andreas Lehmkühler
      2. 1.pdf (41 kB), Alex


    People

      Assignee: Andreas Lehmkühler (lehmi)
      Reporter: Alex (acoalex)
      Votes: 0
      Watchers: 2
