PDFBOX-3448 (PDFBox)

NullPointerException at org.apache.pdfbox.pdmodel.common.COSArrayList.convertFloatCOSArrayToList

    Details

      Description

      A number of valid PDF documents fail during text extraction in Apache Tika 1.14-SNAPSHOT (PDFBox 2.0.2) with the following exception:

      org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@3e14c16d
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
      at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
      at com.wolterskluwer.atlas.transformer.processFileResources.DocumentsTextExtractor.extractText(DocumentsTextExtractor.java:44)
      at com.wolterskluwer.atlas.transformer.processFileResources.DocumentsTextExtractor.main(DocumentsTextExtractor.java:134)
      Caused by: java.lang.NullPointerException
      at org.apache.pdfbox.pdmodel.common.COSArrayList.convertFloatCOSArrayToList(COSArrayList.java:297)
      at org.apache.pdfbox.pdmodel.font.PDFont.getWidths(PDFont.java:462)
      at org.apache.pdfbox.pdmodel.font.PDFont.getWidth(PDFont.java:229)
      at org.apache.pdfbox.pdmodel.font.PDFont.getDisplacement(PDFont.java:212)
      at org.apache.pdfbox.contentstream.PDFStreamEngine.showText(PDFStreamEngine.java:695)
      at org.apache.pdfbox.contentstream.PDFStreamEngine.showTextString(PDFStreamEngine.java:564)
      at org.apache.pdfbox.contentstream.operator.text.ShowText.process(ShowText.java:55)
      at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:815)
      at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:472)
      at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:446)
      at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
      at org.apache.pdfbox.text.PDFTextStreamEngine.processPage(PDFTextStreamEngine.java:136)
      at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
      at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:144)
      at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
      at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
      at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:112)
      at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:151)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      ... 6 more

      The failing documents and a log containing the exception stack traces are attached.
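      The trace shows the exception being thrown inside COSArrayList.convertFloatCOSArrayToList while PDFont.getWidths reads the font's width array (presumably the font dictionary's /Widths entry). The report does not state the root cause; one plausible assumption is that an entry of that array is missing or resolves to null. The pure-Java sketch below illustrates a null-tolerant version of such a conversion. The class WidthsSketch, the method toFloatList and the DEFAULT_WIDTH fallback are hypothetical, and array entries are modelled as plain Java objects rather than PDFBox's COSArray, so this is not the PDFBox fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: convert a PDF number array (e.g. a font's /Widths)
 * into a List<Float> without throwing when an entry is missing.
 * Entries are plain Java objects (Number or anything else, including null);
 * this does NOT use or reproduce the PDFBox COSArray API.
 */
public class WidthsSketch {

    /** Assumed fallback for entries that are null or not numeric. */
    static final float DEFAULT_WIDTH = 0f;

    static List<Float> toFloatList(List<?> entries) {
        List<Float> result = new ArrayList<>();
        if (entries == null) {
            return result; // no array at all: nothing to convert
        }
        for (Object entry : entries) {
            if (entry instanceof Number) {
                result.add(((Number) entry).floatValue());
            } else {
                // null (e.g. an unresolvable reference) or a non-numeric object:
                // substitute a default instead of dereferencing it
                result.add(DEFAULT_WIDTH);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> widths = Arrays.asList(500, null, 250.5f, "bogus");
        System.out.println(toFloatList(widths)); // [500.0, 0.0, 250.5, 0.0]
    }
}
```

      Substituting a default keeps the list aligned with character codes, so later lookups by index still hit the right glyph; whether 0 is an appropriate fallback width (versus, say, falling back to the font program's own metrics) is a separate design decision.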

        Attachments

        1. 101119respmoeotprovidewitlist.pdf
          399 kB
          Yauheni Salopiy
        2. 110111respmemosuppmodiscov.pdf
          773 kB
          Yauheni Salopiy
        3. 110111respmoordcompeldisc.pdf
          896 kB
          Yauheni Salopiy
        4. 110111respmoordcompelexhibad.pdf
          1.53 MB
          Yauheni Salopiy
        5. 110111respmoordcompelexhibeg.pdf
          1.95 MB
          Yauheni Salopiy
        6. 110131respspprevieworddeny.pdf
          4.62 MB
          Yauheni Salopiy
        7. 110208respfinalstip.pdf
          1.72 MB
          Yauheni Salopiy
        8. 130429hospauthalbanydoughccrequestadmiss.pdf
          466 kB
          Yauheni Salopiy
        9. PDFBOX-3448_LOG.txt
          21 kB
          Yauheni Salopiy


            People

            • Assignee: Tilman Hausherr
            • Reporter: Yauheni Salopiy
            • Votes: 1
            • Watchers: 3
