Uploaded image for project: 'PDFBox'
  1. PDFBox
  2. PDFBOX-205

Miscellaneous errors on valid files

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.8.5, 2.0.0
    • Component/s: Parsing
    • Labels:
      None

      Description

      [imported from SourceForge]
      http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1570311
      Originally submitted by renaudw on 2006-10-03 15:04.

      As far as I can tell these files are valid. Some
      generate a warning in Acrobat Reader, some don't.

      I'm reporting them in case you want to have a look at
      the errors they generate. All files are accessible
      online by replacing /web/gfdocs with
      http://bat.library.ucsf.edu/data.

      /web/gfdocs/p/b/x/pbx00a99/pbx00a99.pdf skipped due
      to: java.io.IOException: Error: Expected an integer
      type, actual='BC3c#c3???' (stacktrace follows)
      java.io.IOException: Error: Expected an integer type,
      actual='BC3c#c3???'
      at org.pdfbox.pdfparser.BaseParser.readInt
      (BaseParser.java:1335)
      at org.pdfbox.pdfparser.PDFParser.parseObject
      (PDFParser.java:415)
      at org.pdfbox.pdfparser.PDFParser.parse
      (PDFParser.java:176)
      at
      edu.ucsf.library.utils.TextOverImageDocumentDetector.pa
      rseDocument(TextOverImageDocumentDetector.java:210)
      at
      edu.ucsf.library.utils.TextOverImageDocumentDetector.de
      tect(TextOverImageDocumentDetector.java:102)
      at
      edu.ucsf.library.utils.TextOverImageDocumentDetector.ma
      in(TextOverImageDocumentDetector.java:77)

      /web/gfdocs/p/e/b/peb40a99/peb40a99.pdf skipped due
      to: java.io.StreamCorruptedException: Error: data is
      null (stacktrace follows)
      java.io.StreamCorruptedException: Error: data is null
      at org.pdfbox.filter.LZWFilter.decode
      (LZWFilter.java:101)
      at org.pdfbox.cos.COSStream.doDecode
      (COSStream.java:319)
      at org.pdfbox.cos.COSStream.doDecode
      (COSStream.java:249)
      at org.pdfbox.cos.COSStream.getUnfilteredStream
      (COSStream.java:173)
      at
      org.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredS
      tream(COSStreamArray.java:200)
      at org.pdfbox.pdfparser.PDFStreamParser.<init>
      (PDFStreamParser.java:91)
      at
      org.pdfbox.pdmodel.common.COSStreamArray.getStreamToken
      s(COSStreamArray.java:141)
      at
      org.pdfbox.util.PDFStreamEngine.processSubStream
      (PDFStreamEngine.java:189)
      at
      org.pdfbox.util.PDFStreamEngine.processStream
      (PDFStreamEngine.java:160)
      at
      edu.ucsf.library.utils.TextOverImageDocumentDetector.co
      llectTextRenderingModes
      (TextOverImageDocumentDetector.java:153)
      at
      edu.ucsf.library.utils.TextOverImageDocumentDetector.de
      tect(TextOverImageDocumentDetector.java:125)
      at
      edu.ucsf.library.utils.TextOverImageDocumentDetector.de
      tect(TextOverImageDocumentDetector.java:103)
      at
      edu.ucsf.library.utils.TextOverImageDocumentDetector.ma
      in(TextOverImageDocumentDetector.java:77)

      /web/gfdocs/q/l/n/qln20a99/qln20a99.pdf skipped due
      to: java.io.StreamCorruptedException: Error: data is
      null (stacktrace follows)
      java.io.StreamCorruptedException: Error: data is null
      at org.pdfbox.filter.LZWFilter.decode
      (LZWFilter.java:101)
      at org.pdfbox.cos.COSStream.doDecode
      (COSStream.java:319)
      at org.pdfbox.cos.COSStream.doDecode
      (COSStream.java:249)
      at org.pdfbox.cos.COSStream.getUnfilteredStream
      (COSStream.java:173)
      at
      org.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredS
      tream(COSStreamArray.java:200)
      at org.pdfbox.pdfparser.PDFStreamParser.<init>
      (PDFStreamParser.java:91)
      at
      org.pdfbox.pdmodel.common.COSStreamArray.getStreamToken
      s(COSStreamArray.java:141)
      at
      org.pdfbox.util.PDFStreamEngine.processSubStream
      (PDFStreamEngine.java:189)
      at
      org.pdfbox.util.PDFStreamEngine.processStream
      (PDFStreamEngine.java:160)
      at
      edu.ucsf.library.utils.TextOverImageDocumentDetector.co
      llectTextRenderingModes
      (TextOverImageDocumentDetector.java:153)
      at
      edu.ucsf.library.utils.TextOverImageDocumentDetector.de
      tect(TextOverImageDocumentDetector.java:125)
      at
      edu.ucsf.library.utils.TextOverImageDocumentDetector.de
      tect(TextOverImageDocumentDetector.java:103)
      at
      edu.ucsf.library.utils.TextOverImageDocumentDetector.ma
      in(TextOverImageDocumentDetector.java:77)

      /web/gfdocs/q/p/e/qpe20a99/qpe20a99.pdf skipped due
      to: java.io.StreamCorruptedException: Error: data is
      null (stacktrace follows)
      java.io.StreamCorruptedException: Error: data is null
      at org.pdfbox.filter.LZWFilter.decode
      (LZWFilter.java:101)
      at org.pdfbox.cos.COSStream.doDecode
      (COSStream.java:319)
      at org.pdfbox.cos.COSStream.doDecode
      (COSStream.java:249)
      at org.pdfbox.cos.COSStream.getUnfilteredStream
      (COSStream.java:173)
      at
      org.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredS
      tream(COSStreamArray.java:200)
      at org.pdfbox.pdfparser.PDFStreamParser.<init>
      (PDFStreamParser.java:91)
      at
      org.pdfbox.pdmodel.common.COSStreamArray.getStreamToken
      s(COSStreamArray.java:141)
      at
      org.pdfbox.util.PDFStreamEngine.processSubStream
      (PDFStreamEngine.java:189)
      at
      org.pdfbox.util.PDFStreamEngine.processStream
      (PDFStreamEngine.java:160)
      at
      edu.ucsf.library.utils.TextOverImageDocumentDetector.co
      llectTextRenderingModes
      (TextOverImageDocumentDetector.java:153)
      at
      edu.ucsf.library.utils.TextOverImageDocumentDetector.de
      tect(TextOverImageDocumentDetector.java:125)
      at
      edu.ucsf.library.utils.TextOverImageDocumentDetector.de
      tect(TextOverImageDocumentDetector.java:103)
      at
      edu.ucsf.library.utils.TextOverImageDocumentDetector.ma
      in(TextOverImageDocumentDetector.java:77)

        Attachments

        1. PDFBOX205-Spbx00a99.pdf
          8 kB
          Andreas Lehmkühler
        2. PDFBOX205-Speb40a99.pdf
          126 kB
          Andreas Lehmkühler
        3. pdfbox-205-speb40a99.pdf-1.png
          64 kB
          Tilman Hausherr

          Issue Links

            Activity

              People

              • Assignee:
                tilman Tilman Hausherr
                Reporter:
                Anonymous
              • Votes:
                1 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: