  PDFBox / PDFBOX-4097

Compressed objects are lost when the brute-force search fails to handle compressed object streams


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.8
    • Fix Version/s: 2.0.10, 3.0.0 PDFBox
    • Component/s: Parsing
    • Labels: None

    Description

      Compressed objects described in cross-reference streams are lost when the brute-force search fails to handle the object streams that contain them.

      The attached PDF has an object 1336 whose cross-reference offset actually points to object 1828. This inconsistency triggers a brute-force search (initiated by COSParser.checkXrefOffsets).
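The kind of consistency check that leads to the fallback can be sketched as follows. This is a minimal, self-contained illustration and not PDFBox's actual code: the class `XrefOffsetCheck` and its methods are hypothetical names, and the sample bytes are invented. It reads the `N G obj` header at a byte offset and compares the object number found there with the number the cross-reference entry claims.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, NOT part of PDFBox: illustrates an xref offset sanity check.
class XrefOffsetCheck {

    // Returns the object number of the "N G obj" header that starts at offset,
    // or -1 if no such header starts there.
    static long objectNumberAt(byte[] data, int offset) {
        if (offset < 0 || offset >= data.length) {
            return -1;
        }
        int pos = offset;
        long number = 0;
        while (pos < data.length && isDigit(data[pos])) {   // object number
            number = number * 10 + (data[pos] - '0');
            pos++;
        }
        if (pos == offset) {
            return -1;                                       // no digits found
        }
        int next = skipSpaces(data, pos);
        if (next == pos) {
            return -1;                                       // separator missing
        }
        pos = next;
        int genStart = pos;
        while (pos < data.length && isDigit(data[pos])) {   // generation number
            pos++;
        }
        if (pos == genStart) {
            return -1;
        }
        next = skipSpaces(data, pos);
        if (next == pos) {
            return -1;
        }
        pos = next;
        byte[] keyword = "obj".getBytes(StandardCharsets.US_ASCII);
        if (pos + keyword.length > data.length) {
            return -1;
        }
        for (int i = 0; i < keyword.length; i++) {
            if (data[pos + i] != keyword[i]) {
                return -1;
            }
        }
        return number;
    }

    static boolean isDigit(byte b) {
        return b >= '0' && b <= '9';
    }

    static int skipSpaces(byte[] data, int pos) {
        while (pos < data.length && data[pos] == ' ') {
            pos++;
        }
        return pos;
    }

    // True if the object found at offset has the number the xref entry claims.
    static boolean offsetMatches(byte[] data, int offset, long expectedNumber) {
        return objectNumberAt(data, offset) == expectedNumber;
    }

    public static void main(String[] args) {
        // Invented sample: the xref entry for 1336 points at the start of 1828.
        String pdf = "1336 0 obj\n<<>>\nendobj\n1828 0 obj\n<<>>\nendobj\n";
        byte[] data = pdf.getBytes(StandardCharsets.US_ASCII);
        int claimedOffset = pdf.indexOf("1828 0 obj");
        System.out.println(offsetMatches(data, claimedOffset, 1336)); // false
    }
}
```

A mismatch like this is what makes the parser distrust the cross-reference data as a whole, even when the mismatched object (1336 here) is never referenced.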

      During the search (in bfSearchForObjStreams), object streams 1828, 1829, and 1830 failed to decompress because the streams were "corrupted" (the Params field was missing from the dictionary, or the Filter was wrong). As a result, 462 compressed objects described in the cross-reference streams were lost. Because important objects (the Root, the Pages, etc.) refer to objects stored in stream 1828 or similar streams, they all resolved to null (the corrected xref offset table doesn't contain them). Further parsing is impossible.

      However, when I bypassed checkXrefOffsets, the PDF displayed correctly without any noticeable errors. It seems that object 1336 is not used anywhere in the PDF.

      "Corrupted" 1828:

      1828 0 obj
      <<
      /Length 2176
      /Type /ObjStm
      /N 200
      /First 2103
      /Filter /FlatDecode
      >>
      ...

      This stream cannot be processed by bfSearchForObjStreams, but parseObjectStream handles it correctly.

       

      Would it be possible to add a fallback that preserves the object key offsets of compressed stream objects when an error occurs during the brute-force search?
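One way to picture the proposed fallback is sketched below. It is a simplified model, not PDFBox's implementation: `XrefFallback`, `XrefEntry`, and `mergeWithFallback` are hypothetical names, generation numbers are ignored, and all offsets and the object numbers 10 and 11 are invented. The idea is to treat the offsets found by the brute-force search as authoritative, and to keep the original cross-reference entries for compressed objects whose containing object stream could not be decoded during the search, rather than dropping them.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model, NOT PDFBox code: merges brute-force results with
// original xref entries for objects in undecodable object streams.
class XrefFallback {

    static final class XrefEntry {
        final long offset;        // byte offset, or index inside the object stream
        final long objStmNumber;  // containing object stream, or -1 if uncompressed

        XrefEntry(long offset, long objStmNumber) {
            this.offset = offset;
            this.objStmNumber = objStmNumber;
        }

        boolean isCompressed() {
            return objStmNumber >= 0;
        }
    }

    static Map<Long, XrefEntry> mergeWithFallback(
            Map<Long, XrefEntry> original,
            Map<Long, XrefEntry> bruteForce,
            Set<Long> failedObjStreams) {
        // Start from what the brute-force search found.
        Map<Long, XrefEntry> merged = new HashMap<>(bruteForce);
        for (Map.Entry<Long, XrefEntry> e : original.entrySet()) {
            XrefEntry entry = e.getValue();
            // Preserve compressed entries only if their stream failed to decode
            // and the search did not find the object some other way.
            if (entry.isCompressed() && failedObjStreams.contains(entry.objStmNumber)) {
                merged.putIfAbsent(e.getKey(), entry);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<Long, XrefEntry> original = new HashMap<>();
        original.put(10L, new XrefEntry(0, 1828));     // compressed in stream 1828
        original.put(11L, new XrefEntry(1, 1828));     // compressed in stream 1828
        original.put(1336L, new XrefEntry(5000, -1));  // wrong offset in the file
        original.put(1828L, new XrefEntry(5000, -1));

        Map<Long, XrefEntry> bruteForce = new HashMap<>();
        bruteForce.put(1336L, new XrefEntry(100, -1)); // corrected by the search
        bruteForce.put(1828L, new XrefEntry(5000, -1));

        Set<Long> failed = new HashSet<>();
        failed.add(1828L);                             // stream could not be decoded

        Map<Long, XrefEntry> merged = mergeWithFallback(original, bruteForce, failed);
        System.out.println(merged.keySet());
    }
}
```

In this model, objects 10 and 11 stay resolvable through stream 1828 while the corrected offset for 1336 is still used. Whether the preserved entries can then be decoded depends on a later stage being more tolerant than the search, which is what the report observes for parseObjectStream.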

      Attachments

        1. 奥美医疗-IPO.pdf
          2.18 MB
          Cheng Zhong


            People

              Assignee: Andreas Lehmkühler (lehmi)
              Reporter: Cheng Zhong (hust.zcheng)
              Votes: 0
              Watchers: 4
