PDFBox / PDFBOX-2402

NonSequentialPDFParser cannot recover from spurious closing brackets

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.8.8, 2.0.0
    • Fix Version/s: 1.8.8, 2.0.0
    • Component/s: Parsing
    • Labels:
      None

      Description

      The NonSequentialPDFParser fails when an object contains a spurious closing delimiter (for example, an array with two closing brackets). In lenient mode, the parser should at least attempt to recover from this. Instead of throwing an exception when the endObject string is neither "endobj" nor " obj", the attached patch skips one character (the spurious one) and tries to read the string again. It repeats this until either an "endobj" is found or the file ends.
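
      The recovery loop described above can be illustrated with a minimal, self-contained sketch. This is not the attached patch and does not use PDFBox classes: the class EndObjRecoverySketch, the method recoverToEndObj, and the choice to scan a plain String are all assumptions made for illustration, and the " obj" case mentioned above is left out for brevity.

```java
// Hypothetical illustration only; not the code from NonSequentialPDFParser.patch.
final class EndObjRecoverySketch {

    private static final String END_OBJ = "endobj";

    /**
     * Scans for "endobj" beginning at start.
     * Returns the index just past "endobj", or -1 if the input ends first.
     * In strict mode (lenient == false) any other token raises an exception.
     */
    static int recoverToEndObj(String data, int start, boolean lenient) {
        int pos = start;
        while (pos < data.length()) {
            // Skip whitespace in front of the next token.
            while (pos < data.length() && Character.isWhitespace(data.charAt(pos))) {
                pos++;
            }
            if (pos >= data.length()) {
                break; // end of input reached
            }
            // Read a run of non-whitespace characters as the candidate token.
            int tokenEnd = pos;
            while (tokenEnd < data.length() && !Character.isWhitespace(data.charAt(tokenEnd))) {
                tokenEnd++;
            }
            String token = data.substring(pos, tokenEnd);
            if (token.startsWith(END_OBJ)) {
                return pos + END_OBJ.length();
            }
            if (!lenient) {
                throw new IllegalStateException("expected 'endobj' but found '" + token + "'");
            }
            // Lenient mode: drop one spurious character and try again.
            pos++;
        }
        return -1;
    }
}
```

      With the input "] endobj" and lenient set to true, the sketch skips the stray bracket and stops just past "endobj". With lenient set to false, the same input raises an exception, which mirrors the behaviour the patch is meant to replace.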

      I have a document on which this works, but unfortunately I am not allowed to upload it. In any case, the patch cannot make things worse, since it only replaces throwing an exception with an attempt to recover.

        Attachments

        1. NonSequentialPDFParser.patch
          2 kB
          Michele Balistreri
        2. file screenshot.png
          84 kB
          Michele Balistreri


              People

              • Assignee:
                tilman Tilman Hausherr
              • Reporter:
                briksoftware Michele Balistreri
              • Votes:
                0
              • Watchers:
                3
