PDFBOX-2163: Inline image with EI in the middle incorrectly parsed

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.8.6, 1.8.7, 2.0.0
    • Fix Version/s: 1.8.7, 2.0.0
    • Component/s: Parsing

      Description

      Parsing this PDF
      http://digitalcorpora.org/corp/nps/files/govdocs1/876/876636.pdf
      fails with an exception because the end of an inline image is detected incorrectly. The content stream looks like this:

      BI
        /W 452
        /H 169
        /BPC 8
        /CS /RGB
        /D [0.0 1.0 0.0 1.0 0.0 1.0]
        /F [/A85 /Fl]
      ID
      ......................................................
      ....................................................EI
      ......................................................
      ...
      ....
      EI Q
      

      Inline images are handled in PDFStreamParser. Detecting where one ends is tricky: after a candidate EI we check whether binary data follows, which would mean the EI sits in the middle of the image data. Here, however, what follows is not binary data but ASCII85-encoded text, so that check does not reject the early EI. We also can't require an LF before the EI, because I remember a PDF at work, created by a well-known company, that doesn't have one.
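      To make the problem concrete, here is a minimal sketch of one possible heuristic. It is not the actual PDFBox fix, and the class and method names (InlineImageEndSketch, findEndOfInlineImage, looksLikeEnd) are hypothetical. It assumes that the real EI is followed either by the end of the data or by whitespace and then a short operator-like token such as Q, whereas an EI inside ASCII85 data is usually followed by a long run of encoded characters. It does not require whitespace before the EI.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of PDFBox: guesses which EI really ends an inline image. */
final class InlineImageEndSketch {

    private InlineImageEndSketch() {
    }

    /** PDF whitespace characters: NUL, TAB, LF, FF, CR, SPACE. */
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    /** Assumption: operator names consist only of ASCII letters, '*', '\'' and '"'. */
    static boolean isOperatorChar(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || c == '*' || c == '\'' || c == '"';
    }

    /**
     * Checks the bytes starting at pos, the offset just after a candidate "EI".
     * Accepts end of data, or whitespace followed by end of data or by a
     * token of 1 to 3 operator characters.
     */
    static boolean looksLikeEnd(byte[] data, int pos) {
        if (pos >= data.length) {
            return true;
        }
        if (!isWhitespace(data[pos])) {
            return false;
        }
        int i = pos;
        while (i < data.length && isWhitespace(data[i])) {
            i++;
        }
        if (i == data.length) {
            return true;
        }
        int tokenStart = i;
        while (i < data.length && !isWhitespace(data[i])) {
            if (!isOperatorChar(data[i]) || i - tokenStart >= 3) {
                return false; // not an operator: probably still image data
            }
            i++;
        }
        return true;
    }

    /** Returns the offset of the first "EI" accepted by looksLikeEnd, or -1 if none is found. */
    static int findEndOfInlineImage(byte[] data, int from) {
        for (int i = Math.max(from, 0); i + 1 < data.length; i++) {
            if (data[i] == 'E' && data[i + 1] == 'I' && looksLikeEnd(data, i + 2)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = "ID\nabcdEI\nFGHIJKLMNOP\nxyz~>\nEI Q\n"
                .getBytes(StandardCharsets.US_ASCII);
        // Skips the EI after "abcd" and reports the one before "Q".
        System.out.println(findEndOfInlineImage(data, 0));
    }
}
```

      The heuristic can still be fooled, for example when a line of encoded data happens to start with a short run of letters right after an EI, or when the token after the real EI is a number rather than an operator, so it should be read as an illustration of the ambiguity and not as a complete solution.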

        Attachments

        1. PDFBOX-2163-029016.pdf (2.24 MB, Tilman Hausherr)

              People

              • Assignee: Tilman Hausherr
              • Reporter: Tilman Hausherr