PDFBox / PDFBOX-2163

inline image with EI in the middle incorrectly parsed

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.8.6, 1.8.7, 2.0.0
    • Fix Version/s: 1.8.7, 2.0.0
    • Component/s: Parsing

    Description

      Processing this PDF
      http://digitalcorpora.org/corp/nps/files/govdocs1/876/876636.pdf
      fails with an exception because the end of an inline image is detected incorrectly. The content stream looks like this:

      BI
        /W 452
        /H 169
        /BPC 8
        /CS /RGB
        /D [0.0 1.0 0.0 1.0 0.0 1.0]
        /F [/A85 /Fl]
      ID
      ......................................................
      ....................................................EI
      ......................................................
      ...
      ....
      EI Q
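The first "EI" in the excerpt is legitimate image data. ASCII85 encodes bytes with the printable characters '!' through 'u' (plus 'z' as shorthand for four zero bytes), and decoders skip white space, so "EI" followed by a line break can appear anywhere in the encoded data. The following minimal Java check of that alphabet illustrates this; the class and method names are hypothetical and not part of PDFBox.

```java
public class Ascii85Alphabet {

    // PDF white-space characters: NUL, TAB, LF, FF, CR, SPACE.
    static boolean isWhitespace(int b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    // True if b may appear in ASCII85-encoded data before the "~>" end marker:
    // the digits '!'..'u', the 'z' shorthand for four zero bytes, and white
    // space, which ASCII85 decoders ignore.
    static boolean isAscii85DataByte(int b) {
        return (b >= '!' && b <= 'u') || b == 'z' || isWhitespace(b);
    }

    // True if every character of s could be part of ASCII85 data.
    static boolean allAscii85(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isAscii85DataByte(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allAscii85("EI\n"));  // true: "EI" plus a line break is valid data
        System.out.println(allAscii85("EI Q"));  // true: even "EI Q" is valid data
        System.out.println(allAscii85("~>"));    // false: '~' is outside the alphabet
    }
}
```

Because '~' never occurs inside the encoded data, the "~>" end-of-data marker is the one reliable signal that the ASCII85 text is over.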
      

      Inline images are handled in PDFStreamParser. This is tricky: when we find an "EI", we look at the data that follows to make sure it isn't more binary image data (which would mean the "EI" is in the middle of the image), but here the image data isn't binary, it is ASCII85 text, so that check finds nothing suspicious. We also can't require an LF before the EI, because I remember a PDF at work, created by a well-known company, that doesn't have one.
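A minimal Java sketch of one possible refinement, assuming the parser knows the inline image's filter list: keep the "EI followed by white space" test and the look-ahead for binary data described above, and, when the first filter is /A85 (ASCII85Decode), accept a candidate only after the ASCII85 end-of-data marker "~>" has been seen. This is not necessarily the change that was committed for this issue; the class name, method names, the sample bytes and the 10-byte look-ahead window are all hypothetical, and the sample only mimics the shape of the stream above.

```java
import java.nio.charset.StandardCharsets;

public class InlineImageEndFinder {

    // PDF white-space characters: NUL, TAB, LF, FF, CR, SPACE.
    static boolean isWhitespace(int b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    // Hypothetical look-ahead window; the value is an arbitrary choice for this sketch.
    static final int LOOKAHEAD = 10;

    // True if the next LOOKAHEAD bytes after 'from' contain a byte that is neither
    // printable ASCII nor white space, i.e. they look like more binary image data.
    static boolean followedByBinary(byte[] data, int from) {
        int end = Math.min(data.length, from + LOOKAHEAD);
        for (int i = from; i < end; i++) {
            int b = data[i] & 0xFF;
            if (!isWhitespace(b) && (b < 0x20 || b > 0x7E)) {
                return true;
            }
        }
        return false;
    }

    // Returns the offset of the "EI" that ends the inline image data, or -1.
    // If asciiEncoded is true (first filter is /A85 or /ASCII85Decode), a candidate
    // is only accepted after the ASCII85 EOD marker "~>" has been seen; '~' is not
    // in the ASCII85 alphabet, so it cannot occur earlier in the data.
    static int findEnd(byte[] data, boolean asciiEncoded) {
        boolean eodSeen = !asciiEncoded;
        for (int i = 0; i + 1 < data.length; i++) {
            if (asciiEncoded && data[i] == '~' && data[i + 1] == '>') {
                eodSeen = true;
            }
            boolean isEI = data[i] == 'E' && data[i + 1] == 'I';
            boolean spaceAfter = i + 2 == data.length || isWhitespace(data[i + 2]);
            if (isEI && spaceAfter && eodSeen && !followedByBinary(data, i + 2)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up data after "ID ": an "EI" ends the first line, the real one precedes " Q".
        byte[] data = "Gb0VJdEI\n9i(^nB~>\nEI Q\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(findEnd(data, false)); // 6: binary look-ahead alone stops too early
        System.out.println(findEnd(data, true));  // 18: the EI after the "~>" marker
    }
}
```

With only the binary look-ahead, the scan stops at the early "EI" because the ASCII85 text that follows contains no binary bytes. The ASCII85-aware scan skips it. Data from producers that omit the "~>" marker would make that scan return -1, so a real parser would presumably still need a fallback.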

    Attachments

      1. PDFBOX-2163-029016.pdf (2.24 MB, Tilman Hausherr)
          Tilman Hausherr

    People

      Assignee: Tilman Hausherr
      Reporter: Tilman Hausherr
      Votes: 0
      Watchers: 1
