PDFBox / PDFBOX-1694

Bug in org.apache.pdfbox.io.Ascii85InputStream

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.1
    • Fix Version/s: 1.8.3, 2.0.0
    • None
    • Any

    Description

      The method 'org.apache.pdfbox.io.Ascii85InputStream.read()' has a bug when the final group of encoded characters is shorter than a full five-character group, that is, when the decoded data length is not a multiple of 4.
      Test file: "www.mzweb.com.br/grupobimbo/web/arquivos/Bimbo_Historia_20070409_Esp.pdf".
      On page #0 there is a stream dictionary "323 0 obj << /Length 1492 /Filter [/Ascii85Decode /FlateDecode]>>".
      The last group of characters to decode is "%f" (0x25, 0x66).
      Ascii85InputStream pads this to "%f~!!", which in this case generates the correct final byte, 0x0f.
      However, including the '~' end-of-data character in the padding is a major bug: '~' is not a valid Ascii85 digit, so the correct result here cannot be relied on in general.
      If the final padding were "%f!!!", the final byte decoded would be 0x0e, which is wrong.
      The correct padding character is 'u', giving "%fuuu" (see http://en.wikipedia.org/wiki/Ascii85).
      This is a quick fix.
      The PDF files on the corporate website of "Grupo Bimbo" include lots of examples using Ascii85Decode/
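
The padding arithmetic described above can be checked with a small standalone sketch. This is not the PDFBox implementation or the committed fix; the class name `Ascii85TailSketch` and the helper `decodeTail` are hypothetical names introduced only for illustration. It pads a final partial group to five characters, converts it from base 85, and keeps one byte fewer than the number of characters actually present.

```java
class Ascii85TailSketch {

    /**
     * Decodes a final partial Ascii85 group of 2 to 4 characters.
     * The group is padded to 5 characters with 'pad', interpreted as a
     * base-85 number, and the top (n - 1) bytes of the 32-bit result are kept.
     * Hypothetical helper for illustration only, not a PDFBox API.
     */
    static byte[] decodeTail(String tail, char pad) {
        int n = tail.length();
        if (n < 2 || n > 4) {
            throw new IllegalArgumentException("partial group must have 2 to 4 characters");
        }
        long value = 0;
        for (int i = 0; i < 5; i++) {
            char c = i < n ? tail.charAt(i) : pad;
            value = value * 85 + (c - '!');   // '!' is digit 0, 'u' is digit 84
        }
        byte[] out = new byte[n - 1];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) (value >>> (24 - 8 * i));   // most significant byte first
        }
        return out;
    }

    public static void main(String[] args) {
        // Compare the padding recommended in the report ('u') with zero padding ('!').
        for (char pad : new char[] {'u', '!'}) {
            byte[] b = decodeTail("%f", pad);
            System.out.printf("pad '%c' -> 0x%02x%n", pad, b[0] & 0xff);
        }
    }
}
```

For the "%f" group from the report, padding with 'u' yields 0x0f and padding with '!' yields 0x0e, matching the values given in the description.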

      Attachments

        1. test.java
          4 kB
          Tilman Hausherr

            People

              Assignee: Andreas Lehmkühler (lehmi)
              Reporter: Peter Costello (peterwcostello)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Original Estimate: 0.5h
                  Remaining Estimate: 0.5h
                  Time Spent: Not Specified