PDFBOX-3338: CCITT Fax decoder fails

Details

    Description

      I have a PDF that does not render in PDFBox. It contains pages from a scanner, encoded as CCITT Fax TIFFs. On every page, the decoder throws IOException("TIFFFaxDecoder: EOL encountered in black run.") (or the same message with "white" instead of "black"). Unfortunately, the PDF contains sensitive data, so I cannot share it.
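
      For readers unfamiliar with the message: in CCITT T.4 data, the EOL (end-of-line) code is eleven 0 bits followed by a single 1 bit, optionally preceded by extra 0 fill bits. The error means the decoder hit that pattern while it was still expecting a run-length code. The sketch below is purely illustrative and is not code from PDFBox or from the attached test program. The class name EolScanner and the method findEolOffsets are hypothetical, and it assumes the bits are stored most-significant-bit first (fill order 1). It lists the bit offsets at which EOL codes start, which can help to check where the line boundaries in a problematic stream actually are.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: locates T.4 EOL codes in raw data.
final class EolScanner {

    private EolScanner() {
    }

    /**
     * Returns the bit offset (counted from the start of data, MSB first)
     * of every EOL code, i.e. of the last eleven 0 bits before a 1 bit.
     * Extra leading 0 bits are treated as fill and are not reported.
     */
    static List<Integer> findEolOffsets(byte[] data) {
        List<Integer> offsets = new ArrayList<>();
        int zeroRun = 0;                      // consecutive 0 bits seen so far
        int totalBits = data.length * 8;
        for (int bit = 0; bit < totalBits; bit++) {
            // Assumption: MSB-first bit order within each byte.
            int value = (data[bit >> 3] >> (7 - (bit & 7))) & 1;
            if (value == 0) {
                zeroRun++;
            } else {
                if (zeroRun >= 11) {
                    offsets.add(bit - 11);    // start of the 11 zeros that belong to the EOL
                }
                zeroRun = 0;
            }
        }
        return offsets;
    }
}
```

      Calling EolScanner.findEolOffsets on the bytes of a CCITT stream returns an empty list if no EOL pattern is present. Streams written with the reversed fill order would need their bits flipped per byte before scanning.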

      As a test, I replaced TIFFFaxDecoder with the CCITTFaxDecoderStream class from the TwelveMonkeys ImageIO library. Everything worked after that, and PDFToImage produced the expected result.

      I have extracted the first few bytes of the TIFF to show the problem without sharing the confidential content. See the attached test program and test file.

      I have tested this against the latest trunk version of PDFBox, but I think the decoder implementation is basically the same in all versions.

      Attachments

        1. TestCCITTFaxDecoder.java
          1 kB
          Petr Slaby
        2. 1.tiff
          0.1 kB
          Petr Slaby
        3. CCITTFaxFilter.patch
          33 kB
          Petr Slaby
        4. CCITTFaxDecoderStream.java
          25 kB
          Tilman Hausherr
        5. PDFBOX-3338-014261-p3.pdf
          104 kB
          Tilman Hausherr
        6. CCITTFaxDecoderStream-Changes-by-Petr-and-Tilman-diff.txt
          5 kB
          Tilman Hausherr

            People

              Assignee: Tilman Hausherr (tilman)
              Reporter: Petr Slaby (pslabycz)