Details

      Description

      I have a PDF that does not render in PDFBox. It contains scanned pages encoded as CCITT fax TIFF images. On every page the decoder fails with IOException("TIFFFaxDecoder: EOL encountered in black run.") (or the same message with "white" instead of "black"). Unfortunately, the PDF contains sensitive data, so I cannot share it.

      As a test, I replaced TIFFFaxDecoder with the CCITTFaxDecoderStream class from the TwelveMonkeys ImageIO library. After that everything worked and PDFToImage produced the expected result.

      I have extracted the first few bytes of the TIFF to show the problem without sharing the confidential content. See the attached test program and test file.
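      For anyone inspecting a similar byte excerpt, here is a rough diagnostic sketch. It is not the attached test program and not part of PDFBox. It counts bit patterns that look like a T.4 EOL code (at least eleven 0 bits followed by a 1 bit). The class name EolScanner is hypothetical, and the sketch assumes MSB-first fill order. Long zero runs can also be fill bits, and Group 4 data normally carries no EOL codes, so the count is only a hint about where the decoder might see an EOL.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of PDFBox or TwelveMonkeys.
public class EolScanner {

    /**
     * Counts candidate T.4 EOL codes: a run of at least 11 zero bits
     * terminated by a 1 bit. Bits are read MSB-first (assumed fill order).
     */
    static int countEolCodes(byte[] data) {
        int zeroRun = 0;
        int count = 0;
        for (byte b : data) {
            for (int bit = 7; bit >= 0; bit--) {
                if (((b >> bit) & 1) == 0) {
                    zeroRun++;
                } else {
                    if (zeroRun >= 11) {
                        count++;
                    }
                    zeroRun = 0;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // With a file argument, scan that file; otherwise scan a made-up sample.
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : new byte[] {0x00, 0x10, 0x35, 0x00, 0x01};
        System.out.println("Candidate EOL codes: " + countEolCodes(data));
    }
}
```

      Run it with the path of the byte excerpt as the only argument. The built-in sample bytes are invented and unrelated to 1.tiff.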

      I tested this against the latest trunk version of PDFBox, but I believe the decoder implementation is essentially the same in all versions.

        Attachments

        1. 1.tiff
          0.1 kB
          Petr Slaby
        2. CCITTFaxDecoderStream.java
          25 kB
          Tilman Hausherr
        3. CCITTFaxDecoderStream-Changes-by-Petr-and-Tilman-diff.txt
          5 kB
          Tilman Hausherr
        4. CCITTFaxFilter.patch
          33 kB
          Petr Slaby
        5. PDFBOX-3338-014261-p3.pdf
          104 kB
          Tilman Hausherr
        6. TestCCITTFaxDecoder.java
          1 kB
          Petr Slaby

              People

              • Assignee:
                tilman Tilman Hausherr
                Reporter:
                pslabycz Petr Slaby
              • Votes:
                0
                Watchers:
                4
