  1. PDFBox
  2. PDFBOX-4211

Some text is missing in JBIG2 images

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0 JBIG2
    • Fix Version/s: 3.0.1 JBIG2
    • Component/s: JBIG2
    • Labels: None

    Description

      This PDF has problems.

      See pages "iii" (2), "ix" (6), "x" (7), etc. When rendered in PDFDebugger, these pages have most of their text missing or in the wrong position. Each page is a JBIG2 image that uses Huffman encoding. Because the JBIG2 images are very large, you may need a larger Java heap, for example -Xmx1000M, to avoid an OutOfMemoryError.
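
      Before rendering, it can help to confirm that the heap setting actually took effect. The following is a minimal sketch, not part of PDFBox: the class name HeapCheck and the 900 MiB threshold are assumptions chosen for illustration. The threshold sits a little below 1000 because Runtime.maxMemory() can report slightly less than the -Xmx value with some garbage collectors.

```java
// Hypothetical helper, not part of PDFBox: reports the maximum heap the JVM will use.
class HeapCheck {
    static final long MIB = 1024L * 1024L;

    /** Returns true if maxHeapBytes is at least requiredMiB mebibytes. */
    static boolean isEnough(long maxHeapBytes, long requiredMiB) {
        return maxHeapBytes >= requiredMiB * MIB;
    }

    public static void main(String[] args) {
        // Upper bound on the heap; influenced by -Xmx on the command line.
        long maxHeap = Runtime.getRuntime().maxMemory();
        long requiredMiB = 900; // assumed threshold, slightly below -Xmx1000M

        System.out.println("Max heap: " + (maxHeap / MIB) + " MiB");
        if (!isEnough(maxHeap, requiredMiB)) {
            System.out.println("Heap may be too small for large JBIG2 images; "
                    + "consider restarting with e.g. -Xmx1000M");
        }
    }
}
```

      Running it as "java -Xmx1000M HeapCheck" should print a value near 1000 MiB; without the option, the JVM default applies.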

      Please apply my attached patch (EncodedTable.patch) to the file EncodedTable.java in the package org.apache.pdfbox.jbig2.decoder.huffman. It fixes one line of code that does not follow the JBIG2 standard.

      The JBIG2 standard is freely available here
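
      For orientation only: this issue does not say which line the patch changes, so the following is not the fix. It is a minimal sketch of the canonical prefix-code assignment that Huffman tables of this kind rely on, as I understand the procedure in Annex B of the standard. The class and method names are hypothetical and are not taken from EncodedTable.java.

```java
import java.util.Arrays;

// Hypothetical illustration, not PDFBox code: assigns canonical prefix codes
// to table lines from their prefix lengths. A length of 0 marks an unused line.
class PrefixCodes {
    static int[] assign(int[] prefLen) {
        int lenMax = Arrays.stream(prefLen).max().orElse(0);

        // Histogram of prefix lengths.
        int[] lenCount = new int[lenMax + 1];
        for (int len : prefLen) {
            lenCount[len]++;
        }
        lenCount[0] = 0; // unused lines do not consume code space

        int[] firstCode = new int[lenMax + 1];
        int[] codes = new int[prefLen.length];
        for (int curLen = 1; curLen <= lenMax; curLen++) {
            // The first code of this length follows the last code of the
            // previous length, shifted left by one bit.
            firstCode[curLen] = (firstCode[curLen - 1] + lenCount[curLen - 1]) << 1;
            int curCode = firstCode[curLen];
            for (int i = 0; i < prefLen.length; i++) {
                if (prefLen[i] == curLen) {
                    codes[i] = curCode++;
                }
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        // Lengths 1, 2, 3, 3 yield the codes 0, 10, 110, 111 in binary.
        System.out.println(Arrays.toString(assign(new int[] {1, 2, 3, 3})));
    }
}
```

      A decoder that deviates from this ordering, or from the way the table lines themselves are read, assigns the wrong values to bit patterns, which is consistent with text appearing missing or misplaced.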

      Attachments

        1. EncodedTable.patch
          0.8 kB
          Jani Pehkonen
        2. page_iii_(2).pdf
          209 kB
          Jani Pehkonen

        Activity

          People

            Assignee: Tilman Hausherr
            Reporter: Jani Pehkonen
