PDFBox / PDFBOX-4598

Oversized JBIG2 decoded result causes unnecessary operations

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.2 JBIG2
    • Fix Version/s: 3.0.3 JBIG2
    • Component/s: JBIG2
    • Labels: None
    • Flags: Patch

    Description

      Hi! I am using pdfbox 2.0.16 and jbig2-imageio 3.0.2 to read JBIG2 images, and I found an issue to report.

      It seems that jbig2-imageio creates an oversized BufferedImage, which in turn makes pdfbox perform unnecessary operations.

      To read a JBIG2 image, pdfbox with jbig2-imageio does the following:

      1. Find the JBIG2 ImageReader (https://github.com/apache/pdfbox/blob/2.0.16/pdfbox/src/main/java/org/apache/pdfbox/filter/JBIG2Filter.java#L67)

      2. Read the image and get a BufferedImage as the result (https://github.com/apache/pdfbox/blob/2.0.16/pdfbox/src/main/java/org/apache/pdfbox/filter/JBIG2Filter.java#L106)

      2-1. jbig2-imageio 3.0.2 gets the decoded bitmap (https://github.com/apache/pdfbox-jbig2/blob/3.0.2/src/main/java/org/apache/pdfbox/jbig2/JBIG2ImageReader.java#L249)

      2-2. It returns that bitmap as a BufferedImage (https://github.com/apache/pdfbox-jbig2/blob/3.0.2/src/main/java/org/apache/pdfbox/jbig2/JBIG2ImageReader.java#L259)
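
For orientation, steps 1 and 2 can be sketched with the standard javax.imageio API. This is a minimal sketch, not the actual JBIG2Filter code: the class and method names (ImageIoReadSketch, findReader, read) are hypothetical, and looking up the "JBIG2" format name only succeeds when the jbig2-imageio plugin is on the classpath.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical helper class, for illustration only.
class ImageIoReadSketch {

    /** Step 1: look up an ImageReader by format name; null if none is registered. */
    static ImageReader findReader(String formatName) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext() ? readers.next() : null;
    }

    /** Step 2: decode the first image in the stream into a BufferedImage. */
    static BufferedImage read(InputStream in, String formatName) throws IOException {
        ImageReader reader = findReader(formatName);
        if (reader == null) {
            throw new IOException("No ImageIO reader registered for " + formatName);
        }
        try (ImageInputStream iis = ImageIO.createImageInputStream(in)) {
            reader.setInput(iis);
            // The color model and raster layout of the result are chosen by the plugin.
            return reader.read(0);
        } finally {
            reader.dispose();
        }
    }
}
```

A call such as `ImageIoReadSketch.read(stream, "JBIG2")` would exercise the path described above. Whatever raster layout the plugin picks in step 2-2 is what the caller receives.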

      The problem is this:
      At step 2-1, a Bitmap of roughly 59 MB is created for the JBIG2 image on the second page of sample.pdf (which is correct),
      but an oversized BufferedImage (roughly 473 MB) is returned at step 2-2.

      I think this is because jbig2-imageio uses a raster based on a PixelInterleavedSampleModel and an 8-bit IndexColorModel.
      https://github.com/apache/pdfbox-jbig2/blob/3.0.2/src/main/java/org/apache/pdfbox/jbig2/image/Bitmaps.java#L177
      https://github.com/apache/pdfbox-jbig2/blob/3.0.2/src/main/java/org/apache/pdfbox/jbig2/image/Bitmaps.java#L286
      https://github.com/apache/pdfbox-jbig2/blob/3.0.2/src/main/java/org/apache/pdfbox/jbig2/image/Bitmaps.java#L291
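
The reported sizes (roughly 59 MB versus 473 MB) differ by about a factor of 8, which is what one would expect when each 1-bit pixel is stored in a full byte. The sketch below compares the buffer size implied by the two sample models. The class name RasterSizeEstimate and the 20000 x 20000 dimensions are hypothetical; the real dimensions of the sample image are not stated in this report.

```java
import java.awt.image.DataBuffer;
import java.awt.image.MultiPixelPackedSampleModel;
import java.awt.image.PixelInterleavedSampleModel;

// Hypothetical helper class, for illustration only.
class RasterSizeEstimate {

    /** Bytes needed when every pixel occupies one full byte (single 8-bit band). */
    static long interleavedBytes(int width, int height) {
        PixelInterleavedSampleModel model = new PixelInterleavedSampleModel(
                DataBuffer.TYPE_BYTE, width, height, 1, width, new int[] {0});
        return (long) model.getScanlineStride() * height;
    }

    /** Bytes needed when eight 1-bit pixels share one byte (rows padded to whole bytes). */
    static long packedBytes(int width, int height) {
        MultiPixelPackedSampleModel model = new MultiPixelPackedSampleModel(
                DataBuffer.TYPE_BYTE, width, height, 1);
        return (long) model.getScanlineStride() * height;
    }

    public static void main(String[] args) {
        int width = 20000;   // assumed dimensions, not taken from sample.pdf
        int height = 20000;
        System.out.println("8-bit interleaved: " + interleavedBytes(width, height) + " bytes");
        System.out.println("1-bit packed:      " + packedBytes(width, height) + " bytes");
    }
}
```

For the assumed 20000 x 20000 image this gives 400,000,000 bytes for the 8-bit layout and 50,000,000 bytes for the packed layout.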

      This also makes pdfbox check the pixel size of the color model of the resulting BufferedImage,
      https://github.com/apache/pdfbox/blob/2.0.16/pdfbox/src/main/java/org/apache/pdfbox/filter/JBIG2Filter.java#L116

      and create another BufferedImage of binary type, since the pixel size is not 1 (JBIG2 is 1-bit depth).
      https://github.com/apache/pdfbox/blob/2.0.16/pdfbox/src/main/java/org/apache/pdfbox/filter/JBIG2Filter.java#L122
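
The extra work on the pdfbox side can be sketched as follows. This is an approximation of the check described above, not a copy of JBIG2Filter; the class and method names (BinaryConversionSketch, toBinaryIfNeeded) are hypothetical. When the decoder already returns a 1-bit image, the second allocation and the full redraw are skipped.

```java
import java.awt.Graphics;
import java.awt.image.BufferedImage;

// Hypothetical helper class, for illustration only.
class BinaryConversionSketch {

    /** Returns the image unchanged if it is already 1 bit per pixel, else a redrawn binary copy. */
    static BufferedImage toBinaryIfNeeded(BufferedImage image) {
        if (image.getColorModel().getPixelSize() == 1) {
            return image; // nothing to do: no extra allocation, no redraw
        }
        // Second full-size image, needed only because the input is not 1-bit.
        BufferedImage packed = new BufferedImage(
                image.getWidth(), image.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        Graphics g = packed.getGraphics();
        try {
            g.drawImage(image, 0, 0, null);
        } finally {
            g.dispose();
        }
        return packed;
    }
}
```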

      I think we should call createPackedRaster and use the returned raster, which is based on MultiPixelPackedSampleModel, together with a 1-bit IndexColorModel, since JBIG2 is for bi-level images. Please check the attached patch. I tested with the patch, and it seems to work well.
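
A minimal sketch of the proposed approach, using only standard java.awt.image calls. It is not the attached patch: the class and method names (PackedBilevelImage, wrap) are hypothetical, and the palette (0 = white, 1 = black) is an assumption about how the decoded bits should be displayed. The input is assumed to be row-padded, MSB-first packed bytes.

```java
import java.awt.Point;
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;
import java.awt.image.IndexColorModel;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

// Hypothetical helper class, for illustration only.
class PackedBilevelImage {

    /** Wraps 1-bit-per-pixel packed rows in a BufferedImage without copying or expanding them. */
    static BufferedImage wrap(byte[] packedRows, int width, int height) {
        int rowStride = (width + 7) / 8; // each row is padded to a whole byte
        if (packedRows.length < (long) rowStride * height) {
            throw new IllegalArgumentException("buffer too small for " + width + "x" + height);
        }
        DataBufferByte buffer = new DataBufferByte(packedRows, packedRows.length);
        // createPackedRaster yields a raster backed by a MultiPixelPackedSampleModel.
        WritableRaster raster = Raster.createPackedRaster(buffer, width, height, 1, new Point(0, 0));
        // Assumed palette: index 0 = white, index 1 = black.
        byte[] palette = {(byte) 0xFF, (byte) 0x00};
        IndexColorModel colorModel = new IndexColorModel(1, 2, palette, palette, palette);
        return new BufferedImage(colorModel, raster, false, null);
    }
}
```

Because the raster shares the given byte array, the image costs no more memory than the decoded bitmap, and its color model reports a pixel size of 1.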

      You can reproduce this issue with the second page of the sample.pdf file that I attached.
      You can also download the file from here: http://www.newsgn.com/data/newsgn_com/pdf/201802/2018022229524590.pdf

      Attachments

        1. amb_2.jb2
          28 kB
          Hee Jeong Kim
        2. approach_1.patch
          2 kB
          Hee Jeong Kim
        3. approach_2.patch
          3 kB
          Hee Jeong Kim
        4. approach_3.patch
          4 kB
          Hee Jeong Kim
        5. sample.pdf
          39.28 MB
          Hee Jeong Kim
        6. use_packed_raster_to_read_Jbig2_image.patch
          2 kB
          Hee Jeong Kim

          People

            Assignee: Tilman Hausherr
            Reporter: Hee Jeong Kim