  1. PDFBox
  2. PDFBOX-3768

Optimize SampledImageReader.from1Bit()

Details

    Description

      The from1Bit() path passes a raster to colorSpace.toRGBImage(raster), which creates an RGB BufferedImage. For scanned images, that means a large memory footprint.

      I tried to optimize this by building a smaller BufferedImage directly on top of the raster. Instead of calling colorSpace.toRGBImage(raster), which copies the raster into an RGB image, I did this:

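      // Two-entry palette: index 0 is black, index 1 is white
      byte[] indexedValues = new byte[] { 0, (byte)0xFF };
      ColorModel colorModel = new IndexColorModel(1, 2, indexedValues, indexedValues, indexedValues);
      // Wrap the existing raster instead of copying it into an RGB image
      return new BufferedImage(colorModel, raster, false, null);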
      

      Sadly, this resulted in a bigger memory footprint.

      Lowest possible -Xmx setting to convert a file with 300 dpi A4 scans, before the change: 76m
      With the optimization: 123m
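
To put these numbers in context, here is a rough back-of-the-envelope sketch of the raw pixel buffer size of one 300 dpi A4 page in different pixel formats. The class and method names are made up for illustration, and the page size (210 x 297 mm) and row padding to whole bytes are assumptions. The 4 bytes per pixel case corresponds to the DataBufferInt that shows up in the stack trace below. These are buffer sizes only, not the total heap the JVM needs.

```java
public final class ScanFootprint {
    /** Pixels along one side of a page: millimetres -> inches -> dots. */
    public static int pixels(double millimetres, int dpi) {
        return (int) Math.round(millimetres / 25.4 * dpi);
    }

    /** Bytes for a 1-bit image whose rows are padded to whole bytes. */
    public static long packedBytes(int width, int height) {
        return (long) ((width + 7) / 8) * height;
    }

    /** Bytes for an image that stores a fixed number of bytes per pixel. */
    public static long bytes(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    public static void main(String[] args) {
        // Assumption: A4 is 210 mm x 297 mm, scanned at 300 dpi
        int w = pixels(210, 300);
        int h = pixels(297, 300);
        System.out.println(w + " x " + h + " pixels");
        System.out.println("1 bit per pixel (packed): " + packedBytes(w, h) + " bytes");
        System.out.println("1 byte per pixel:         " + bytes(w, h, 1) + " bytes");
        System.out.println("4 bytes per pixel (int):  " + bytes(w, h, 4) + " bytes");
    }
}
```

Under these assumptions a single page comes to roughly 1.1 MB at 1 bit per pixel, 8.7 MB at 1 byte per pixel, and 34.8 MB at 4 bytes per pixel, so one hidden conversion to an int-based image can easily dominate a small heap.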

      The stack trace suggests that Java copies the image to an RGB image:

      Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
          at java.awt.image.DataBufferInt.<init>(Unknown Source)
          at java.awt.image.Raster.createPackedRaster(Unknown Source)
          at java.awt.image.DirectColorModel.createCompatibleWritableRaster(Unknown Source)
          at java.awt.image.BufferedImage.<init>(Unknown Source)
          at sun.java2d.loops.GraphicsPrimitive.convertFrom(Unknown Source)
          at sun.java2d.loops.GraphicsPrimitive.convertFrom(Unknown Source)
          at sun.java2d.loops.MaskBlit$General.MaskBlit(Unknown Source)
          at sun.java2d.loops.Blit$GeneralMaskBlit.Blit(Unknown Source)
          at sun.java2d.pipe.DrawImage.blitSurfaceData(Unknown Source)
          at sun.java2d.pipe.DrawImage.renderImageCopy(Unknown Source)
          at sun.java2d.pipe.DrawImage.copyImage(Unknown Source)
          at sun.java2d.pipe.DrawImage.copyImage(Unknown Source)
          at sun.java2d.pipe.ValidatePipe.copyImage(Unknown Source)
          at sun.java2d.SunGraphics2D.copyImage(Unknown Source)
          at sun.java2d.pipe.DrawImage.makeBufferedImage(Unknown Source)
          at sun.java2d.pipe.DrawImage.renderImageXform(Unknown Source)
          at sun.java2d.pipe.DrawImage.transformImage(Unknown Source)
          at sun.java2d.pipe.DrawImage.transformImage(Unknown Source)
          at sun.java2d.pipe.DrawImage.transformImage(Unknown Source)
          at sun.java2d.pipe.ValidatePipe.transformImage(Unknown Source)
          at sun.java2d.SunGraphics2D.drawImage(Unknown Source)
          at org.apache.pdfbox.rendering.PageDrawer.drawBufferedImage(PageDrawer.java:1007) 
      

      After I mentioned this on the dev mailing list, pslabycz replied:

      your message caught my attention, so I could not resist to try and investigate it a little. I did not get too far and do not have the time to do any tests, but maybe at least a small hint. To at least have a chance that the sun java2d machinery draws the image without converting it first, BufferedImage.getType() must return something else than TYPE_CUSTOM. (At least I think so) For IndexColorModel, the raster has to be either BytePackedRaster or ByteComponentRaster. ByteComponentRaster resulting in BufferedImage type TYPE_BYTE_INDEXED is a safer bet.
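
The hint about BufferedImage.getType() can be checked with a small experiment. The sketch below is only a probe, not PDFBox code: the class name is hypothetical, and the issue does not say which raster factory from1Bit() uses, so both the banded and the packed raster here are assumptions. It prints the numeric type of images built around a caller-supplied raster next to the predefined types, so the values can be compared on the JVM at hand.

```java
import java.awt.Point;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.IndexColorModel;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

public final class ImageTypeProbe {
    /** Two-entry black/white palette, as in the snippet from the description. */
    public static ColorModel blackWhite() {
        byte[] v = { 0, (byte) 0xFF };
        return new IndexColorModel(1, 2, v, v, v);
    }

    /** Wraps a caller-supplied raster without copying it. */
    public static BufferedImage wrap(WritableRaster raster) {
        return new BufferedImage(blackWhite(), raster, false, null);
    }

    public static void main(String[] args) {
        int w = 16, h = 16;
        // Assumption: one byte per sample, stored in a banded raster
        WritableRaster banded =
                Raster.createBandedRaster(DataBuffer.TYPE_BYTE, w, h, 1, new Point(0, 0));
        // Assumption: one bit per sample, eight pixels packed into each byte
        WritableRaster packed =
                Raster.createPackedRaster(DataBuffer.TYPE_BYTE, w, h, 1, 1, new Point(0, 0));

        System.out.println("TYPE_CUSTOM      = " + BufferedImage.TYPE_CUSTOM);
        System.out.println("TYPE_BYTE_BINARY = " + BufferedImage.TYPE_BYTE_BINARY);
        System.out.println("TYPE_BYTE_GRAY   = " + BufferedImage.TYPE_BYTE_GRAY);
        System.out.println("wrapped banded raster -> " + wrap(banded).getType());
        System.out.println("wrapped packed raster -> " + wrap(packed).getType());
        System.out.println("predefined gray image -> "
                + new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY).getType());
    }
}
```

Even a non-custom type does not guarantee that drawImage avoids a conversion; it only makes it possible, as the reply says.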

      So I looked at the source of BufferedImage, and everything created by a user is TYPE_CUSTOM. I therefore tried a TYPE_BYTE_BINARY image, but I got the same OOM stack trace, which suggests that a copy is still being made. I tried stepping into drawImage in the debugger but couldn't. But a look at the source code

      http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b27/sun/java2d/pipe/DrawImage.java

      shows at line 381 that Java wants a "helper", and if there isn't one, it converts the image to RGB / ARGB. According to the stack trace, that is what happens here.

      What I didn't look up in the source code is which "helpers" are available.

      Then, in an act of desperation, I tried TYPE_BYTE_GRAY. This worked! It uses 1 byte per pixel, so it saves 2/3 of the RGB footprint and also avoids the intermediate raster.

      The minimal -Xmx setting dropped to -Xmx26m.
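
A minimal sketch of the TYPE_BYTE_GRAY idea follows. It is not the actual PDFBox change: the class and method names are hypothetical, and it assumes the 1-bit samples arrive as a byte array with the most significant bit first, rows padded to whole bytes, 0 meaning black and 1 meaning white (no inverted Decode array). The samples are written straight into a predefined gray image, one row at a time, so no RGB image and no full-size intermediate raster is allocated.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

public final class OneBitToGray {
    /**
     * Expands packed 1-bit samples into an 8-bit gray image.
     * Assumptions: MSB-first bit order, byte-padded rows, 0 = black, 1 = white.
     */
    public static BufferedImage toGray(byte[] packed, int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster raster = image.getRaster();
        int rowBytes = (width + 7) / 8;   // bytes per packed input row
        byte[] row = new byte[width];     // reused buffer for one output row
        for (int y = 0; y < height; y++) {
            int rowStart = y * rowBytes;
            for (int x = 0; x < width; x++) {
                // Pick bit x of the current row, counting from the MSB of each byte
                int bit = (packed[rowStart + (x >> 3)] >> (7 - (x & 7))) & 1;
                row[x] = (byte) (bit == 0 ? 0x00 : 0xFF);
            }
            raster.setDataElements(0, y, width, 1, row);
        }
        return image;
    }

    public static void main(String[] args) {
        // One row of three pixels: white, black, white (bits 1 0 1, then padding)
        BufferedImage img = toGray(new byte[] { (byte) 0xA0 }, 3, 1);
        for (int x = 0; x < img.getWidth(); x++) {
            System.out.println("pixel " + x + " = " + img.getRaster().getSample(x, 0, 0));
        }
    }
}
```

The trade-off is that the gray image is eight times larger than the packed 1-bit data, but it is a predefined image type, which matches the observation above that drawing it did not trigger the conversion seen in the stack trace.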

          People

            tilman Tilman Hausherr