  PDFBox / PDFBOX-3734

Out-of-memory error when converting a scanned PDF to an image

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.0.5
    • Fix Version/s: None
    • Component/s: Rendering
    • Labels:
    • Environment:
      Windows 7 64-bit, JDK 1.7 64-bit

      Description

      I have a scanned PDF file that is only 2.8 MB. When I try the PDF-to-image feature, I get an out-of-memory error (OOM) with -Xmx200m:


      at java.awt.image.DataBufferByte.<init>(DataBufferByte.java:92)
      at java.awt.image.ComponentSampleModel.createDataBuffer(ComponentSampleModel.java:415)
      at sun.awt.image.ByteInterleavedRaster.<init>(ByteInterleavedRaster.java:89)
      at sun.awt.image.ByteInterleavedRaster.createCompatibleWritableRaster(ByteInterleavedRaster.java:1281)
      at sun.awt.image.ByteInterleavedRaster.createCompatibleWritableRaster(ByteInterleavedRaster.java:1292)
      at org.apache.pdfbox.filter.DCTFilter.fromBGRtoRGB(DCTFilter.java:246)
      at org.apache.pdfbox.filter.DCTFilter.decode(DCTFilter.java:171)
      at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:69)
      at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:162)
      at org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:235)
      at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:124)
      at org.apache.pdfbox.pdmodel.graphics.PDXObject.createXObject(PDXObject.java:70)
      at org.apache.pdfbox.pdmodel.PDResources.getXObject(PDResources.java:409)
      at org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:53)
      at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838)
      at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:495)
      at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:469)
      at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
      at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:206)
      at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:145)

      After I increased the JVM maximum heap size to 500 MB, it works.

      I know PDF rendering is very difficult, but is there some way to avoid consuming so much memory? It is a bit surprising that PDFBox needs 500 MB of memory to handle one page of a scanned PDF (2.8 MB in total). That is a ratio of around 200 times.
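
      For context on the ratio: the 2.8 MB is the compressed size, while the trace above fails inside DCTFilter.fromBGRtoRGB when it allocates a raster for the decoded JPEG pixels. The sketch below is only a rough estimate of such a raster's size. The page size (A4) and the scan resolutions are assumptions for illustration, since the report does not state the image dimensions, and the class and method names are hypothetical.

```java
final class RasterMemoryEstimate {

    /** Bytes needed for an uncompressed, interleaved raster with 8 bits per component. */
    static long rasterBytes(int width, int height, int components) {
        // Widen to long before multiplying so large scans do not overflow int.
        return (long) width * height * components;
    }

    /** Pixel count along one side of a page of the given length scanned at the given DPI. */
    static int pixels(double inches, int dpi) {
        return (int) Math.round(inches * dpi);
    }

    public static void main(String[] args) {
        // Assumed page size: A4, about 8.27 x 11.69 inches. Not taken from the report.
        double widthIn = 8.27;
        double heightIn = 11.69;
        for (int dpi : new int[] {150, 300, 600}) {
            int w = pixels(widthIn, dpi);
            int h = pixels(heightIn, dpi);
            long bytes = rasterBytes(w, h, 3); // 3 components: R, G, B
            System.out.printf("%d dpi: %d x %d px, about %.1f MiB per RGB raster%n",
                    dpi, w, h, bytes / (1024.0 * 1024.0));
        }
    }
}
```

      Under these assumptions, a single decoded RGB raster of a high-resolution scan can already approach 100 MiB, and any step that needs a second raster of the same size (as a BGR-to-RGB conversion into a new raster would) doubles that before the page itself is rendered.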

      For my purposes, it is acceptable to reduce the quality of the converted image (the quality of the original image in the PDF is not good either). Let me know if such methods exist; I will help try them.
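
      As an illustration of trading resolution for memory, the sketch below decodes a JPEG at reduced resolution using source subsampling from the standard javax.imageio API. This is plain Java, not a PDFBox API, and it is not taken from this issue. The class name, the helper methods, and the blank in-memory test image standing in for a scanned page are all hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReadParam;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

final class SubsampledJpegDecode {

    /** Decodes an image, keeping only every period-th pixel in each direction. */
    static BufferedImage decodeSubsampled(byte[] data, int period) {
        try (ImageInputStream in =
                ImageIO.createImageInputStream(new ByteArrayInputStream(data))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("No image reader found for this data");
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                ImageReadParam param = reader.getDefaultReadParam();
                // The destination image is allocated at the subsampled size,
                // roughly 1/period^2 of the full-resolution pixel count.
                param.setSourceSubsampling(period, period, 0, 0);
                return reader.read(0, param);
            } finally {
                reader.dispose();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a small blank JPEG in memory as a stand-in for a scanned page. */
    static byte[] sampleJpeg(int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(img, "jpg", out)) {
                throw new IOException("No JPEG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        BufferedImage small = decodeSubsampled(sampleJpeg(400, 300), 2);
        System.out.println(small.getWidth() + " x " + small.getHeight());
    }
}
```

      With a period of 2 the decoded image holds about a quarter of the pixels, so its raster needs about a quarter of the memory. Whether PDFBox 2.0.5 offers an equivalent option during rendering is not something this sketch establishes.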

            People

            • Assignee:
              Unassigned
            • Reporter:
              Yachun Miao (yachunmiao)
            • Votes:
              0
            • Watchers:
              2
