Uploaded image for project: 'PDFBox'
  1. PDFBox
  2. PDFBOX-209

java.lang.OutOfMemoryError while parsing pdf file

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Parsing
    • Labels:
      None

      Description

      [imported from SourceForge]
      http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1581061
      Originally submitted by hui85 on 2006-10-19 23:47.

      I want to parse text from a PDF file and use
      PDFTextStripper. Most of the PDF files work. But in the
      following case I get an OutOfMemoryError.
      The PDF file I want to parse is about 312k and my JVM
      Xmx is about 512m.

      I get the following stackTrace:

      java.lang.OutOfMemoryError
      at java.util.zip.Inflater.inflateBytes(Native Method)
      at java.util.zip.Inflater.inflate(Unknown Source)
      at java.util.zip.InflaterInputStream.read(Unknown Source)
      at
      org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97)
      at org.pdfbox.cos.COSStream.doDecode(COSStream.java:319)
      at org.pdfbox.cos.COSStream.doDecode(COSStream.java:249)
      at
      org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:173)
      at
      org.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:91)
      at
      org.pdfbox.cos.COSStream.getStreamTokens(COSStream.java:135)
      at
      org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:189)
      at
      org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:160)
      at
      org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:355)
      at
      org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:268)
      at
      org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220)

      I use pdfbox-0.7.3, tested on Win 2000, JVM 1.4.2.
      The file that causes the error is to big to attach
      (271K zipped).
      Please mail me and I will send it.

      [comment on SourceForge]
      Originally sent by benlitchfield.
      Logged In: YES
      user_id=601708

      see test-1581061.pdf

      [comment on SourceForge]
      Originally sent by benlitchfield.
      Logged In: YES
      user_id=601708

      Thanks, please email the PDF to ben@benlitchfield.com or
      upload to ftp.pdfbox.org

      Thanks,
      Ben

        Attachments

          Activity

            People

            • Assignee:
              tboehme Timo Boehme
              Reporter:
              Anonymous
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: