Uploaded image for project: 'PDFBox'
  1. PDFBox
  2. PDFBOX-209

java.lang.OutOfMemoryError while parsing pdf file

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Cannot Reproduce
    • None
    • None
    • Parsing
    • None

    Description

      [imported from SourceForge]
      http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1581061
      Originally submitted by hui85 on 2006-10-19 23:47.

      I want to parse text from a PDF file and use
      PDFTextStripper. Most of the PDF files work. But in the
      following case I get an OutOfMemoryError.
      The PDF file I want to parse is about 312k and my JVM
      Xmx is about 512m.

      I get the following stackTrace:

      java.lang.OutOfMemoryError
      at java.util.zip.Inflater.inflateBytes(Native Method)
      at java.util.zip.Inflater.inflate(Unknown Source)
      at java.util.zip.InflaterInputStream.read(Unknown Source)
      at
      org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97)
      at org.pdfbox.cos.COSStream.doDecode(COSStream.java:319)
      at org.pdfbox.cos.COSStream.doDecode(COSStream.java:249)
      at
      org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:173)
      at
      org.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:91)
      at
      org.pdfbox.cos.COSStream.getStreamTokens(COSStream.java:135)
      at
      org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:189)
      at
      org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:160)
      at
      org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:355)
      at
      org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:268)
      at
      org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220)

      I use pdfbox-0.7.3, tested on Win 2000, JVM 1.4.2.
      The file that causes the error is to big to attach
      (271K zipped).
      Please mail me and I will send it.

      [comment on SourceForge]
      Originally sent by benlitchfield.
      Logged In: YES
      user_id=601708

      see test-1581061.pdf

      [comment on SourceForge]
      Originally sent by benlitchfield.
      Logged In: YES
      user_id=601708

      Thanks, please email the PDF to ben@benlitchfield.com or
      upload to ftp.pdfbox.org

      Thanks,
      Ben

      Attachments

        Activity

          People

            tboehme Timo Boehme
            Anonymous Anonymous
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: