Details
-
Bug
-
Status: Closed
-
Minor
-
Resolution: Cannot Reproduce
-
None
-
None
-
None
Description
[imported from SourceForge]
http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1581061
Originally submitted by hui85 on 2006-10-19 23:47.
I want to parse text from a PDF file and use
PDFTextStripper. Most of the PDF files work. But in the
following case I get an OutOfMemoryError.
The PDF file I want to parse is about 312k and my JVM
Xmx is about 512m.
I get the following stackTrace:
java.lang.OutOfMemoryError
at java.util.zip.Inflater.inflateBytes(Native Method)
at java.util.zip.Inflater.inflate(Unknown Source)
at java.util.zip.InflaterInputStream.read(Unknown Source)
at
org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97)
at org.pdfbox.cos.COSStream.doDecode(COSStream.java:319)
at org.pdfbox.cos.COSStream.doDecode(COSStream.java:249)
at
org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:173)
at
org.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:91)
at
org.pdfbox.cos.COSStream.getStreamTokens(COSStream.java:135)
at
org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:189)
at
org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:160)
at
org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:355)
at
org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:268)
at
org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220)
I use pdfbox-0.7.3, tested on Win 2000, JVM 1.4.2.
The file that causes the error is to big to attach
(271K zipped).
Please mail me and I will send it.
[comment on SourceForge]
Originally sent by benlitchfield.
Logged In: YES
user_id=601708
see test-1581061.pdf
[comment on SourceForge]
Originally sent by benlitchfield.
Logged In: YES
user_id=601708
Thanks, please email the PDF to ben@benlitchfield.com or
upload to ftp.pdfbox.org
Thanks,
Ben