[PDFBOX-1692] java.lang.OutOfMemoryError: Java heap space - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.8.2
Fix Version/s: 1.8.3, 2.0.0
Component/s: Text extraction
Labels:
None
Environment:
Windows 7
java version 1.7.0_17 (build 1.7.0_17-b02/64-Bit Server VM build 23.7-01)
pdfbox-app-1.8.2.jar

Description

Hello,

I have a problem with text extraction: the JVM runs out of heap memory while the text is being extracted.

My code:
String pdfFile = "D:\\testfolder\\test1fd9a_test.pdf"; // file size: 168 KB
PDDocument document = PDDocument.load(pdfFile, true);

PDFTextStripper stripper = null;
try {
    stripper = new PDFTextStripper();
    stripper.setSortByPosition(true);
    // outputWriter is assumed to be a java.io.Writer created elsewhere
    stripper.writeText(document, outputWriter);
} catch (IOException e) {
    e.printStackTrace();
}

This produces the following error:
java.lang.OutOfMemoryError: Java heap space
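
When reproducing this, it may help to record how much heap the JVM actually has available, since the error depends on the configured maximum (-Xmx). The following is a minimal diagnostic sketch that uses only the standard java.lang.Runtime API. It is not part of the original report; the class name HeapProbe and its method names are hypothetical, and where the calls are placed around the extraction code is an assumption.

```java
public class HeapProbe {

    /** Bytes of heap currently in use by this JVM. */
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Maximum heap the JVM will attempt to use (controlled by -Xmx). */
    public static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** One-line summary of heap usage, prefixed with a caller-supplied label. */
    public static String describe(String label) {
        long mb = 1024L * 1024L;
        return label + ": used=" + (usedHeapBytes() / mb) + " MB, max="
                + (maxHeapBytes() / mb) + " MB";
    }

    public static void main(String[] args) {
        System.out.println(describe("before extraction"));
        // Hypothetical placement: run the text extraction code from the report here.
        System.out.println(describe("after extraction"));
    }
}
```

Comparing the two lines printed with different -Xmx values should show whether the failure is tied to a small default heap or occurs regardless of the limit.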

Attachments

Errors_when_buidling_pdfbox.jpg.png | 28/Aug/13 09:08 | 132 kB | Anouar
PDFBOX-1692.patch | 25/Aug/13 20:22 | 1 kB | Tilman Hausherr
test_1fd9a_test.pdf | 12/Aug/13 16:17 | 164 kB | Christian Czech
test_1fd9a_test-01.png | 19/Aug/13 21:15 | 172 kB | Tilman Hausherr
test_1fd9a_test-02.png | 19/Aug/13 21:15 | 83 kB | Tilman Hausherr

Issue Links

depends upon: PDFBOX-1653 Fix pdfbox eating up big chunks of memory for identical CID mappings (Closed)

People

Assignee: Andreas Lehmkühler

Reporter: Christian Czech

Votes: 1

Watchers: 4

Dates

Created: 12/Aug/13 16:16

Updated: 30/Nov/13 17:02

Resolved: 01/Sep/13 11:01