[PDFBOX-890] Can't extract text from PDF - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.3.1
Fix Version/s: 1.5.0
Component/s: Text extraction
Labels: None

Description

I created a simple PDF using the Bullzip PDF Printer (a virtual Windows printer).
PDFBox is not able to extract the text from this PDF; it just returns some low ASCII characters.

command:
@java -jar pdfbox-app-1.3.1.jar ExtractText -console test.pdf
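
To make the "low ASCII chars" symptom measurable, the console output of the command above could be piped into a small checker. The following is only a sketch and is not part of PDFBox: the class name `LowAsciiCheck` and both helper methods are hypothetical, and it assumes that "low ASCII" means characters below U+0020 other than tab, line feed, and carriage return.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper, not part of PDFBox.
final class LowAsciiCheck {

    // Counts characters below U+0020, ignoring tab, line feed and carriage return.
    public static int countControlChars(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                count++;
            }
        }
        return count;
    }

    // Fraction of such control characters among all characters (0.0 for empty input).
    public static double controlCharRatio(String text) {
        if (text.isEmpty()) {
            return 0.0;
        }
        return (double) countControlChars(text) / text.length();
    }

    public static void main(String[] args) throws IOException {
        // Read everything from standard input (InputStream.readAllBytes needs Java 9+).
        // ISO-8859-1 maps each byte to exactly one char, so byte values are preserved.
        byte[] data = System.in.readAllBytes();
        String text = new String(data, StandardCharsets.ISO_8859_1);

        System.out.println("characters: " + text.length());
        System.out.println("control characters: " + countControlChars(text));
        System.out.println("ratio: " + controlCharRatio(text));
    }
}
```

A ratio close to 1.0 would suggest that the extracted output consists almost entirely of control characters rather than readable text, which matches the behaviour reported here.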

Attachments

PDFBOX-890.patch (0.8 kB), 19/Nov/10 17:48, Martijn Brinkers
test.pdf (7 kB), 09/Nov/10 14:30, Igor Spasic

Issue Links

is depended upon by

TIKA-547 Can't extract PDF text (Resolved)

People

Assignee: Andreas Lehmkühler

Reporter: Igor Spasic

Votes: 0

Watchers: 1

Dates

Created: 09/Nov/10 14:29

Updated: 06/Apr/13 14:55

Resolved: 06/Apr/13 14:55