[PDFBOX-1353] PDFBox extracts wrong characters for some korean pdf files. - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.7.0
Fix Version/s: 1.7.1
Component/s: Text extraction
Labels:
- newbie
Environment:
JDK 1.6, on both Windows and Linux

Description

PDFBox 1.7.0 extracts wrong characters from some Korean PDF files, at a rate of about 25%.

I have attached two such PDF files (test1.pdf and test2.pdf) and the resulting output (output.jpg).

Thanks a lot.

Attachments

- output.jpg (295 kB), uploaded 02/Jul/12 13:37 by Michael Chung
- test1.pdf (1.32 MB), uploaded 02/Jul/12 13:35 by Michael Chung
- test2.pdf (284 kB), uploaded 02/Jul/12 13:35 by Michael Chung

People

Assignee: Andreas Lehmkühler

Reporter: Michael Chung

Votes: 0

Watchers: 2

Dates

Created: 02/Jul/12 13:28

Updated: 25/Jul/12 06:01

Resolved: 15/Jul/12 17:31