[PDFBOX-915] Some PDF files containing Chinese text are not extracted with the correct encoding - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Invalid
Affects Version/s: 1.3.1, 2.0.0
Fix Version/s: None
Component/s: Text extraction
Labels: None
Environment: JDK 1.5

Description

I used PDFTextStripper to extract the contents of PDF files that contain Chinese text. Some files are extracted correctly, but others are extracted with the wrong encoding.
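To make "extracted with the wrong encoding" easier to check across many files, here is a minimal sketch of a heuristic that flags suspicious output. It assumes the extracted text is already available as a Java `String` (for example, whatever PDFTextStripper returned) and that the file is expected to be mostly Chinese. The class `ExtractionCheck`, the methods `hanRatio` and `looksGarbled`, and the 0.5 threshold are all hypothetical names and values chosen for illustration; none of them are part of PDFBox. It also relies on `Character.UnicodeScript`, which needs Java 7 or later, so it would not run on the JDK 1.5 environment listed above. It only detects the symptom and says nothing about the cause.

```java
/** Hypothetical helper for illustration only; not part of PDFBox. */
class ExtractionCheck {

    /** Fraction of non-whitespace code points that belong to the Han script. */
    static double hanRatio(String text) {
        int total = 0;
        int han = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp); // step by code point, not by char
            if (Character.isWhitespace(cp)) {
                continue;
            }
            total++;
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                han++;
            }
        }
        return total == 0 ? 0.0 : (double) han / total;
    }

    /** True when fewer Han characters appear than the caller expects (assumed threshold). */
    static boolean looksGarbled(String text, double minHanRatio) {
        return hanRatio(text) < minHanRatio;
    }

    public static void main(String[] args) {
        String plausible = "\u4e2d\u6587\u6587\u6863";  // four Han ideographs
        String suspicious = "\u00d6\u00d0\u00ce\u00c4"; // accented Latin letters only
        System.out.println(looksGarbled(plausible, 0.5));
        System.out.println(looksGarbled(suspicious, 0.5));
    }
}
```

A check like this could be run over the output for each file to separate the ones that extract correctly from the ones that do not. The threshold would need tuning for documents that mix Chinese with a lot of Latin text or digits.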

Attachments

821-2302.pdf (835 kB), attached 07/Dec/10 08:59 by chenlong

People

Assignee: Andreas Lehmkühler

Reporter: chenlong

Votes: 0

Watchers: 1

Dates

Created: 07/Dec/10 08:56

Updated: 04/Mar/15 11:17

Resolved: 04/Mar/15 11:17