[PDFBOX-4265] Not able to extract text from Japanese PDF - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Not A Bug
Affects Version/s: 2.0.2
Fix Version/s: None
Component/s: Text extraction
Labels: None
Environment: Windows 10, region settings set to Japanese

Description

Text cannot be extracted from the attached Japanese PDF (jpn.pdf), although extraction works well with another Japanese PDF.

Also, is there an overloaded method that accepts an encoding for text extraction? If so, please let us know.

Thank you.
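
On the encoding question, a minimal sketch may help frame it. In PDFBox 2.0.x, `PDFTextStripper.getText(PDDocument)` returns a Java `String`, which is already Unicode, so a byte encoding normally comes into play only when that string is written out. The example below uses only the Java standard library; the class name `ExtractedTextEncoding`, the helper `encode`, and the placeholder text are assumptions made for illustration and are not part of PDFBox. It does not address why extraction fails for jpn.pdf.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExtractedTextEncoding {

    // Hypothetical helper (not a PDFBox API): encodes text that has
    // already been extracted, using the charset chosen by the caller.
    public static byte[] encode(String extractedText, Charset charset) {
        return extractedText.getBytes(charset);
    }

    public static void main(String[] args) {
        // Placeholder standing in for the String that
        // new PDFTextStripper().getText(document) would return.
        // The escapes spell the three characters of "Japanese language".
        String extractedText = "\u65E5\u672C\u8A9E";

        // The same extracted text, encoded two different ways on output.
        byte[] utf8 = encode(extractedText, StandardCharsets.UTF_8);
        byte[] utf16 = encode(extractedText, StandardCharsets.UTF_16BE);

        System.out.println("UTF-8 length: " + utf8.length);
        System.out.println("UTF-16BE length: " + utf16.length);
    }
}
```

Under this assumption, choosing an encoding is a decision made when writing the result (for example through a `Writer` created with an explicit charset), not a parameter of the extraction call itself.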

Attachments

- jpn.pdf, 293 kB, added 13/Jul/18 05:59 by Viral Valand
- jpn.txt, 5 kB, added 13/Jul/18 09:01 by Viral Valand
- CommandLine.txt, 66 kB, added 13/Jul/18 09:02 by Viral Valand

People

Assignee: Unassigned

Reporter: Viral Valand

Votes: 0

Watchers: 2

Dates

Created: 13/Jul/18 06:00

Updated: 13/Jul/18 16:36

Resolved: 13/Jul/18 16:36