[PDFBOX-3792] Getting lots of warnings "No Unicode mapping for..." when extract text - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: 2.0.5
Fix Version/s: None
Component/s: Text extraction
Labels: None
Flags: Important

Description

When I use PDFBox to extract text, I get many "No Unicode mapping for..." warnings and the output is garbage. When I use Adobe Acrobat to export the attached PDF file to text, it works fine. I have attached the original PDF file, the text output, and the log with the warnings. The PDF file appears to have an embedded Type 1 font with a custom encoding. I have checked many reports on the JIRA issue tracker but have found no way to solve it.
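To illustrate why a custom encoding can lead to this kind of warning, here is a minimal sketch in plain Java. It does not use PDFBox and is not PDFBox's implementation. It assumes a simplified lookup chain (character code, then glyph name, then Unicode via a glyph list) and assumes that an unmapped code falls back to the raw code value. The class `GlyphMappingSketch`, the glyph names `G01` and `G02`, and the two-entry glyph list are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class GlyphMappingSketch {
    // Hypothetical, tiny stand-in for a standard glyph list (glyph name -> Unicode).
    static final Map<String, String> GLYPH_LIST = new HashMap<>();
    static {
        GLYPH_LIST.put("A", "A");
        GLYPH_LIST.put("B", "B");
    }

    // Resolve a character code to Unicode via the font's encoding.
    // Returns null when the glyph name is missing or unknown to the glyph list.
    static String toUnicode(Map<Integer, String> encoding, int code) {
        String glyphName = encoding.get(code);
        return glyphName == null ? null : GLYPH_LIST.get(glyphName);
    }

    // Extract text from a sequence of character codes.
    // Assumption: unmapped codes are emitted as the raw code value, which
    // typically reads as garbage, and a warning is written to stderr.
    static String extract(Map<Integer, String> encoding, int[] codes) {
        StringBuilder text = new StringBuilder();
        for (int code : codes) {
            String unicode = toUnicode(encoding, code);
            if (unicode == null) {
                System.err.println("No Unicode mapping for "
                        + encoding.get(code) + " (" + code + ")");
                text.append((char) code);
            } else {
                text.append(unicode);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Encoding with standard glyph names: every code resolves.
        Map<Integer, String> standard = new HashMap<>();
        standard.put(65, "A");
        standard.put(66, "B");

        // Custom encoding with made-up glyph names: nothing resolves.
        Map<Integer, String> custom = new HashMap<>();
        custom.put(1, "G01");
        custom.put(2, "G02");

        System.out.println(extract(standard, new int[] {65, 66}));
        System.out.println(extract(custom, new int[] {1, 2}).length());
    }
}
```

With the standard encoding every code resolves to readable text. With the custom encoding each code triggers a warning and produces a control character instead of a letter. Whether this matches what happens inside the attached file is not confirmed by the report.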

Attachments

- OutputText.txt: 15/May/17 07:03, 0.3 kB, sunny xia
- IssueLog.txt: 15/May/17 07:03, 10 kB, sunny xia
- FileWithIssue.pdf: 15/May/17 07:03, 21 kB, sunny xia

People

Assignee: Unassigned

Reporter: sunny xia

Votes: 0

Watchers: 2

Dates

Created: 15/May/17 07:03

Updated: 19/May/17 09:58

Resolved: 19/May/17 09:58