[PDFBOX-5] CJK decoding - ASF JIRA

Details

Type: New Feature
Status: Closed
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.7.0
Component/s: Text extraction
Labels: None

Description

[imported from SourceForge]
http://sourceforge.net/tracker/index.php?group_id=78314&atid=552835&aid=765686
Originally submitted by bguan on 2003-07-03 17:57.

Another feature I need a lot is the correct interpretation
of CJK encoding.

Yes, I know PDF can be a pain when it comes to
correctly interpreting CJK charsets, as many factors are
involved, including whether a font (or its subset) is
embedded or not.

Attached is a simple Korean PDF that so far has not
been correctly interpreted by any Java-based
open-source library, though it is rendered
correctly by Xpdf on Linux and on Windows.

[attachment on SourceForge]
http://sourceforge.net/tracker/download.php?group_id=78314&atid=552835&aid=765686&file_id=80181
CJK.zip, 142061 bytes
CJK PDF, output and test program

[comment on SourceForge]
Originally sent by bguan.
Logged In: YES
user_id=815589

Hello Ben,

Thanks for the response. I just downloaded PDFBox 0.6.5 and
wrote a little sample program to test it against 3 CJK PDF files
I have, and the output is still no good. I have attached my
sample program, the 3 PDFs and the output in the attached
zip file.

Can you tell me what I am doing wrong?

The PDF files were generated with Adobe Acrobat 5.0,
using embedded fonts, I believe.

Thank you.

[comment on SourceForge]
Originally sent by benlitchfield.
Logged In: YES
user_id=601708

There was no attachment with this. I have done some CJK
work in the 0.6.5 release. Please attach the document and I
can take a look at it. (Make sure you check the 'attach file'
checkbox.)

Ben

Attachments

PDFBOX5-CJK.zip
10/Mar/10 18:30
139 kB
Andreas Lehmkühler

Issue Links

depends upon

PDFBOX-654 Extracting CJK text

Closed

People

Assignee: Andreas Lehmkühler

Reporter: Anonymous

Votes: 0

Watchers: 0

Dates

Created: 04/Jul/03 00:57

Updated: 29/May/12 16:21

Resolved: 06/Nov/11 16:42