Details
- Type: New Feature
- Status: Closed
- Resolution: Fixed
Description
[imported from SourceForge]
http://sourceforge.net/tracker/index.php?group_id=78314&atid=552835&aid=765686
Originally submitted by bguan on 2003-07-03 17:57.
Another feature I need a lot is the correct interpretation
of CJK encoding.
Yes, I know PDF can be a pain when it comes to
correctly interpreting CJK charsets, as many factors are
involved, including whether a font (or its subset) is
embedded or not.
Attached is a simple Korean PDF that so far has not
been correctly interpreted by any Java-based
open-source library, although it is rendered
correctly by Xpdf on Linux and also on Windows.
[attachment on SourceForge]
http://sourceforge.net/tracker/download.php?group_id=78314&atid=552835&aid=765686&file_id=80181
CJK.zip, 142061 bytes
CJK PDF, output and test program
[comment on SourceForge]
Originally sent by bguan.
Logged In: YES
user_id=815589
Hello Ben,
Thanks for the response. I just downloaded PDFBox 0.6.5 and
wrote a little sample program to test it against 3 CJK PDF files
I have, and the output is still no good. I have attached my
sample program, the 3 PDFs and the output in the attached
zip file.
Can you tell me what I am doing wrong?
The PDF files were generated with Adobe Acrobat 5.0,
using embedded fonts, I believe.
Thank you.
[comment on SourceForge]
Originally sent by benlitchfield.
Logged In: YES
user_id=601708
There was no attachment with this. I have done some CJK
work in the 0.6.5 release. Please attach the document and I
can take a look at it. (Make sure you check the 'attach file'
checkbox.)
Ben
Attachments
Issue Links
- depends upon
- PDFBOX-654 Extracting CJK text (Closed)