Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Description
This is an update for PDFBOX-420 filed by Takashi Komatsubara.
In this patch, if "Identity-H" is used as the font's encoding and the font doesn't supply a TO_UNICODE table, the encoding name is generated from the font's CID information (Registry and Ordering). This idea is borrowed from pdfminer[1], another PDF library written in Python. I don't see any test failures with this patch.
I published this patch last year[2] and got some good feedback from Japanese users[3].
[1] http://www.unixuser.org/~euske/python/pdfminer/index.html
[2] https://code.launchpad.net/~aishimoto/+junk/pdfbox-ja,
https://code.launchpad.net/~aishimoto/+junk/pdfbox-1.0.0-ja
[3] http://d.hatena.ne.jp/atsuoishimoto/20091211/1260533539
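For readers who want the gist without opening the patch, the fallback described above can be sketched roughly as follows. This is only an illustration, not the patch itself: the class and method names (CidEncodingFallback, encodingNameFor) are hypothetical, and joining Registry and Ordering with a hyphen (for example "Adobe-Japan1") is an assumption about the name format rather than something taken from the PDFBox sources.

```java
import java.util.Optional;

// Illustrative sketch only; names here are hypothetical and not part of PDFBox.
final class CidEncodingFallback {

    /**
     * Returns an encoding name built from the CID Registry and Ordering,
     * but only when the font uses Identity-H and has no TO_UNICODE table.
     * The "Registry-Ordering" format is an assumption made for this sketch.
     */
    static Optional<String> encodingNameFor(String encoding, boolean hasToUnicode,
                                            String registry, String ordering) {
        // The fallback applies only to Identity-H fonts without a TO_UNICODE table.
        if (!"Identity-H".equals(encoding) || hasToUnicode) {
            return Optional.empty();
        }
        // Without both Registry and Ordering there is nothing to build a name from.
        if (registry == null || registry.isEmpty()
                || ordering == null || ordering.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(registry + "-" + ordering);
    }

    public static void main(String[] args) {
        // Example values as they might appear for a Japanese CID font.
        Optional<String> name = encodingNameFor("Identity-H", false, "Adobe", "Japan1");
        System.out.println(name.orElse("(no fallback)")); // prints "Adobe-Japan1"
    }
}
```

In this sketch the caller would then use the returned name to look up a CID-to-Unicode mapping; how that lookup is done is outside what is shown here.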
Attachments
Issue Links
- is depended upon by
  - PDFBOX-55 Invalid character while extracting text from a chinese pdf (Closed)
  - PDFBOX-5 CJK decoding (Closed)
- relates to
  - PDFBOX-420 Japanese Characters are garbled. (Closed)
  - PDFBOX-259 support request chinese-traditional (Closed)