Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 2.0.16, 2.0.17
Description
Running the following code in v2.0.16 or v2.0.17 no longer gives the expected output, which v2.0.15 and earlier still produce.
PDFTextStripper stripper = new PDFTextStripper();
PDDocument doc = PDDocument.load(new File("KR1020067006547.pdf"));
stripper.getText(doc);
In the affected versions it results in warnings like the following, and characters are missing from the extracted text:
Sep 27, 2019 11:49:37 AM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+17679 (17679) in font LHXXBJ+¹ÙÅÁ-Identity-H
Sep 27, 2019 11:49:37 AM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+16131 (16131) in font LHXXBJ+¹ÙÅÁ-Identity-H
Sep 27, 2019 11:49:37 AM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+14802 (14802) in font LHXXBJ+¹ÙÅÁ-Identity-H
This change in behavior is likely related to PDFBOX-4549.
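To compare versions it can help to list which CIDs failed to map. The following is a minimal sketch, not part of PDFBox: the class name `UnmappedCidScanner` and the method `unmappedCids` are hypothetical, and the pattern assumes the warning text keeps exactly the form shown in the log above. It uses only the Java standard library and works on log lines that have already been captured as strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a PDFBox class): collects the CIDs named in
// "No Unicode mapping for CID+n (n)" warnings from captured log lines.
public class UnmappedCidScanner {

    // Assumes the message format shown in this report; other PDFBox
    // versions may word the warning differently.
    private static final Pattern WARNING =
            Pattern.compile("No Unicode mapping for CID\\+(\\d+) \\((\\d+)\\)");

    // Returns the CIDs in the order they appear, duplicates included.
    public static List<Integer> unmappedCids(List<String> logLines) {
        List<Integer> cids = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = WARNING.matcher(line);
            if (m.find()) {
                cids.add(Integer.parseInt(m.group(1)));
            }
        }
        return cids;
    }

    public static void main(String[] args) {
        // The font name is a placeholder; only the CID part is parsed.
        List<String> lines = Arrays.asList(
                "Sep 27, 2019 11:49:37 AM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode",
                "WARNING: No Unicode mapping for CID+17679 (17679) in font PlaceholderFont-Identity-H",
                "WARNING: No Unicode mapping for CID+16131 (16131) in font PlaceholderFont-Identity-H",
                "WARNING: No Unicode mapping for CID+14802 (14802) in font PlaceholderFont-Identity-H");
        System.out.println(unmappedCids(lines)); // prints [17679, 16131, 14802]
    }
}
```

Running the same document under v2.0.15 and v2.0.17 and comparing the two lists would show which CIDs lost their mapping between releases.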
Attachments
Issue Links
- is related to
  - PDFBOX-5090 Missing text extraction under certain conditions starting with apache pdfbox 2.0.18 (Closed)
  - PDFBOX-5350 Regression unicode mapping in Korean document (Closed)