PDFBox / PDFBOX-4661

Regression: No Unicode mapping with Identity-H font


    Details

      Description

      In v2.0.16 or v2.0.17, running the following code logs warnings and drops characters from the extracted text. v2.0.15 and earlier produce the expected output.

      PDFTextStripper stripper = new PDFTextStripper();
      PDDocument doc = PDDocument.load(new File("KR1020067006547.pdf"));
      stripper.getText(doc);

      The warnings look like the following:

      Sep 27, 2019 11:49:37 AM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
      WARNING: No Unicode mapping for CID+17679 (17679) in font LHXXBJ+¹ÙÅÁ-Identity-H
      Sep 27, 2019 11:49:37 AM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
      WARNING: No Unicode mapping for CID+16131 (16131) in font LHXXBJ+¹ÙÅÁ-Identity-H
      Sep 27, 2019 11:49:37 AM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
      WARNING: No Unicode mapping for CID+14802 (14802) in font LHXXBJ+¹ÙÅÁ-Identity-H
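
      When comparing versions, it may help to tally the distinct CIDs named in these warnings. The following is only a minimal sketch, assuming the log lines keep the format shown above. The class `UnmappedCidTally` and the method `extractCids` are hypothetical helpers and not part of PDFBox, and the font name in the sample lines is a placeholder.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of PDFBox): collects the distinct CIDs
// mentioned in "No Unicode mapping for CID+..." warning lines.
class UnmappedCidTally {
    // Assumes the warning format shown in the log excerpt above.
    private static final Pattern WARNING =
            Pattern.compile("No Unicode mapping for CID\\+(\\d+)");

    static Set<Integer> extractCids(List<String> logLines) {
        Set<Integer> cids = new TreeSet<>(); // sorted, duplicates dropped
        for (String line : logLines) {
            Matcher m = WARNING.matcher(line);
            if (m.find()) {
                cids.add(Integer.parseInt(m.group(1)));
            }
        }
        return cids;
    }

    public static void main(String[] args) {
        // Sample lines modelled on the report; the font name is a placeholder.
        List<String> sample = Arrays.asList(
                "WARNING: No Unicode mapping for CID+17679 (17679) in font PLACEHOLDER-Identity-H",
                "WARNING: No Unicode mapping for CID+16131 (16131) in font PLACEHOLDER-Identity-H",
                "WARNING: No Unicode mapping for CID+14802 (14802) in font PLACEHOLDER-Identity-H",
                "WARNING: No Unicode mapping for CID+17679 (17679) in font PLACEHOLDER-Identity-H");
        System.out.println(extractCids(sample)); // prints [14802, 16131, 17679]
    }
}
```

      Running the same tally against the logs of v2.0.15 and v2.0.17 would show which CIDs lost their mapping between the two versions.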

      This regression is likely related to PDFBOX-4549.

            People

            • Assignee: Andreas Lehmkühler (lehmi)
            • Reporter: Daniel Lowe (dan2097)
