PDFBox / PDFBOX-4661

Regression: No Unicode mapping with Identity-H font


Details

    Description

      Running the following code in v2.0.16 or v2.0.17 no longer produces the expected output, which is obtained in v2.0.15 and earlier.

      PDFTextStripper stripper = new PDFTextStripper();
      PDDocument doc = PDDocument.load(new File("KR1020067006547.pdf"));
      stripper.getText(doc);

      It results in warnings like the following, and characters are missing from the extracted text:

      Sep 27, 2019 11:49:37 AM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
      WARNING: No Unicode mapping for CID+17679 (17679) in font LHXXBJ+¹ÙÅÁ-Identity-H
      Sep 27, 2019 11:49:37 AM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
      WARNING: No Unicode mapping for CID+16131 (16131) in font LHXXBJ+¹ÙÅÁ-Identity-H
      Sep 27, 2019 11:49:37 AM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
      WARNING: No Unicode mapping for CID+14802 (14802) in font LHXXBJ+¹ÙÅÁ-Identity-H

      This change in behavior is likely related to PDFBOX-4549.
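
To compare versions, it may help to tally which CIDs trigger the warning in each run. The following is a minimal sketch using only the Java standard library, not PDFBox code. The class name `UnmappedCidTally` is hypothetical, and the sketch assumes the log output has been captured to a file and that the message format matches the warnings shown above.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of PDFBox.
class UnmappedCidTally {
    // Assumes the message text seen in the report:
    // "WARNING: No Unicode mapping for CID+17679 (17679) in font <name>"
    private static final Pattern WARNING =
            Pattern.compile("No Unicode mapping for CID\\+(\\d+)");

    /** Counts how often each CID appears in "No Unicode mapping" warnings. */
    static Map<Integer, Integer> tally(List<String> logLines) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = WARNING.matcher(line);
            if (m.find()) {
                counts.merge(Integer.parseInt(m.group(1)), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Reads captured log output from standard input and prints one line per CID.
    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = in.lines().collect(Collectors.toList());
        tally(lines).forEach((cid, n) ->
                System.out.println("CID " + cid + ": " + n + " warning(s)"));
    }
}
```

Running it once against the log from each version (for example `java UnmappedCidTally < pdfbox.log`, where `pdfbox.log` is a placeholder file name) gives two lists that can be diffed. An empty result for v2.0.15 and a non-empty one for v2.0.17 would be consistent with the regression described here.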


          People

            lehmi Andreas Lehmkühler
            dan2097 Daniel Lowe
