PDFBox / PDFBOX-4661

Regression: No Unicode mapping with Identity-H font

Details

    Description

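      Running the following code in v2.0.16 or v2.0.17 no longer produces the expected output, which v2.0.15 and earlier still produce:

      import java.io.File;
      import org.apache.pdfbox.pdmodel.PDDocument;
      import org.apache.pdfbox.text.PDFTextStripper;

      PDFTextStripper stripper = new PDFTextStripper();
      PDDocument doc = PDDocument.load(new File("KR1020067006547.pdf"));
      stripper.getText(doc);

      Instead, it logs warnings like the following, and the affected characters are missing from the extracted text: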

      Sep 27, 2019 11:49:37 AM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
      WARNING: No Unicode mapping for CID+17679 (17679) in font LHXXBJ+¹ÙÅÁ-Identity-H
      Sep 27, 2019 11:49:37 AM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
      WARNING: No Unicode mapping for CID+16131 (16131) in font LHXXBJ+¹ÙÅÁ-Identity-H
      Sep 27, 2019 11:49:37 AM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
      WARNING: No Unicode mapping for CID+14802 (14802) in font LHXXBJ+¹ÙÅÁ-Identity-H
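
To compare versions, it can help to tally the distinct CIDs reported in these warnings. The following is only a minimal sketch, assuming the log format shown above; `UnmappedCidCounter` and `extractCids` are hypothetical names and not part of PDFBox.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper (not a PDFBox class): collects the distinct CIDs
// mentioned in "No Unicode mapping" warnings from captured log lines.
final class UnmappedCidCounter {

    // Assumes the warning format shown in this report:
    // "WARNING: No Unicode mapping for CID+17679 (17679) in font ..."
    private static final Pattern WARNING =
            Pattern.compile("No Unicode mapping for CID\\+(\\d+)");

    // Returns the distinct CIDs in ascending order; lines that do not
    // match (timestamps, other messages) are ignored.
    static Set<Integer> extractCids(List<String> logLines) {
        Set<Integer> cids = new TreeSet<>();
        for (String line : logLines) {
            Matcher m = WARNING.matcher(line);
            if (m.find()) {
                cids.add(Integer.parseInt(m.group(1)));
            }
        }
        return cids;
    }

    // Reads log lines from standard input and prints the distinct CIDs.
    public static void main(String[] args) throws Exception {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            List<String> lines = in.lines().collect(Collectors.toList());
            Set<Integer> cids = extractCids(lines);
            System.out.println(cids.size() + " distinct unmapped CIDs: " + cids);
        }
    }
}
```

Fed the captured log on standard input under v2.0.15 and under v2.0.17, the two result sets show which CIDs lost their mapping.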

      This change in behavior is likely related to PDFBOX-4549.

      Attachments

        1. KR1020067006547.pdf
          132 kB
          Daniel Lowe

            People

              lehmi Andreas Lehmkühler
              dan2097 Daniel Lowe
              Votes: 0
              Watchers: 5
