PDFBox / PDFBOX-5328

Failing to get multiple encodings from cmap table

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.8.16, 2.0.24
    • Fix Version/s: 2.0.25, 3.0.0 PDFBox
    • Component/s: FontBox
    • Labels: None

    Description

      As reported by Ty Lewis on the users mailing list:

      Unicode encodings for GID 8712: List(U+f967)
      Unicode encodings for GID 8712 from table (platformId = 0 encodingId = 3):
      List(U+4e0d, U+f967)
      Unicode encodings for GID 8712 from table (platformId = 0 encodingId = 4):
      List(U+f967)
      

      I wrote some Java code to reproduce this:

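      import java.io.File;
      import java.util.List;
      import org.apache.fontbox.ttf.CmapLookup;
      import org.apache.fontbox.ttf.CmapSubtable;
      import org.apache.fontbox.ttf.CmapTable;
      import org.apache.fontbox.ttf.OTFParser;
      import org.apache.fontbox.ttf.OpenTypeFont;

      // Parse the attached font; "false" means the font is not embedded in a PDF.
      File fontFile = new File("NotoSansSC-Regular.otf");
      OTFParser otfParser = new OTFParser(false);
      OpenTypeFont otf = otfParser.parse(fontFile);

      // Reverse lookup (GID -> character codes) through the default Unicode cmap.
      CmapLookup unicodeCmapLookup = otf.getUnicodeCmapLookup();
      List<Integer> charCodes = unicodeCmapLookup.getCharCodes(8712);
      System.out.println(charCodes);

      // The same reverse lookup through each Unicode subtable individually.
      CmapTable cmapTable = otf.getCmap();
      CmapSubtable unicodeFullCmapTable = cmapTable.getSubtable(CmapTable.PLATFORM_UNICODE, CmapTable.ENCODING_UNICODE_2_0_FULL);
      CmapSubtable unicodeBmpCmapTable = cmapTable.getSubtable(CmapTable.PLATFORM_UNICODE, CmapTable.ENCODING_UNICODE_2_0_BMP);

      List<Integer> unicodeBmpCharCodes = unicodeBmpCmapTable.getCharCodes(8712);
      List<Integer> unicodeFullCharCodes = unicodeFullCmapTable.getCharCodes(8712);

      System.out.println(unicodeBmpCharCodes);  // platformId = 0, encodingId = 3
      System.out.println(unicodeFullCharCodes); // platformId = 0, encodingId = 4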
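
The reproduction code prints each `List<Integer>` in decimal, while the mailing list report uses `U+xxxx` notation. To compare the two more easily, the codes can be formatted as hex. The following is a minimal sketch; `CodePointFormat` and `toUPlus` are hypothetical names for illustration and are not part of FontBox.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CodePointFormat {
    // Hypothetical helper (not a FontBox API): renders character codes
    // as a bracketed list in U+XXXX notation.
    static String toUPlus(List<Integer> codes) {
        return codes.stream()
                .map(c -> String.format("U+%04X", c))
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        // 19981 and 63847 are the decimal values of U+4E0D and U+F967.
        List<Integer> codes = Arrays.asList(19981, 63847);
        System.out.println(toUPlus(codes)); // prints [U+4E0D, U+F967]
    }
}
```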
      

      A look at the tables with DTL OTMaster 3.7 Light shows that there are indeed two entries for this glyph. Searching for them (in hex) shows the characters U+4E0D (不) and U+F967 (the CJK compatibility ideograph for the same character).
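
To illustrate how one of two character codes can go missing in a reverse lookup, here is a minimal sketch that inverts a code-to-GID mapping in two ways. It is an illustration under assumptions only, not the FontBox implementation or the actual fix: the class `GidReverseLookup` and both methods are hypothetical, and the input map is a hand-made stand-in for the two cmap entries described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GidReverseLookup {
    // Single-valued inversion: a later code silently overwrites an earlier
    // one that maps to the same glyph ID.
    static Map<Integer, Integer> invertSingle(Map<Integer, Integer> codeToGid) {
        Map<Integer, Integer> gidToCode = new LinkedHashMap<>();
        for (Map.Entry<Integer, Integer> e : codeToGid.entrySet()) {
            gidToCode.put(e.getValue(), e.getKey());
        }
        return gidToCode;
    }

    // Multi-valued inversion: every code that maps to a glyph ID is kept.
    static Map<Integer, List<Integer>> invertMulti(Map<Integer, Integer> codeToGid) {
        Map<Integer, List<Integer>> gidToCodes = new LinkedHashMap<>();
        for (Map.Entry<Integer, Integer> e : codeToGid.entrySet()) {
            gidToCodes.computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                      .add(e.getKey());
        }
        return gidToCodes;
    }

    public static void main(String[] args) {
        // Stand-in data: both codes map to GID 8712, as in the report.
        Map<Integer, Integer> codeToGid = new TreeMap<>();
        codeToGid.put(0x4E0D, 8712);
        codeToGid.put(0xF967, 8712);

        System.out.println(invertSingle(codeToGid)); // prints {8712=63847}
        System.out.println(invertMulti(codeToGid));  // prints {8712=[19981, 63847]}
    }
}
```

With the codes visited in ascending order, the single-valued variant keeps only U+F967 (63847), while the multi-valued variant returns both codes.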

      Attachments

        1. NotoSansSC-Regular.otf
          8.09 MB
          Tilman Hausherr

          People

            Assignee: Tilman Hausherr
            Reporter: Tilman Hausherr
            Votes: 0
            Watchers: 2
