PDFBox / PDFBOX-696

PDTrueTypeFont limits the number of glyph widths to 256. This limit can easily be removed.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.2.0
    • Component/s: Parsing
    • Labels:
    • Environment:
      Ubuntu Karmic

      Description

      Currently, support for fonts with exotic glyphs is limited at best. Making PDFBox render Chinese characters has proved to be a bit of a pain ...

      One blocker we ran into was the limit of 256 individual glyph widths. In PDTrueTypeFont.java, we find this in loadDescriptorDictionary():

      int firstChar = 0;
      int maxWidths=256;
      HorizontalMetricsTable hMet = ttf.getHorizontalMetrics();
      int[] widthValues = hMet.getAdvanceWidth();
      List widths = new ArrayList(maxWidths);

      The "int maxWidths=256" affects the remaining code, so glyph widths beyond the first 256 are ignored. We found that there is no need to impose such a limit, and that having it makes it impossible to generate a proper /W widths array when generating a CIDFontType2 font. Simply replacing the hard-coded value 256 with the following seems to be a perfectly usable solution:

      int firstChar = 0;
      // int maxWidths = 256;  // hard-coded limit removed
      int maxWidths = glyphToCCode.length;  // use the counted number of code points instead
      HorizontalMetricsTable hMet = ttf.getHorizontalMetrics();
      int[] widthValues = hMet.getAdvanceWidth();
      List widths = new ArrayList(maxWidths);
      Integer zero = new Integer( 250 );
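
      To make the effect of the cap concrete, here is a minimal, self-contained sketch. It is not the PDFBox code: the class WidthCapDemo, the helper collectWidths, and the 1000-glyph font are made up for illustration, and the loop is only a simplified stand-in for what loadDescriptorDictionary() does with maxWidths.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WidthCapDemo {
    // Hypothetical helper (not part of PDFBox): copies at most maxWidths
    // advance widths into a list, mimicking how a fixed cap truncates the data.
    static List<Integer> collectWidths(int[] advanceWidths, int maxWidths) {
        int count = Math.min(maxWidths, advanceWidths.length);
        List<Integer> widths = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            widths.add(advanceWidths[i]);
        }
        return widths;
    }

    public static void main(String[] args) {
        // Assumption: a made-up font with 1000 glyphs, each 500 units wide.
        int[] advanceWidths = new int[1000];
        Arrays.fill(advanceWidths, 500);

        // With the hard-coded cap, only the first 256 widths survive.
        System.out.println(collectWidths(advanceWidths, 256).size());
        // With the cap derived from the data, every width is kept.
        System.out.println(collectWidths(advanceWidths, advanceWidths.length).size());
    }
}
```

      With the fixed cap, everything past index 255 is dropped, which is why a font with thousands of CJK glyphs ends up without usable widths for most of them.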

      Is it possible to have this change added to 1.2.0?

      Also, we would be more than happy to contribute some code that shows how you can use PDFBox to produce PDFs containing special characters (Asian, Chinese, etc.) by using code-point-to-glyph mapping, with copy and paste working (/ToUnicode). The code allows API users to simply use UTF-8 strings and not worry about any of the tricky font-handling details.
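
      As a rough illustration of what code-point-to-glyph mapping can look like (this is not the code offered above, which is not attached here), the sketch below turns a Java string into a hex string of 2-byte glyph IDs. It assumes a font used with Identity-H encoding and an identity CID-to-GID mapping, so each glyph ID is written as two bytes. The class GlyphEncodingSketch, the method toGlyphHex, and the glyph IDs in the map are hypothetical; a real mapping would come from the font's cmap table. It needs Java 8 or later for String.codePoints().

```java
import java.util.HashMap;
import java.util.Map;

public class GlyphEncodingSketch {
    // Hypothetical helper: maps each Unicode code point to a glyph ID and
    // emits it as four hex digits (two bytes). Unmapped code points fall
    // back to glyph 0, which is .notdef in TrueType fonts.
    static String toGlyphHex(String text, Map<Integer, Integer> codePointToGlyph) {
        StringBuilder hex = new StringBuilder();
        text.codePoints().forEach(cp -> {
            int gid = codePointToGlyph.getOrDefault(cp, 0);
            hex.append(String.format("%04X", gid));
        });
        return hex.toString();
    }

    public static void main(String[] args) {
        // Assumption: made-up glyph IDs for two Chinese characters.
        Map<Integer, Integer> cmap = new HashMap<>();
        cmap.put(0x4E2D, 300);
        cmap.put(0x6587, 301);

        // prints "012C012D"
        System.out.println(toGlyphHex("\u4E2D\u6587", cmap));
    }
}
```

      The same code-point-to-glyph table, read in the other direction, is the information a /ToUnicode CMap needs so that copy and paste returns the original characters.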

            People

            • Assignee: Andreas Lehmkühler (lehmi)
            • Reporter: Michael Berg (michael.berg@bergconsult.com)
