  1. PDFBox
  2. PDFBOX-1283

Unicode characters displayed with wrong Advance


    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 2.0.0
    • Component/s: Swing GUI
    • Labels:


      The file AnnahmeReport_MitRussischTest.pdf is not displayed correctly: the advance of the characters is calculated incorrectly. The same document is displayed correctly in Adobe Reader.

      In PDCIDFont.java, the method extractWidths() fills widthCache with character widths taken from the /W array of the font dictionary. The widthCache appears to map from Unicode to character width, but the /W array maps from CID code to character width.

      In this PDF file the TTF font is embedded, and the CID code is identical to the glyph code in the TTF font. A cmap maps from Unicode directly to the CID/GID in the TTF font.

      So either the cache is filled incorrectly, or the code that reads the cache does not take into account that this array contains widths indexed by CID/GID.

      The cmap encoding has to be applied either when filling the cache or when reading values from it.
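
The mismatch described above can be illustrated with a minimal, hypothetical Java sketch. It is not PDFBox code: the class, its methods, and the width value 556 are assumptions made up for illustration. The default width of 1000 mirrors the PDF default for /DW, and the mapping of '1' to CID 20 follows the glyph numbers mentioned in this report.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not PDFBox code: names and width values are illustrative only.
final class CidWidthLookupSketch {
    // Stand-in for the font's cmap: Unicode code point -> CID (equal to the GID in this file).
    private final Map<Integer, Integer> unicodeToCid = new HashMap<>();
    // Stand-in for the cache built from /W: CID -> width.
    private final Map<Integer, Integer> widthByCid = new HashMap<>();
    private final int defaultWidth;

    CidWidthLookupSketch(int defaultWidth) {
        this.defaultWidth = defaultWidth;
    }

    void mapCharacter(int codePoint, int cid) {
        unicodeToCid.put(codePoint, cid);
    }

    void setWidth(int cid, int width) {
        widthByCid.put(cid, width);
    }

    // The suspected bug: the CID-keyed cache is queried with a Unicode value.
    int widthKeyedByUnicode(int codePoint) {
        return widthByCid.getOrDefault(codePoint, defaultWidth);
    }

    // The proposed behaviour: translate Unicode to CID through the cmap first.
    int widthViaCmap(int codePoint) {
        Integer cid = unicodeToCid.get(codePoint);
        if (cid == null) {
            return defaultWidth;
        }
        return widthByCid.getOrDefault(cid, defaultWidth);
    }

    public static void main(String[] args) {
        CidWidthLookupSketch font = new CidWidthLookupSketch(1000);
        font.mapCharacter('1', 20); // '1' is glyph 20 in the bold font of this report
        font.setWidth(20, 556);     // assumed width, not taken from the PDF
        System.out.println(font.widthKeyedByUnicode('1')); // falls back to the default
        System.out.println(font.widthViaCmap('1'));        // finds the /W entry
    }
}
```

With these assumed values, the Unicode-keyed lookup misses the entry and returns the default width, which would produce the kind of wrong advance reported here.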

      To rule out the possibility that the PDF file is faulty and Adobe Reader simply ignores
      a faulty /W array, I checked whether Adobe Reader actually uses the values in /W to determine the widths.

      When the entries for glyphs 20..23 in the /W array of the bold font are changed
      (the first 4 values in the second line of the array, which correspond to the characters '1'..'4'),
      the digits are displayed with wrong widths in Adobe Reader, while nothing changes in PDFBox
      (see the file AnnahmeReport_MitRussischTest_Modified.pdf).
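
For reference, a /W array can be expanded into a CID-to-width table roughly as follows. This is a hedged sketch rather than the PDFBox implementation: the class and method names are hypothetical, widths are simplified to integers (the PDF format also allows real numbers), a malformed array is not handled, and the sample values are invented rather than read from the attached files. It covers the two entry forms the PDF specification defines for /W: `c [w1 ... wn]` and `cFirst cLast w`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not PDFBox code: expands a /W array into CID -> width.
final class WArraySketch {
    // Elements are Integers, or a nested List of Integers for the "c [w1 ... wn]" form.
    static Map<Integer, Integer> widthsByCid(List<?> w) {
        Map<Integer, Integer> widths = new HashMap<>();
        int i = 0;
        while (i < w.size()) {
            int first = (Integer) w.get(i++);
            Object next = w.get(i++);
            if (next instanceof List) {
                // Form 1: consecutive CIDs starting at 'first', one width each.
                int cid = first;
                for (Object width : (List<?>) next) {
                    widths.put(cid++, (Integer) width);
                }
            } else {
                // Form 2: every CID in first..last shares a single width.
                int last = (Integer) next;
                int width = (Integer) w.get(i++);
                for (int cid = first; cid <= last; cid++) {
                    widths.put(cid, width);
                }
            }
        }
        return widths;
    }

    public static void main(String[] args) {
        // Invented sample: CIDs 20..23 listed individually, CIDs 30..32 as a range.
        List<Object> w = Arrays.asList(20, Arrays.asList(556, 556, 556, 556), 30, 32, 600);
        System.out.println(widthsByCid(w));
    }
}
```

The keys of the resulting table are CIDs, so changing the entries for 20..23 affects the digits '1'..'4' only if the renderer looks widths up by CID.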




            • Assignee:
              lehmi Andreas Lehmkühler
              duesi Daniel Schwinn


              • Created: