PDFBox / PDFBOX-4811

Glyphs getting lost when rendering

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.19
    • Fix Version/s: 2.0.20, 3.0.0 PDFBox
    • Component/s: FontBox
    • Labels: None

    Description

      I missed a rendering change (sorry) in the PDF from the linked PDF.js issue. The change happened in PDFBOX-4810, but it is not a regression; rather, bad input is displayed differently because the data is different.

      The CMap has these ranges:

      4 begincodespacerange
      <00><7f>
      <c080><dfbf>
      <e08080><efbfbf>
      <f0808080><f7bfbfbf>
      endcodespacerange
      

      The content stream has segments like

      (Check\340up Date:2020/ 3/ 4  11:46) Tj
      

      \340 is octal (0340) for 0xE0. The current code in CMap.readCode() reads bytes until a range fits, which means it reads 4 bytes before it notices that matching has failed. After the failure it doesn't reposition, so this is displayed as "Check ·Date" instead of "Check -up Date", i.e. input is lost. The "·" is the default glyph.
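
To illustrate why no range fits, here is a minimal sketch (not the PDFBox code) that tests prefixes of a byte sequence against the four codespace ranges listed above. The class name `CodespaceCheck` and the helpers `fits` and `matchLength` are hypothetical names made up for this example.

```java
final class CodespaceCheck {
    // The four codespace ranges from the CMap, as {low, high} byte arrays.
    static final int[][][] RANGES = {
        {{0x00}, {0x7f}},
        {{0xc0, 0x80}, {0xdf, 0xbf}},
        {{0xe0, 0x80, 0x80}, {0xef, 0xbf, 0xbf}},
        {{0xf0, 0x80, 0x80, 0x80}, {0xf7, 0xbf, 0xbf, 0xbf}},
    };

    // True if the first len bytes lie inside some range of exactly that length.
    static boolean fits(int[] bytes, int len) {
        for (int[][] range : RANGES) {
            if (range[0].length != len) {
                continue;
            }
            boolean inside = true;
            for (int i = 0; i < len; i++) {
                inside &= bytes[i] >= range[0][i] && bytes[i] <= range[1][i];
            }
            if (inside) {
                return true;
            }
        }
        return false;
    }

    // Length of the shortest prefix (1..4 bytes) that fits a range, or 0 if none does.
    static int matchLength(int... bytes) {
        for (int len = 1; len <= bytes.length && len <= 4; len++) {
            if (fits(bytes, len)) {
                return len;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(matchLength('C'));                 // 1: plain ASCII
        System.out.println(matchLength(0xE0, 'u', 'p', ' ')); // 0: nothing fits
        System.out.println(matchLength(0xE0, 0x80, 0x80));    // 3: well-formed code
    }
}
```

For the bytes 0xE0 'u' 'p' ' ' from the content stream, all four lengths are tried and none fits, so a reader that does not reposition has already consumed "up " by the time it gives up.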

      The solution is to remember the position and to reposition there. I'm using mark() and reset() which, surprisingly, works both when loading in memory and when loading with a temp file.
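
The mark()/reset() idea can be sketched roughly as follows. This is an illustration using a plain `java.io.InputStream`, not the actual change to CMap.readCode(); the class `RepositionSketch` and its methods are hypothetical, and returning the single first byte as the code after a failed match is an assumption made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

final class RepositionSketch {
    // Codespace ranges from the CMap in this issue, as {low, high} byte arrays.
    static final int[][][] RANGES = {
        {{0x00}, {0x7f}},
        {{0xc0, 0x80}, {0xdf, 0xbf}},
        {{0xe0, 0x80, 0x80}, {0xef, 0xbf, 0xbf}},
        {{0xf0, 0x80, 0x80, 0x80}, {0xf7, 0xbf, 0xbf, 0xbf}},
    };
    static final int MAX_LEN = 4;

    static boolean fits(int[] bytes, int len) {
        for (int[][] range : RANGES) {
            if (range[0].length != len) {
                continue;
            }
            boolean inside = true;
            for (int i = 0; i < len; i++) {
                inside &= bytes[i] >= range[0][i] && bytes[i] <= range[1][i];
            }
            if (inside) {
                return true;
            }
        }
        return false;
    }

    // Reads one code (bytes packed big-endian); returns -1 at end of stream.
    static long readCode(InputStream in) throws IOException {
        in.mark(MAX_LEN);                 // remember where this code starts
        int[] buf = new int[MAX_LEN];
        long code = 0;
        for (int len = 1; len <= MAX_LEN; len++) {
            int b = in.read();
            if (b < 0) {
                break;                    // ran out of input
            }
            buf[len - 1] = b;
            code = (code << 8) | b;
            if (fits(buf, len)) {
                return code;
            }
        }
        // No range fits: go back so the lookahead bytes are not lost.
        // Assumption for this sketch: consume just the first byte.
        in.reset();
        return in.read();
    }

    static List<Long> decode(byte[] data) throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        List<Long> codes = new ArrayList<>();
        long code;
        while ((code = readCode(in)) >= 0) {
            codes.add(code);
        }
        return codes;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {'C', 'h', 'e', 'c', 'k', (byte) 0xE0, 'u', 'p'};
        System.out.println(decode(data).size()); // 8: 'u' and 'p' survive
    }
}
```

`ByteArrayInputStream` always supports mark() and reset(); for other streams, `markSupported()` would need to be checked first.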

      Attachments

        1. PDFJS-11768.pdf-1-before.png
          113 kB
          Tilman Hausherr
        2. PDFJS-11768.pdf-1-after.png
          116 kB
          Tilman Hausherr

          People

            Assignee: Tilman Hausherr
            Reporter: Tilman Hausherr
