PDFBox / PDFBOX-5230

Zero-width non-joiner characters visible in generated PDF

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.16
    • Fix Version/s: None
    • Component/s: FontBox, PDModel, Writing
    • Labels: None

    Description

      I'd like to use the zero-width non-joiner (ZWNJ) character to prevent character shaping in some cases when using Arabic and Indic scripts. This works correctly using some fonts like Arial Unicode (character shaping is prevented and no ZWNJ glyph is visible in the PDF), but does not work correctly when using fonts like Tahoma or Google Noto Sans Regular, where the ZWNJ character is visible in the PDF. The ZWNJ glyph is not visible when using these fonts in other programs, like Microsoft Word.

      I suspect that the `advanceWidth` values in the `hmtx` table should be taken into account but currently are not: the `advanceWidth` for the ZWNJ glyph is 0 in both of the fonts that produce the visible artifact (Tahoma and Google Noto Sans Regular).
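
      For reference, here is a minimal sketch of how a glyph's advance width is stored in the `hmtx` table, following the OpenType table layout. The class name `HmtxAdvance`, the method `advanceWidth`, and the sample bytes are hypothetical; this is not PDFBox or FontBox code. The glyph ID for U+200C would have to come from the font's `cmap` table, and `numberOfHMetrics` from the `hhea` table, neither of which is parsed here.

```java
import java.nio.ByteBuffer;

public class HmtxAdvance {

    /**
     * Reads the advance width (in font units) of a glyph from raw 'hmtx' table bytes.
     * numberOfHMetrics is the value stored in the font's 'hhea' table.
     */
    public static int advanceWidth(byte[] hmtx, int numberOfHMetrics, int glyphId) {
        if (numberOfHMetrics < 1 || glyphId < 0) {
            throw new IllegalArgumentException("invalid numberOfHMetrics or glyphId");
        }
        // Glyphs at or beyond numberOfHMetrics reuse the advance of the last full record.
        int record = Math.min(glyphId, numberOfHMetrics - 1);
        // Each full record is 4 bytes: uint16 advanceWidth followed by int16 left side bearing.
        // ByteBuffer reads big-endian by default, which matches the font file byte order.
        return ByteBuffer.wrap(hmtx).getShort(record * 4) & 0xFFFF;
    }

    public static void main(String[] args) {
        // Hypothetical table: two full records, then one bare left side bearing for glyph 2.
        byte[] hmtx = {
            0x02, 0x58, 0x00, 0x32, // glyph 0: advanceWidth 600, lsb 50
            0x00, 0x00, 0x00, 0x00, // glyph 1: advanceWidth 0, lsb 0
            0x00, 0x0A              // glyph 2: lsb 10 only
        };
        System.out.println(advanceWidth(hmtx, 2, 0)); // 600
        System.out.println(advanceWidth(hmtx, 2, 1)); // 0
        System.out.println(advanceWidth(hmtx, 2, 2)); // 0 (reuses the last full record)
    }
}
```

      A zero advance by itself only says that the text position does not move after the glyph. Whether anything is painted depends on the glyph outline, which matches the comments in the test case: the Arial Unicode ZWNJ glyph has no outline, while the Tahoma and Noto Sans glyphs contain a vertical bar.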

      Test case generating the attached PDF file:

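      import java.io.File;
      import java.io.IOException;

      import org.apache.pdfbox.pdmodel.PDDocument;
      import org.apache.pdfbox.pdmodel.PDPage;
      import org.apache.pdfbox.pdmodel.PDPageContentStream;
      import org.apache.pdfbox.pdmodel.common.PDRectangle;
      import org.apache.pdfbox.pdmodel.font.PDFont;
      import org.apache.pdfbox.pdmodel.font.PDType0Font;

      public class ZwnjTest {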
          public static void main(String[] args) throws IOException {
              try (PDDocument document = new PDDocument()) {
      
                  PDPage page = new PDPage(PDRectangle.LETTER);
                  document.addPage(page);
      
                  try (PDPageContentStream stream = new PDPageContentStream(document, page)) {
      
                      // Tahoma: ZWNJ glyph is a vertical bar, but advanceWidth in hmtx table is 0 -> shown in PDF anyway (unexpected)
                      PDFont tahoma = PDType0Font.load(document, new File("C:/Windows/Fonts/tahoma.ttf"));
                      stream.beginText();
                      stream.setFont(tahoma, 20);
                      stream.newLineAtOffset(50, 650);
                      stream.showText("t\u200Ce\u200Cs\u200Ct\u200C \u200C1"); // U+200C = zero width non-joiner
                      stream.endText();
      
                      // Arial Unicode: ZWNJ glyph contains no outline -> not shown in PDF (as expected)
                      PDFont arialu = PDType0Font.load(document, new File("C:/Windows/Fonts/ARIALUNI.TTF"));
                      stream.beginText();
                      stream.setFont(arialu, 20);
                      stream.newLineAtOffset(50, 600);
                      stream.showText("t\u200Ce\u200Cs\u200Ct\u200C \u200C2"); // U+200C = zero width non-joiner
                      stream.endText();
      
                      // Google Noto Sans Regular: ZWNJ glyph is a vertical bar, but advanceWidth in hmtx table is 0 -> shown in PDF anyway (unexpected)
                      PDFont gnotos = PDType0Font.load(document, new File("noto-sans-regular.ttf"));
                      stream.beginText();
                      stream.setFont(gnotos, 20);
                      stream.newLineAtOffset(50, 550);
                      stream.showText("t\u200Ce\u200Cs\u200Ct\u200C \u200C3"); // U+200C = zero width non-joiner
                      stream.endText();
                  }
      
                  document.save("zwnj.pdf");
              }
          }
      }
      

      Attachments

        1. Af.pdf
          32 kB
          Tilman Hausherr
        2. zwnj.pdf
          22 kB
          Daniel Gredler
        3. zwnj.png
          8 kB
          Daniel Gredler
        4. zwnj-pdfkit.pdf
          22 kB
          Daniel Gredler

          People

            Assignee: Unassigned
            Reporter: Daniel Gredler (sdanig)
            Votes: 0
            Watchers: 2
