PDFBox / PDFBOX-4793

Questionable fallback font for some embedded Chinese fonts

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.18, 2.0.19
    • Fix Version/s: 2.0.20, 3.0.0 PDFBox
    • Component/s: Rendering
    • Labels: None

    Description

      Issue:
      I tried to render PDFs that contain embedded Chinese fonts. Neither the PDF Debugger, nor printouts of the document (PDFPrintable), nor the PDFRenderer displays the Chinese glyphs correctly; all of them render placeholders instead.

      Assumptions:
      I assume that the embedded fonts are incomplete and do not contain all the glyphs required to render the text properly, so PDFBox attempts to use the previously determined fallback font and then fails to find the glyphs in that fallback font.

      This would not be surprising, as the fallback font "MalgunGothic-Semilight" (a Windows standard font) does not contain Chinese characters.

      Debugging:
      I tried to understand how the fallback font is determined and what I could do to solve this problem on my end, but I was unable to find a satisfying solution.
      My best guess so far is that the CIDFontMapping (FontMapperImpl) is to blame for determining an unfit fallback font.
      Although it seems to check whether the required code pages are contained in a fallback font, it still ranks the Malgun font as the top scorer and best substitute font, even though that font clearly does not contain all required code pages.
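
To illustrate the kind of code page check described above, here is a minimal, self-contained sketch. It is not PDFBox's actual code: the class name, method name, and candidate fonts are hypothetical. The only external facts it relies on are the bit positions of the `ulCodePageRange1` field in the OpenType OS/2 table (bit 17 JIS/Japan, bit 18 Simplified Chinese, bit 19 Korean Wansung, bit 20 Traditional Chinese, bit 21 Korean Johab) and the common CID ordering names (GB1, CNS1, Japan1, Korea1).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CodePageCheckSketch {
    // Bit positions in the OS/2 table's ulCodePageRange1 field (OpenType spec).
    static final long JIS_JAPAN = 1L << 17;
    static final long CHINESE_SIMPLIFIED = 1L << 18;
    static final long KOREAN_WANSUNG = 1L << 19;
    static final long CHINESE_TRADITIONAL = 1L << 20;
    static final long KOREAN_JOHAB = 1L << 21;

    /**
     * Hypothetical helper: returns true if the code page bits a font
     * DECLARES cover the given CID ordering. It does not inspect glyphs.
     */
    static boolean declaresOrdering(String ordering, long codePageRange) {
        switch (ordering) {
            case "GB1":
                return (codePageRange & CHINESE_SIMPLIFIED) != 0;
            case "CNS1":
                return (codePageRange & CHINESE_TRADITIONAL) != 0;
            case "Japan1":
                return (codePageRange & JIS_JAPAN) != 0;
            case "Korea1":
                return (codePageRange & (KOREAN_WANSUNG | KOREAN_JOHAB)) != 0;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // Made-up candidates; the bit values are for illustration only.
        Map<String, Long> candidates = new LinkedHashMap<>();
        candidates.put("KoreanOnlyFont", KOREAN_WANSUNG | KOREAN_JOHAB);
        candidates.put("SimplifiedChineseFont", CHINESE_SIMPLIFIED);
        candidates.put("OverDeclaringFont",
                JIS_JAPAN | CHINESE_SIMPLIFIED | KOREAN_WANSUNG
                        | CHINESE_TRADITIONAL | KOREAN_JOHAB);

        for (Map.Entry<String, Long> e : candidates.entrySet()) {
            System.out.println(e.getKey() + " declares GB1 support: "
                    + declaresOrdering("GB1", e.getValue()));
        }
    }
}
```

A check of this shape can only be as accurate as the bits the font declares: a font that sets the Simplified Chinese bit passes for GB1 whether or not it actually contains the glyphs. Whether that is what happens with the Malgun font here is an open question at this point in the report, not something the sketch demonstrates.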

      My opinion:
      This is troubling, as better-suited fonts exist and could have been selected (e.g. Adobe Song Std). They are indeed included in the CIDFontMapping, but seemingly score lower for some reason.

      Further information:
      I cannot disclose the document in question. However, I found a document (pdf_font-zhcn.pdf) in another issue (PDFBOX-3132) that can be used to reproduce the problem (e.g. by dropping it into the PDF Debugger).

      Attachments

        1. screenshot-9.png
          34 kB
          Christian Appl
        2. screenshot-8.png
          4 kB
          Christian Appl
        3. screenshot-7.png
          50 kB
          Christian Appl
        4. screenshot-6.png
          107 kB
          Christian Appl
        5. screenshot-5.png
          21 kB
          Christian Appl
        6. screenshot-4.png
          42 kB
          Christian Appl
        7. screenshot-3.png
          8 kB
          Christian Appl
        8. screenshot-2.png
          4 kB
          Christian Appl
        9. screenshot-10.png
          29 kB
          Christian Appl
        10. PDFJS-10699.pdf
          292 kB
          Tilman Hausherr
        11. pdf_font-zhcn.pdf
          238 kB
          Christian Appl
        12. image-2020-03-06-11-49-25-187.png
          4 kB
          Christian Appl
        13. image-2020-03-06-11-48-56-813.png
          34 kB
          Christian Appl
        14. image-2020-03-06-11-35-53-580.png
          198 kB
          Tilman Hausherr
        15. image-2020-03-04-10-31-03-065.png
          29 kB
          Christian Appl
        16. image-2020-03-04-10-09-25-343.png
          7 kB
          Christian Appl
        17. image-2020-03-04-09-58-01-055.png
          2 kB
          Christian Appl
        18. image-2020-03-04-09-49-42-323.png
          2 kB
          Christian Appl

          People

            Assignee: Tilman Hausherr
            Reporter: Christian Appl