PDFBox / PDFBOX-5358

Add support for UTF-8 in strings

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Peter Wyatt recently published an article on UTF-8 strings in PDF 2.0: https://www.pdfa.org/understanding-utf-8-in-pdf-2-0/

      The article includes a link to a test file he created: https://github.com/pdf-association/pdf20examples/blob/master/pdf20-utf8-test.pdf 

      Our debugger's output on this file suggests that we may need to add support for this (see the attached screenshot). I tested with PDFBox 2.0.25 only; I haven't had a chance to test with 3.x or the 2.x snapshot.

      I don't think we cover all of the PDF 2.0 changes yet, but I thought I'd open this issue at least for discussion.
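
      For discussion, here is a minimal sketch of the kind of detection this might involve. It is not PDFBox code: the class and method names are hypothetical. It assumes the PDF 2.0 rule that a text string starting with the bytes EF BB BF is UTF-8, one starting with FE FF is UTF-16BE, and anything else is PDFDocEncoding. ISO-8859-1 is used here only as a rough stand-in for PDFDocEncoding, which differs from it for some code points.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of PDFBox.
final class TextStringSketch {

    // Decodes the raw bytes of a PDF text string by inspecting its byte order mark.
    static String decodeTextString(byte[] raw) {
        // UTF-8 BOM (EF BB BF): PDF 2.0 UTF-8 text string.
        if (raw.length >= 3
                && (raw[0] & 0xFF) == 0xEF
                && (raw[1] & 0xFF) == 0xBB
                && (raw[2] & 0xFF) == 0xBF) {
            return new String(raw, 3, raw.length - 3, StandardCharsets.UTF_8);
        }
        // UTF-16BE BOM (FE FF): Unicode text string as in earlier PDF versions.
        if (raw.length >= 2
                && (raw[0] & 0xFF) == 0xFE
                && (raw[1] & 0xFF) == 0xFF) {
            return new String(raw, 2, raw.length - 2, StandardCharsets.UTF_16BE);
        }
        // Assumption: ISO-8859-1 approximates PDFDocEncoding here; a real
        // implementation would need a proper PDFDocEncoding table.
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // BOM followed by the UTF-8 encoding of U+00E9.
        byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, (byte) 0xC3, (byte) 0xA9};
        System.out.println(decodeTextString(utf8).equals("\u00e9"));
    }
}
```

      Checking for the UTF-8 marker before falling back keeps existing UTF-16BE and PDFDocEncoding strings on their current path, so only strings that carry the new marker would be affected.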

      Attachments

        1. image-2022-01-15-20-14-26-875.png
          20 kB
          Tilman Hausherr
        2. screenshot-1.png
          101 kB
          Tilman Hausherr
        3. Screen Shot 2022-01-06 at 9.18.09 AM.png
          301 kB
          Tim Allison

          People

            Assignee: Unassigned
            Reporter: Tim Allison (tallison)
            Votes: 1
            Watchers: 2