PDFBox / PDFBOX-3864

UTF16 encoded string to PDFDocEncoding

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.6
    • Fix Version/s: 2.0.7, 3.0.0 PDFBox
    • Component/s: PDModel
    • Labels: None

    Description

      From torakiki in the mailing list:

      Hi, we came across a case where we are basically cloning outline items
      and the original outline title is a UTF-16BE encoded text string
      containing the value 00A0 (non-breaking space). We later use the string to
      set the title of a new outline item, and the A0 is recognised as a € sign.
      Here is a simple test:

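              // UTF-16BE with BOM (FEFF): "Chapter" followed by U+00A0
              COSString victim = COSString
                      .parseHex("FEFF004300680061007000740065007200A0");
              PDOutlineItem node = new PDOutlineItem();
              // the title is re-encoded when it is stored in the node dictionary
              node.setTitle(victim.getString());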
      

      If you look at the node dictionary you'll see that the title value is
      Chapter€
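
      To make the input of that test concrete, here is a minimal sketch that uses only the JDK (no PDFBox) to decode the same hex string. The class TitleHexDemo and its helpers are hypothetical names introduced for illustration. It shows that the last character really is U+00A0; in PDFDocEncoding the byte value 0xA0 is assigned to the Euro sign, which is why a title written out with that byte is displayed as "Chapter€".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of PDFBox.
class TitleHexDemo {
    // Turn a hex string such as "FEFF0043" into raw bytes.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Decode the bytes as UTF-16; the JDK decoder consumes the leading FEFF byte-order mark.
    static String decodeTitle(String hex) {
        return new String(hexToBytes(hex), StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String title = decodeTitle("FEFF004300680061007000740065007200A0");
        char last = title.charAt(title.length() - 1);
        System.out.println(title.length());            // prints 8
        System.out.println(Integer.toHexString(last)); // prints a0
    }
}
```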

      The cause is that the initialization of PDFDocEncoding forgot that there are "holes" in the 0..255 sequence, i.e. codes that must not be mapped to the Unicode character with the same value (0xA0 is one of them). I'll add that and a test.
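
      The effect of the missing "holes" can be sketched with a toy table. This is not the PDFBox source or the actual patch: DocEncodingSketch, isHole and the skipHoles flag are hypothetical, only two of the non-identity entries are included, and the hole ranges are an assumption taken from my reading of the PDFDocEncoding table (0x18..0x1F, 0x7F..0xA0, 0xAD).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of a Unicode -> PDFDocEncoding lookup table.
class DocEncodingSketch {
    private final Map<Character, Integer> uniToCode = new HashMap<>();

    DocEncodingSketch(boolean skipHoles) {
        // Start with the identity mapping: code n <-> Unicode code point n.
        for (int code = 0; code < 256; code++) {
            if (skipHoles && isHole(code)) {
                continue; // this code is not the Unicode character of the same value
            }
            uniToCode.put((char) code, code);
        }
        // Two of the entries that differ from the identity mapping (subset only).
        uniToCode.put('\u2022', 0x80); // bullet
        uniToCode.put('\u20AC', 0xA0); // Euro sign
    }

    // Assumed hole ranges: codes that are undefined or assigned to another character.
    static boolean isHole(int code) {
        return (code > 0x17 && code < 0x20)
                || (code > 0x7E && code < 0xA1)
                || code == 0xAD;
    }

    // Returns the one-byte code for c, or null if c has no code in this table.
    Integer encode(char c) {
        return uniToCode.get(c);
    }

    public static void main(String[] args) {
        char nbsp = (char) 0xA0;
        // Without the holes, U+00A0 is encoded as 0xA0, which a reader shows as the Euro sign.
        System.out.println(new DocEncodingSketch(false).encode(nbsp)); // prints 160
        // With the holes, U+00A0 has no one-byte code.
        System.out.println(new DocEncodingSketch(true).encode(nbsp));  // prints null
    }
}
```

      When a character has no code in the table, the string cannot be written in PDFDocEncoding at all and presumably has to be stored as UTF-16 instead, which keeps the non-breaking space intact.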

          People

            Assignee: Tilman Hausherr
            Reporter: Tilman Hausherr