PDFBox / PDFBOX-3864

UTF16 encoded string to PDFDocEncoding

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.6
    • Fix Version/s: 2.0.7, 3.0.0 PDFBox
    • Component/s: PDModel
    • Labels:
      None

      Description

      From Andrea Vacondio in the mailing list:

      Hi, we came across a case where we are cloning outline items whose
      original title is a UTF-16BE encoded text string containing the value
      00A0 (non-breaking space). We later use that string to set the title of
      a new outline item, and the A0 is interpreted as a € sign.
      Here is a simple test:

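              // FE FF is the UTF-16BE byte order mark; the last code unit is 00A0
              COSString victim = COSString
                      .parseHex("FEFF004300680061007000740065007200A0");
              PDOutlineItem node = new PDOutlineItem();
              // the decoded title is encoded again when it is stored in the new item
              node.setTitle(victim.getString());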
      

      If you look at the node dictionary, you'll see that the title value is
      "Chapter€" rather than "Chapter" followed by a non-breaking space.
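
      To check what the hex string in the test actually contains, it can be decoded with the JDK alone. This is a minimal sketch that does not use PDFBox; the class and method names (`Utf16TitleDecode`, `hexToBytes`, `decodeUtf16Be`) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative helper, not part of PDFBox. */
public class Utf16TitleDecode {

    /** Converts a hex string such as "FEFF0043" into raw bytes. */
    public static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Decodes a text string that starts with the UTF-16BE byte order mark FE FF. */
    public static String decodeUtf16Be(byte[] bytes) {
        // Skip the two BOM bytes and decode the rest as big-endian UTF-16.
        return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String title = decodeUtf16Be(hexToBytes("FEFF004300680061007000740065007200A0"));
        // Print every character with its code point to make the trailing U+00A0 visible.
        for (char c : title.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

      The decoded string is "Chapter" followed by U+00A0, so the expected title ends in a non-breaking space, not in a Euro sign.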

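      The cause is that the initialization of PDFDocEncoding did not account for the "holes" in the 0..255 sequence, i.e. codes whose PDFDocEncoding character is not the Unicode character with the same value. Code A0 is one of them: in PDFDocEncoding it is the Euro sign, not the non-breaking space. I'll add that and a test.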
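
      The effect of the holes can be reproduced with a small stand-alone model. This is a sketch under assumptions, not the PDFBox source: the class `DocEncodingSketch`, its `skipHoles` flag and the `isHole` ranges are hypothetical and follow the PDFDocEncoding table in the PDF specification, and only the A0 override is included.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of a PDFDocEncoding-style lookup table. */
public class DocEncodingSketch {
    private final char[] codeToUni = new char[256];
    private final Map<Character, Integer> uniToCode = new HashMap<>();

    /** @param skipHoles false models the faulty initialization, true the corrected one */
    public DocEncodingSketch(boolean skipHoles) {
        // Start from the identity mapping code -> Unicode character of the same value.
        for (int code = 0; code < 256; code++) {
            if (skipHoles && isHole(code)) {
                continue;
            }
            set(code, (char) code);
        }
        // Deviation from the identity mapping (the real table has many more).
        set(0xA0, '\u20AC'); // Euro sign
    }

    /** Codes that are not the identity mapping (assumed ranges). */
    public static boolean isHole(int code) {
        return (code > 0x17 && code < 0x20)
                || (code > 0x7E && code < 0xA1)
                || code == 0xAD;
    }

    private void set(int code, char unicode) {
        codeToUni[code] = unicode;
        uniToCode.put(unicode, code);
    }

    /** True if the character is considered encodable in this table. */
    public boolean containsChar(char c) {
        return uniToCode.containsKey(c);
    }

    /** Encodes a character to a code and decodes it again. */
    public char roundTrip(char c) {
        return codeToUni[uniToCode.get(c)];
    }

    public static void main(String[] args) {
        char nbsp = '\u00A0';
        DocEncodingSketch faulty = new DocEncodingSketch(false);
        DocEncodingSketch corrected = new DocEncodingSketch(true);
        // The stale reverse entry U+00A0 -> A0 survives the override in the faulty table.
        System.out.println(faulty.containsChar(nbsp));           // true
        System.out.println(faulty.roundTrip(nbsp) == '\u20AC');  // true
        System.out.println(corrected.containsChar(nbsp));        // false
    }
}
```

      In the faulty table the non-breaking space still looks encodable and is written as byte A0, which reads back as €. With the holes skipped, the character is reported as not encodable, so a caller can fall back to UTF-16 for the whole string.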

            People

            • Assignee: Tilman Hausherr (tilman)
            • Reporter: Tilman Hausherr (tilman)
            • Votes: 0
            • Watchers: 2