  1. PDFBox
  2. PDFBOX-1242

Handle non ISO-8859-1 chars with drawString

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0, 1.6.0
    • Fix Version/s: 2.0.0
    • Component/s: Writing
    • Labels: None

    Description

      PDPageContentStream.drawString takes a String as its argument and constructs a COSString from the input.
      If the input contains chars above 255, the COSString is prefixed with 0xFE, 0xFF and the bytes are taken from the
      input encoded as "UTF-16BE".

      Back in the drawString method, this UTF-16 encoded COSString is appended as an "ISO-8859-1" string:

      appendRawCommands( new String( buffer.toByteArray(), "ISO-8859-1"));

      As a result, a line containing UTF-16 chars is shown prefixed with þÿ, with a double space between the other chars.
      Each char above 255 is shown as the two ISO-8859-1 characters that correspond to its two bytes.
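
      The effect described above can be reproduced with the JDK alone. The following is a minimal sketch, not PDFBox code: the class Utf16AsLatin1Demo and its helpers encodeLikeCosString and decodeAsLatin1 are hypothetical names, and the sketch ignores the PDF string delimiters and escaping that a real COSString would add. It only assumes the encoding choice described in this report (0xFE 0xFF prefix plus UTF-16BE bytes when any char is above 255).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class Utf16AsLatin1Demo {

    // Hypothetical helper: mimics the encoding choice described in the report.
    // Plain ISO-8859-1 bytes if every char fits, else 0xFE 0xFF + UTF-16BE.
    public static byte[] encodeLikeCosString(String text) {
        boolean fitsLatin1 = text.chars().allMatch(c -> c <= 255);
        if (fitsLatin1) {
            return text.getBytes(StandardCharsets.ISO_8859_1);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0xFE);
        out.write(0xFF);
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16BE);
        out.write(utf16, 0, utf16.length);
        return out.toByteArray();
    }

    // Mirrors new String(buffer.toByteArray(), "ISO-8859-1"):
    // every byte becomes exactly one char.
    public static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // 'A' plus the euro sign (U+20AC, above 255)
        String shown = decodeAsLatin1(encodeLikeCosString("A\u20AC"));
        for (char c : shown.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
        // prints: U+00FE U+00FF U+0000 U+0041 U+0020 U+00AC
    }
}
```

      Here U+00FE U+00FF is the þÿ prefix, the U+0000 before 'A' is the extra gap between ordinary chars, and the euro sign has turned into the two ISO-8859-1 characters for its bytes 0x20 and 0xAC.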

      As a side question: is there an alternative way to use PDFBox that supports UTF-16?

            People

              Assignee: John Hewson (jahewson)
              Reporter: Peter Andersen (peterandersen)
              Votes: 1
              Watchers: 4