  1. Apache Trafodion
  2. TRAFODION-2515

Question mark is used instead of Unicode replacement character


    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.0-incubating
    • Fix Version/s: None
    • Component/s: sql-general
    • Labels:
      None

      Description

      When we convert text to a character set and encounter an invalid character, we should translate it into the "replacement character" of that character set. For ASCII and ISO-8859-1, we just use a question mark, since there is no special replacement character. When we convert to Unicode, however, we should use U+FFFD as the replacement character (often displayed as a black diamond with a question mark inside).
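
As an analogy only, Python's built-in codecs follow the same convention with the "replace" error handler. This is not Trafodion code; the variable names are made up for illustration.

```python
# Converting to a single-byte character set: characters that cannot be
# represented become "?" (0x3F), since these sets have no replacement character.
ascii_bytes = "caf\u00e9".encode("ascii", errors="replace")
latin1_bytes = "\u20ac5".encode("iso-8859-1", errors="replace")

# Converting to Unicode: an invalid input byte becomes U+FFFD.
decoded = b"A\xffB".decode("utf-8", errors="replace")

print(ascii_bytes)    # "caf" followed by "?"
print(latin1_bytes)   # "?" followed by "5"
print(ascii(decoded)) # "A", U+FFFD, "B"
```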

      Test case:

      cqd TRANSLATE_ERROR 'off';
      select converttohex(TRANSLATE(_ucs2 X'D8340041' using UCS2toUTF8)) from (values(0))x;

      The source value is an invalid bit pattern (D834, a high surrogate with no low surrogate after it) followed by "A" (0041). Right now the result is 3F41, which as Unicode or ASCII text is "?A". With the correct replacement character, the result should be EFBFBD41, where EFBFBD is the UTF-8 encoding of U+FFFD.
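
The expected result can be reproduced outside of Trafodion with a small sketch. The function below is a hypothetical illustration, not the Trafodion implementation: it assumes big-endian 16-bit code units as input and replaces every unpaired surrogate (or trailing odd byte) with U+FFFD before encoding to UTF-8.

```python
REPLACEMENT_CHAR = "\ufffd"  # Unicode replacement character


def ucs2_to_utf8(data: bytes) -> bytes:
    """Convert big-endian 16-bit code units to UTF-8 (hypothetical helper).

    Unpaired surrogates are replaced with U+FFFD instead of "?".
    """
    out = []
    i = 0
    while i + 1 < len(data):
        unit = int.from_bytes(data[i:i + 2], "big")
        i += 2
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: only valid when a low surrogate follows.
            if i + 1 < len(data):
                nxt = int.from_bytes(data[i:i + 2], "big")
                if 0xDC00 <= nxt <= 0xDFFF:
                    cp = 0x10000 + ((unit - 0xD800) << 10) + (nxt - 0xDC00)
                    out.append(chr(cp))
                    i += 2
                    continue
            out.append(REPLACEMENT_CHAR)
        elif 0xDC00 <= unit <= 0xDFFF:
            # Low surrogate without a preceding high surrogate.
            out.append(REPLACEMENT_CHAR)
        else:
            out.append(chr(unit))
    if i < len(data):
        # A trailing odd byte cannot form a code unit.
        out.append(REPLACEMENT_CHAR)
    return "".join(out).encode("utf-8")


# Same input as the test case: X'D8340041'
print(ucs2_to_utf8(bytes.fromhex("D8340041")).hex().upper())  # EFBFBD41
```

Only the invalid code unit is replaced; the "A" that follows it is converted normally, which matches the expected output in the test case.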

              People

              • Assignee: hzeller Hans Zeller
              • Reporter: hzeller Hans Zeller
              • Votes: 0
              • Watchers: 1
