Apache Trafodion (Retired) / TRAFODION-2477

Invalid characters in UCS2 to UTF8 translation are not handled correctly


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0-incubating
    • Fix Version/s: 2.2.0
    • Component/s: sql-cmp
    • Labels: None

    Description

      When translating from UCS-2 to UTF-8 with CAST or TRANSLATE(... USING UCS2TOUTF8), every valid character maps to a UTF-8 sequence. However, invalid code points and invalid UTF-16 surrogate pairs (for example, an unpaired surrogate) cannot be translated and raise conversion errors. Right now we silently suppress those errors. Instead, we should either translate such characters to the Unicode replacement character U+FFFD or raise an error. Ideally, a CQD should decide which of these two actions to take.
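A minimal sketch of the two proposed policies, written in Python for illustration only (it is not Trafodion code). It assumes the input is already available as a sequence of 16-bit code units, so byte order is out of scope, and it treats only unpaired surrogates as invalid. The `on_invalid` parameter, `ucs2_to_utf8` and `TranslationError` are hypothetical names standing in for the proposed CQD and the translation routine; the issue does not name them.

```python
# Hypothetical sketch (not Trafodion code): translate UTF-16 code units to
# UTF-8 and handle invalid input according to a policy that stands in for
# the CQD proposed in this issue.

REPLACEMENT_CHAR = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


class TranslationError(ValueError):
    """Raised under the 'error' policy when an invalid code unit is found."""


def _utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value (never a surrogate) as UTF-8."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def ucs2_to_utf8(units: list[int], on_invalid: str = "replace") -> bytes:
    """Translate 16-bit code units to UTF-8.

    on_invalid is a hypothetical policy switch: "replace" maps each unpaired
    surrogate to U+FFFD, "error" raises TranslationError instead.
    """
    if on_invalid not in ("replace", "error"):
        raise ValueError(f"unknown policy: {on_invalid!r}")
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if not 0 <= u <= 0xFFFF:
            raise ValueError(f"not a 16-bit code unit: {u!r}")
        nxt = units[i + 1] if i + 1 < len(units) else None
        if 0xD800 <= u <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            # Valid surrogate pair: combine into one supplementary code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00)
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Unpaired high or low surrogate: invalid on its own.
            if on_invalid == "error":
                raise TranslationError(
                    f"unpaired surrogate 0x{u:04X} at position {i}")
            cp = REPLACEMENT_CHAR
            i += 1
        else:
            # Any other code unit is a BMP character and maps directly.
            cp = u
            i += 1
        out += _utf8_encode(cp)
    return bytes(out)
```

Under these assumptions, `ucs2_to_utf8([0xDC00, 0x0041])` returns `b'\xef\xbf\xbdA'` (U+FFFD followed by "A"), while the same call with `on_invalid="error"` raises `TranslationError` instead of silently producing an empty string.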

      Test case:

      create table tbaducs2(a char(10) character set ucs2);

      -- DC00 is a low (trailing) UTF-16 surrogate; on its own it is invalid
      insert into tbaducs2 values(_ucs2 X'DC000041');

      select translate(a using ucs2toutf8) from tbaducs2;
      -- this returns an empty string: no error and no replacement character
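For comparison, Python's standard UTF-16 codec already exhibits both behaviors the issue proposes, which makes it a convenient way to see what the test case should return. This assumes the literal X'DC000041' lists its code units most-significant byte first (DC00, then 0041, i.e. "A"); it illustrates the expected result, not Trafodion's behavior.

```python
# Bytes of the test literal X'DC000041', read as big-endian UTF-16
# (assumption: the literal lists code units most-significant byte first).
raw = bytes.fromhex("DC000041")

# "Replacement character" policy: the lone low surrogate becomes U+FFFD.
replaced = raw.decode("utf-16-be", errors="replace").encode("utf-8")
print(replaced)  # b'\xef\xbf\xbdA'

# "Raise an error" policy: strict decoding rejects the lone surrogate.
raised = False
try:
    raw.decode("utf-16-be", errors="strict")
except UnicodeDecodeError as exc:
    raised = True
    print("error:", exc.reason)
```

Either outcome is preferable to the empty string the query currently returns.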


People

    • Assignee: Hans Zeller (hzeller)
    • Reporter: Hans Zeller (hzeller)
    • Votes: 0
    • Watchers: 2
