[TRAFODION-2477] Invalid characters in UCS2 to UTF8 translation are not handled correctly - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0-incubating
Fix Version/s: 2.2.0
Component/s: sql-cmp
Labels: None

Description

When translating from UCS-2 to UTF-8, using CAST or TRANSLATE(... UCS2TOUTF8), every valid character maps directly to a UTF-8 character. Invalid code points and invalid UTF-16 surrogate pairs, however, cannot be translated and could raise errors. Right now we just suppress those errors. Instead, we should either translate the invalid input to the Unicode "replacement character" U+FFFD or raise an error. Ideally, a CQD (control query default) should decide which of these two actions to take.

Test case:

create table tbaducs2(a char(10) character set ucs2);

-- DC00 is a low (trailing) UTF-16 surrogate; on its own it is invalid
insert into tbaducs2 values(_ucs2 X'DC000041');

select translate(a using ucs2toutf8) from tbaducs2;
-- this returns an empty string: no error, no replacement character
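
The two behaviours proposed in the description can be sketched outside the engine. The Python below is only an illustration, not Trafodion's implementation: the function name `ucs2_to_utf8` and the `on_invalid` parameter are hypothetical stand-ins for the translation routine and the proposed CQD, and the sketch treats only unpaired surrogates as invalid input. It ignores the blank padding that a char(10) column would add.

```python
REPLACEMENT = "\uFFFD"  # Unicode replacement character


def ucs2_to_utf8(units, on_invalid="replace"):
    """Translate a list of 16-bit code units to UTF-8 bytes.

    on_invalid is a hypothetical switch standing in for the proposed CQD:
    "replace" substitutes U+FFFD, "error" raises ValueError.
    """
    out = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: valid only when a low surrogate follows.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
                out.append(chr(cp))
                i += 2
                continue
        elif not (0xDC00 <= u <= 0xDFFF):
            # Ordinary BMP character, maps directly.
            out.append(chr(u))
            i += 1
            continue
        # Reached only for an unpaired high or low surrogate.
        if on_invalid == "error":
            raise ValueError(f"invalid surrogate 0x{u:04X} at index {i}")
        out.append(REPLACEMENT)
        i += 1
    return "".join(out).encode("utf-8")


# The test case above: lone low surrogate DC00 followed by 'A' (0041).
print(ucs2_to_utf8([0xDC00, 0x0041]))
```

With the "replace" policy, the lone surrogate becomes the three-byte UTF-8 encoding of U+FFFD (EF BF BD) and the following 'A' is preserved; with the "error" policy, the same input raises instead of being silently dropped.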

Issue Links

is related to: TRAFODION-2515 Question mark is used instead of Unicode replacement character (Open)

links to: GitHub Pull Request #986

People

Assignee: Hans Zeller

Reporter: Hans Zeller

Votes: 0

Watchers: 2

Dates

Created: 10/Feb/17 19:18

Updated: 02/Mar/17 23:43

Resolved: 02/Mar/17 23:43