Description
When inserting values to varchar column of UTF8 character set, and the values are a mixture of alphabet and Chinese (or other none-ISO8859-1 characters), the performance will degrade exponentially according to the number of alphabet characters.
>> CREATE TABLE t1 (a VARCHAR(65535) CHARACTER SET UTF8); >> INSERT INTO t1 VALUES ('aaa...中文');
Here is a simple test result with different number of 'a's in the insert value:
100 'a's 10 ms 1k 'a's 100 ms 10k 'a's 10 seconds 60k 'a's 160 seconds
The reason is because we internally convert UTF8 to UCS2, which means all 'a's will become "a\0"s, and then ConstValue::getText() function will try to replace all '\0' to printable string "\\0", and thus cause the performance degrade exponentially with the number of 'a's in the string.
const NAString ConstValue::getText() const { if(getType()->getTypeQualifier() == NA_CHARACTER_TYPE) { ... // Change imbedded NULL and \377 chars to \0 and \377 // This comes up in key values quite often. size_t index; while((index = result.first('\0')) != NA_NPOS && index != result.length()) result(index,1) = "\\0"; while((index = result.first('\377')) != NA_NPOS && index != result.length()) result(index,1) = "\\377"; ... }
Attachments
Issue Links
- links to