Uploaded image for project: 'Apache Trafodion (Retired)'
  1. Apache Trafodion (Retired)
  2. TRAFODION-3265

INSERT values to VARCHAR column of UTF8 charset can be extremely slow

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • any
    • 2.4
    • sql-exe
    • None

    Description

      When inserting values to varchar column of UTF8 character set, and the values are a mixture of alphabet and Chinese (or other none-ISO8859-1 characters), the performance will degrade exponentially according to the number of alphabet characters.

      >> CREATE TABLE t1 (a VARCHAR(65535) CHARACTER SET UTF8);
      >> INSERT INTO t1 VALUES ('aaa...中文');
      

      Here is a simple test result with different number of 'a's in the insert value:

      100  'a's                 10 ms
      1k   'a's                 100 ms
      10k  'a's                 10 seconds
      60k  'a's                 160 seconds

      The reason is because we internally convert UTF8 to UCS2, which means all 'a's will become "a\0"s, and then ConstValue::getText() function will try to replace all '\0' to printable string "\­\0", and thus cause the performance degrade exponentially with the number of 'a's in the string.

      const NAString ConstValue::getText() const
       {
       if(getType()->getTypeQualifier() == NA_CHARACTER_TYPE)
       {
       ...
      
      // Change imbedded NULL and \377 chars to \0 and \377
       // This comes up in key values quite often.
       size_t index;
       while((index = result.first('\0')) != NA_NPOS
       && index != result.length())
       result(index,1) = "\\0";
       while((index = result.first('\377')) != NA_NPOS
       && index != result.length())
       result(index,1) = "\\377";
      
      ...
       }
      
      

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              hezhenxing He Zhenxing
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 2.5h
                  2.5h