IMPALA / IMPALA-11127

Document the UTF8_MODE query option and relevant string functions

Details

    • Type: Documentation
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 4.1.0
    • Fix Version/s: Impala 4.1.0
    • Component/s: None
    • Labels: None

    Description

      Now that IMPALA-2019 is resolved, we can document the UTF8_MODE query option that it added. The query option turns on UTF-8 aware behavior in string functions. The relevant string functions and their UTF-8 aware behaviors are:

      • LENGTH(STRING a)
        • returns the number of UTF-8 characters instead of bytes
      • SUBSTR(STRING a, INT start [, INT len])
        SUBSTRING(STRING a, INT start [, INT len])
        • the substring start position and length are counted in UTF-8 characters instead of bytes
      • REVERSE(STRING a)
        • the unit of the operation is a UTF-8 character, i.e. it won't reverse the bytes inside a UTF-8 character.
      • INSTR(STRING str, STRING substr[, BIGINT position[, BIGINT occurrence]])
        LOCATE(STRING substr, STRING str[, INT pos])
        • These functions have an optional position argument. The return values are also positions in the string. In UTF-8 mode, these positions are counted by UTF-8 characters instead of bytes.
      • mask functions
        • The unit of the operation is a UTF-8 character, i.e. they won't mask the string byte by byte.
      • UPPER/LOWER/INITCAP
        • These functions will recognize non-ASCII characters and transform them based on the current locale used by the Impala process.

          People

            Assignee: shajini thayasingh (shajini)
            Reporter: Quanlong Huang (stigahuang)