IMPALA-9662

Add builtin functions for masking UTF-8 strings

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 4.1.0
    • Component/s: Backend
    • Labels: None

    Description

      The existing mask functions can only handle ASCII characters. It would be very useful to provide mask functions that handle UTF-8 characters, or to improve the current mask functions to handle them as Hive does. Otherwise, Impala may leak information, because it counts bytes rather than characters: a Chinese character that takes three bytes in UTF-8 is counted as three characters. For example, when asked to mask the last two characters, Impala masks only the last two bytes, which all belong to the final UTF-8 character.

      hive> select mask_last_n('SQL引擎', 2, 'x', 'x', 'x', 'x');
      SQLxx
      impala> select mask_last_n('SQL引擎', 2, 'x', 'x', 'x', 'x');
      SQL引�xx
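
The difference between the two results can be reproduced outside either engine. The following is a minimal Python sketch, not Impala or Hive code: the helper names `mask_last_n_bytes` and `mask_last_n_chars` are hypothetical, and a single mask character stands in for the four 'x' arguments in the queries above. It contrasts masking the last n bytes of the UTF-8 encoding with masking the last n code points.

```python
def mask_last_n_bytes(s: str, n: int, mask: str = "x") -> str:
    """Mask the last n bytes of the UTF-8 encoding (byte-oriented counting)."""
    if n <= 0:
        return s
    raw = s.encode("utf-8")
    kept = raw[: max(len(raw) - n, 0)]
    # A multi-byte sequence cut in the middle is no longer valid UTF-8;
    # errors="replace" renders the leftover bytes as U+FFFD.
    return kept.decode("utf-8", errors="replace") + mask * min(n, len(raw))


def mask_last_n_chars(s: str, n: int, mask: str = "x") -> str:
    """Mask the last n Unicode code points (character-oriented counting)."""
    if n <= 0:
        return s
    kept = s[: max(len(s) - n, 0)]
    return kept + mask * min(n, len(s))


if __name__ == "__main__":
    text = "SQL引擎"  # 3 ASCII bytes + 2 characters of 3 bytes each = 9 bytes
    print(mask_last_n_bytes(text, 2))  # SQL引\ufffdxx: first byte of 擎 survives
    print(mask_last_n_chars(text, 2))  # SQLxx
```

Python strings index by code point, so the second helper handles the example here, but it does not account for grapheme clusters such as combining sequences.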
      

      Some common scenarios:

      • Masking the last two UTF-8 characters of Chinese names.
      • Showing only the first few UTF-8 characters of Chinese addresses and masking all the remaining characters.

      However, this depends on backend (BE) support for UTF-8 strings.
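
As an illustration of the two scenarios above, here is a rough Python sketch of character-aware masking. It is not the Impala or Hive implementation. The function names mirror the masking builtins only loosely, the sample name and address are made up, and the assumption that the four mask arguments mean upper, lower, digit and other characters (in that order) is mine.

```python
def mask_char(ch: str, upper: str = "X", lower: str = "x",
              digit: str = "n", other: str = "x") -> str:
    """Pick a replacement by character class (assumed argument meaning)."""
    if ch.isupper():
        return upper
    if ch.islower():
        return lower
    if ch.isdigit():
        return digit
    return other  # CJK ideographs are neither cased letters nor digits


def mask_last_n(s: str, n: int, **mask_chars: str) -> str:
    """Mask the last n code points and keep everything before them."""
    cut = max(len(s) - max(n, 0), 0)
    return s[:cut] + "".join(mask_char(c, **mask_chars) for c in s[cut:])


def mask_show_first_n(s: str, n: int, **mask_chars: str) -> str:
    """Keep the first n code points and mask everything after them."""
    n = max(n, 0)
    return s[:n] + "".join(mask_char(c, **mask_chars) for c in s[n:])


if __name__ == "__main__":
    # Scenario 1: mask the last two characters of a (made-up) Chinese name.
    print(mask_last_n("张三丰", 2))  # 张xx
    # Scenario 2: show only the first three characters of a (made-up) address.
    print(mask_show_first_n("北京市海淀区中关村1号", 3))  # 北京市xxxxxxnx
```

Because both helpers slice by code point, the number of visible characters matches what the caller asked for regardless of how many bytes each character occupies.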

            People

              Assignee: Quanlong Huang (stigahuang)
              Reporter: Quanlong Huang (stigahuang)