Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-6096

RLIKE/REGEXP operator and regexp_* functions require different character escaping

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • Impala 2.5.0
    • None
    • Frontend
    • None
    • ghx-label-7

    Description

      When escaping a regex-special character such as ( or [, the REGEXP/RLIKE operator requires a triple escape such as

      \\\(

      while the regexp_like/regexp_extract/etc. functions only require a double escape such as

      \\(

      Here's a test proving the difference in requirement:

      CREATE TABLE test_regexp (a STRING);
      INSERT INTO test_regexp VALUES ('This is a (string) with [special chars] that need e.s.c.a.p.i.n.g');
      
      -- The below will fail
      SELECT regexp_extract(a, 'This is a \\((.*)\\) with .*', 1) FROM test_regexp WHERE a RLIKE 'This is a \\(string.*';
      
      -- The below will pass correctly
      SELECT regexp_extract(a, 'This is a \\((.*)\\) with .*', 1) FROM test_regexp WHERE a RLIKE 'This is a \\\(string.*';
      

      The failure message is:

      invalid regular expression in 'a RLIKE 'This is a \\(string.*''

      Could the escape format be unified between the two? Per the documentation, both are supposed to be following the standard RE2 syntax.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              qwertymaniac Harsh J
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated: