Details

      Description

      REGEXP_REPLACE() is expensive because it depends on the Google RE2 library, and sometimes only a simple literal replace is needed.

      Reference:

        Activity

        zamsden_impala_ad21 Zachary added a comment - https://gerrit.cloudera.org/#/c/5776/
        zamsden_impala_ad21 Zachary added a comment -

        Even with a debug build, this is 2x faster. On a release build, the new REPLACE() implementation absolutely demolishes REGEXP_REPLACE: it is 5-6x faster for simple string patterns. It should also be much faster for non-matching strings, since it can return the original string in place, whereas re:: probably always has to generate an output buffer.

        One place where REGEXP_REPLACE might win is on huge strings with very sparse matches, because it should be able to copy non-matching output in place. That is not likely to be a common use case, though, and it is something we could always optimize later.
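
        To illustrate why a literal replace can be so much cheaper than a regex-based one, here is a minimal sketch of the general technique. It is not the code from the Gerrit change: `ReplaceLiteral` is a hypothetical name, and this version works on `std::string` rather than Impala's internal string types. The point is the no-match fast path, where a single search finds nothing and no output is assembled piece by piece.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not Impala's actual implementation): replaces every
// non-overlapping occurrence of the literal `pattern` in `input` with
// `replacement`, scanning left to right.
std::string ReplaceLiteral(const std::string& input, const std::string& pattern,
                           const std::string& replacement) {
  // Assumption for this sketch: an empty pattern is treated as a no-op.
  if (pattern.empty()) return input;

  std::size_t pos = input.find(pattern);
  // Fast path: no match at all. A real engine could hand back the input
  // buffer itself here; this sketch simply returns a copy.
  if (pos == std::string::npos) return input;

  std::string out;
  out.reserve(input.size());
  std::size_t start = 0;
  while (pos != std::string::npos) {
    out.append(input, start, pos - start);  // text before the match
    out.append(replacement);                // substituted text
    start = pos + pattern.size();           // resume after the match
    pos = input.find(pattern, start);
  }
  out.append(input, start, std::string::npos);  // trailing text
  return out;
}
```

        For example, `ReplaceLiteral("hello world", "o", "0")` yields `hell0 w0rld`, while a call whose pattern never occurs returns the input unchanged after one scan.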


          People

          • Assignee:
            zamsden Zach Amsden
            Reporter:
            grahn Greg Rahn
          • Votes:
            0
            Watchers:
            4
