Hive / HIVE-14318

Vectorization: LIKE should use matches() instead of find(0)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 1.2.1, 1.3.0, 2.2.0
    • Fix Version/s: None
    • Component/s: Vectorization
    • Labels:
      None

      Description

      Using matches() instead of find() would let the matcher exit early on the first non-match, instead of going on to look for matching sub-sequences at later offsets.

      In UDFLike.java, the complex pattern checker uses matches() and the vectorized version uses find(0), which is more expensive.
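      To make the two calls concrete, here is a minimal sketch using only java.util.regex. The helper likeToRegex and the class name are hypothetical and are not taken from UDFLike.java; the sketch assumes '%' maps to ".*" and '_' maps to ".", and it ignores LIKE escape characters.

```java
import java.util.regex.Pattern;

public class LikeMatchSketch {
    // Hypothetical helper (not Hive's code): translate a SQL LIKE pattern to a regex.
    // Assumption: '%' -> ".*", '_' -> ".", every other character is quoted literally.
    public static String likeToRegex(String like) {
        StringBuilder sb = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') {
                sb.append(".*");
            } else if (c == '_') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return sb.toString();
    }

    // matches(): the whole input must match the pattern.
    public static boolean viaMatches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    // find(0): reset the matcher and search for a matching sub-sequence from index 0.
    public static boolean viaFind(Pattern p, String s) {
        return p.matcher(s).find(0);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(likeToRegex("%foo%bar%"));
        System.out.println(viaMatches(p, "xxfooyybarzz")); // true
        System.out.println(viaFind(p, "xxfooyybarzz"));    // true
        System.out.println(viaMatches(p, "xxfooyy"));      // false
        System.out.println(viaFind(p, "xxfooyy"));         // false
    }
}
```

      The two calls agree here because the translated pattern begins and ends with ".*". For a regex without that wrapping, find(0) also accepts a partial match that matches() rejects, so the calls are interchangeable only when the pattern already covers the whole input.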

      Benchmark                            Mode  Cnt    Score    Error  Units
      RegexBench.testGreedyRegexHit        avgt    5  379.316 ± 32.444  ns/op
      RegexBench.testGreedyRegexHitCheck   avgt    5  344.895 ± 15.436  ns/op
      RegexBench.testGreedyRegexMiss       avgt    5  497.193 ± 18.168  ns/op
      RegexBench.testGreedyRegexMissCheck  avgt    5  171.872 ±  8.588  ns/op
      

      On a miss, the greedy pattern is nearly 3x more expensive per row with .find(0) than with the .matches() check version (497 vs. 172 ns/op).

      On a hit, the two are nearly the same (379 vs. 345 ns/op).

      With a lazy pattern, .matches() is slower when there's a hit (121 vs. 78 ns/op), because it runs the check to the end of the input, but roughly 2.5x faster when there's a miss (154 vs. 388 ns/op).

      RegexBench.testLazyRegexHit          avgt    5   78.398 ±  6.007  ns/op
      RegexBench.testLazyRegexHitCheck     avgt    5  120.557 ±  4.396  ns/op
      RegexBench.testLazyRegexMiss         avgt    5  387.594 ± 25.672  ns/op
      RegexBench.testLazyRegexMissCheck    avgt    5  154.489 ± 13.622  ns/op
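      The hit-side difference in the lazy case can be seen from where each call stops. The sketch below is an illustration under assumptions: the patterns ".*foo.*" and ".*?foo.*?" and the input strings are made up, since the regexes used in RegexBench are not shown here. It reports the end offset of the match, not timings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVsGreedySketch {
    // End offset of the sub-sequence located by find(0), or -1 on a miss.
    public static int endAfterFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find(0) ? m.end() : -1;
    }

    // End offset after a successful whole-input matches(), or -1 on a miss.
    public static int endAfterMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String greedy = ".*foo.*";   // assumed greedy pattern
        String lazy = ".*?foo.*?";   // assumed lazy pattern
        String hit = "aafoobbbb";    // 9 characters, "foo" ends at offset 5
        String miss = "aabarbbbb";

        System.out.println(endAfterFind(greedy, hit));    // 9: greedy find consumes the whole input
        System.out.println(endAfterMatches(greedy, hit)); // 9
        System.out.println(endAfterFind(lazy, hit));      // 5: lazy find stops right after "foo"
        System.out.println(endAfterMatches(lazy, hit));   // 9: matches() must reach the end
        System.out.println(endAfterFind(lazy, miss));     // -1
        System.out.println(endAfterMatches(lazy, miss));  // -1
    }
}
```

      With the greedy pattern both calls consume the full input on a hit, which is consistent with the near-identical hit numbers; with the lazy pattern only find(0) can stop early.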
      

        Attachments

        1. HIVE-14318.1.patch
          0.8 kB
          Gopal Vijayaraghavan

              People

              • Assignee: Gopal Vijayaraghavan (gopalv)
              • Reporter: Gopal Vijayaraghavan (gopalv)
              • Votes: 0
              • Watchers: 3
