Pig / PIG-965

PERFORMANCE: optimize common case in matches (PORegex)


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: impl
    • Labels: None

    Description

      Some frequently seen uses of the 'matches' comparison operator have the following properties:
      1. The rhs is a constant string, e.g. "c1 matches 'abc.*'".
      2. Regexes that look for a matching prefix, suffix, etc. are very common, e.g. 'abc.*', '.*abc', '.*abc.*'.
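Because Pig's matches operator takes a Java regular expression and requires the whole string to match, the prefix, suffix and substring cases correspond to patterns of the form `abc.*`, `.*abc` and `.*abc.*`, where `abc` contains no regex metacharacters. Below is a minimal sketch of recognizing those shapes and answering them with `String` methods; `RegexShape`, `classify` and `matches` are hypothetical names, not Pig's API. One subtlety it handles: `.` does not match line terminators unless `DOTALL` is set, so the sketch falls back to the regex engine whenever the input contains one.

```java
import java.util.regex.Pattern;

/** Sketch: answer common full-match regex shapes with String methods. Hypothetical names. */
final class RegexShape {
    /** Shapes that plain String methods can answer; GENERAL needs the regex engine. */
    enum Kind { EXACT, PREFIX, SUFFIX, CONTAINS, GENERAL }

    /** A classified pattern: its shape, the literal core, and the original regex. */
    static final class Shape {
        final Kind kind;
        final String literal;
        final String regex;

        Shape(Kind kind, String literal, String regex) {
            this.kind = kind;
            this.literal = literal;
            this.regex = regex;
        }
    }

    // Characters that are special in java.util.regex syntax when no flags are set.
    private static final String META = "\\^$.|?*+()[]{}";

    /** True if s, used as a regex, matches only itself (surrogates excluded to stay safe). */
    static boolean isLiteral(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (META.indexOf(c) >= 0 || Character.isSurrogate(c)) {
                return false;
            }
        }
        return true;
    }

    /** Strips an optional leading and trailing ".*" and checks that the rest is literal. */
    static Shape classify(String regex) {
        boolean lead = regex.startsWith(".*");
        String rest = lead ? regex.substring(2) : regex;
        boolean trail = rest.endsWith(".*");
        String core = trail ? rest.substring(0, rest.length() - 2) : rest;
        if (!isLiteral(core)) {
            return new Shape(Kind.GENERAL, null, regex);
        }
        Kind kind = lead && trail ? Kind.CONTAINS
                  : lead ? Kind.SUFFIX
                  : trail ? Kind.PREFIX
                  : Kind.EXACT;
        return new Shape(kind, core, regex);
    }

    /** True if s has a line terminator, which '.' does not match by default. */
    static boolean hasLineTerminator(String s) {
        for (int i = 0; i < s.length(); i++) {
            switch (s.charAt(i)) {
                case '\n': case '\r': case '\u0085': case '\u2028': case '\u2029':
                    return true;
                default:
                    break;
            }
        }
        return false;
    }

    /** Same result as Pattern.matches(shape.regex, input), using String methods when safe. */
    static boolean matches(Shape shape, String input) {
        if (shape.kind == Kind.GENERAL || hasLineTerminator(input)) {
            return Pattern.matches(shape.regex, input);
        }
        switch (shape.kind) {
            case EXACT:  return input.equals(shape.literal);
            case PREFIX: return input.startsWith(shape.literal);
            case SUFFIX: return input.endsWith(shape.literal);
            default:     return input.contains(shape.literal); // CONTAINS
        }
    }

    public static void main(String[] args) {
        String[] regexes = {"abc.*", ".*abc", ".*abc.*", "abc", "a[bc]+"};
        String[] inputs = {"abcdef", "xxabc", "xabcx", "abc", "ab\nabc", ""};
        for (String r : regexes) {
            Shape s = classify(r);
            // Cross-check every fast-path answer against the regex engine.
            for (String in : inputs) {
                if (matches(s, in) != Pattern.matches(r, in)) {
                    throw new AssertionError(r + " vs " + in);
                }
            }
            System.out.println(r + " -> " + s.kind);
        }
    }
}
```

Running `main` prints the detected shape for each regex (for example `abc.* -> PREFIX` and `a[bc]+ -> GENERAL`). Treating surrogates and every metacharacter as "not literal" is deliberately conservative: anything unusual simply takes the regex path.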

      To optimize for these common cases, PORegex.java can be changed to:
      1. Compile the pattern (the rhs of matches) once and reuse it as long as the pattern string has not changed.
      2. Use string comparisons for the simple, common regexes listed in point 2 above.
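Item 1 amounts to memoizing the compiled `java.util.regex.Pattern`. A minimal sketch, assuming a hypothetical `CachedRegex` class with one instance per operator; it is not thread-safe and omits null handling, both of which a real operator would need to decide on.

```java
import java.util.regex.Pattern;

/** Sketch: reuse the last compiled Pattern until the pattern string changes. */
final class CachedRegex {
    private String lastRegex;     // pattern text the cache was built from
    private Pattern lastPattern;  // compiled form of lastRegex
    private int compileCount;     // number of actual compilations, for illustration

    /** Full-string match of input against regex, like "input matches regex". */
    boolean matches(String input, String regex) {
        if (lastPattern == null || !regex.equals(lastRegex)) {
            lastPattern = Pattern.compile(regex); // only when the rhs changes
            lastRegex = regex;
            compileCount++;
        }
        // Matcher.matches() requires the whole input to match, as Pattern.matches does,
        // but without recompiling the pattern on every call.
        return lastPattern.matcher(input).matches();
    }

    int compileCount() {
        return compileCount;
    }

    public static void main(String[] args) {
        CachedRegex m = new CachedRegex();
        String[] rows = {"abc1", "xyz", "abc22"};
        int hits = 0;
        for (String row : rows) {
            if (m.matches(row, "abc.*")) { // constant rhs, so it is compiled once
                hits++;
            }
        }
        System.out.println(hits + " matches, " + m.compileCount() + " compile(s)");
    }
}
```

With a constant right-hand side, as in `c1 matches 'abc.*'`, the pattern is compiled once regardless of how many rows are evaluated; this `main` prints `2 matches, 1 compile(s)`. The string-comparison fast path of item 2 would slot in just before the `lastPattern.matcher` call.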

      Hive's implementation of the LIKE clause uses similar optimizations.

      Attachments

        1. poregex2.patch (33 kB, Ankit Modi)
        2. automaton.jar (168 kB, Ankit Modi)


            People

              Assignee: Ankit Modi
              Reporter: Thejas Nair
              Votes: 0
              Watchers: 2
