Hive / HIVE-10770

Recognize additional common factors in Filter predicates

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: CBO
    • Labels: None

    Description

      Currently, we canonicalize predicates at the term level (e.g. "a or b or a" becomes "a or b"), but we do not attempt to recognize terms that are equivalent. Further, we do not exploit properties such as the symmetry of '=' (i.e. a = b iff b = a).

      • A first extension would be to normalize comparisons so that, between two field references, the lower-numbered reference is always on the left, and, between a field reference and a literal, the reference is on the left. So "$6 = $3" becomes "$3 = $6", "$6 > $3" becomes "$3 < $6", and "literal <= $5" becomes "$5 >= literal". This would not hurt performance, and would improve a few plans.
      • A second possible extension: given the predicate "(a or b) and ((x and a) or (y and b))", the first factor is implied by the second and can be removed, so that the expression consists only of "(x and a) or (y and b)".
        One possible way to recognize such cases is to transform the second factor to CNF, i.e. "(x or y) and (x or b) and (a or y) and (a or b)". Because the CNF contains the clause "(a or b)", we know that the first factor is redundant and can be discarded. Once the check is done, we can keep the second factor in its original form, "(x and a) or (y and b)", in the predicate.
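
Both extensions can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Hive or Calcite code: the names `normalize`, `cnf_clauses` and `drop_implied_factors` are hypothetical, a comparison is modeled as a `(left, op, right)` tuple in which field references are strings such as "$3" and anything else is a literal, and each factor is modeled as a DNF, i.e. a list of frozensets of atom names.

```python
from itertools import product

# Operator to use when the two operands of a comparison are swapped.
FLIPPED = {"=": "=", "<>": "<>", "<": ">", ">": "<", "<=": ">=", ">=": "<="}


def is_ref(operand):
    """Assumption: field references look like "$3"; anything else is a literal."""
    return isinstance(operand, str) and operand.startswith("$")


def normalize(left, op, right):
    """Put the lower-numbered reference (or the only reference) on the left."""
    if is_ref(left) and is_ref(right):
        swap = int(left[1:]) > int(right[1:])
    else:
        swap = is_ref(right) and not is_ref(left)
    return (right, FLIPPED[op], left) if swap else (left, op, right)


def cnf_clauses(dnf):
    """CNF of a DNF: one clause per choice of one atom from each conjunction."""
    return {frozenset(choice) for choice in product(*dnf)}


def drop_implied_factors(factors):
    """Drop any factor whose CNF clauses all appear in another factor's CNF.

    The CNF is used only for the check; surviving factors are returned
    in their original (DNF) form.
    """
    kept = list(factors)
    i = 0
    while i < len(kept):
        others = kept[:i] + kept[i + 1:]
        if any(cnf_clauses(kept[i]) <= cnf_clauses(g) for g in others):
            del kept[i]  # implied by another factor still in the list
        else:
            i += 1
    return kept


# "(a or b)" and "(x and a) or (y and b)" from the description.
first = [frozenset({"a"}), frozenset({"b"})]
second = [frozenset({"x", "a"}), frozenset({"y", "b"})]
remaining = drop_implied_factors([first, second])
```

Here `normalize("$6", ">", "$3")` yields `("$3", "<", "$6")`, and `remaining` contains only the second factor. Note that the number of CNF clauses is the product of the sizes of the conjunctions, so a real implementation would probably need a bound on the size of the factors it attempts to check.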

            People

              Assignee: Unassigned
              Reporter: Jesús Camacho Rodríguez (jcamacho)