  1. Hive
  2. HIVE-21857

Sort conditions in a filter predicate to accelerate query processing


Details

    Description

      This follows an approach similar to the one described in http://db.cs.berkeley.edu/jmh/miscpapers/sigmod93.pdf .

      To reorder the predicates in an AND condition, we could rank each element of the clause and sort the elements in increasing order of rank, using the following formula:

      rank = (selectivity - 1) / cost per tuple
      

      Similarly, for OR conditions:

      rank = (-selectivity) / cost per tuple
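
The two formulas can be sketched in a few lines of Java. This is only an illustration of the ranking idea, not the code in the attached patches: the `PredicateRanker` and `Predicate` names are hypothetical, and the selectivity and cost values are assumed to be supplied by the caller.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class PredicateRanker {

  /** Hypothetical holder for one element of an AND/OR clause and its estimates. */
  static final class Predicate {
    final String expr;
    final double selectivity;   // estimated fraction of tuples that pass, in [0, 1]
    final double costPerTuple;  // heuristic evaluation cost, must be positive

    Predicate(String expr, double selectivity, double costPerTuple) {
      if (costPerTuple <= 0) {
        throw new IllegalArgumentException("costPerTuple must be positive");
      }
      this.expr = expr;
      this.selectivity = selectivity;
      this.costPerTuple = costPerTuple;
    }
  }

  /** rank = (selectivity - 1) / cost per tuple */
  static double andRank(Predicate p) {
    return (p.selectivity - 1.0) / p.costPerTuple;
  }

  /** rank = (-selectivity) / cost per tuple */
  static double orRank(Predicate p) {
    return -p.selectivity / p.costPerTuple;
  }

  /** Returns a copy of the AND operands sorted by increasing rank. */
  static List<Predicate> orderForAnd(List<Predicate> operands) {
    List<Predicate> sorted = new ArrayList<>(operands);
    sorted.sort(Comparator.comparingDouble(PredicateRanker::andRank));
    return sorted;
  }

  /** Returns a copy of the OR operands sorted by increasing rank. */
  static List<Predicate> orderForOr(List<Predicate> operands) {
    List<Predicate> sorted = new ArrayList<>(operands);
    sorted.sort(Comparator.comparingDouble(PredicateRanker::orRank));
    return sorted;
  }
}
```

With three operands `a` (selectivity 0.1, cost 1), `b` (selectivity 0.5, cost 10) and `c` (selectivity 0.9, cost 1), the AND ranks are -0.9, -0.05 and -0.1, giving the order a, c, b: the cheap, highly selective predicate runs first. The OR ranks are -0.1, -0.05 and -0.9, giving c, a, b: the cheap predicate most likely to be true runs first.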
      

      Selectivity can be computed with FilterSelectivityEstimator. For cost per tuple, we will need to come up with a heuristic based on how expensive it is to evaluate the functions contained in the predicate. Custom UDFs could be annotated to declare their cost.
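
One possible shape for the cost heuristic is sketched below. Everything in it is an assumption made for illustration: the `@PerTupleCost` annotation, the `CostHeuristic` class, the default cost of 10.0 for unannotated functions, and the three sample function classes are all hypothetical and are not existing Hive APIs. The sketch treats the cost of a predicate as the sum of the per-tuple costs of the functions it contains.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.List;

/** Hypothetical annotation a UDF author could use to declare a per-tuple cost. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface PerTupleCost {
  double value();
}

final class CostHeuristic {
  /** Assumed fallback cost for functions that carry no annotation. */
  static final double DEFAULT_UDF_COST = 10.0;

  /** Cost of one function: the annotated value if present, else the default. */
  static double costOf(Class<?> function) {
    PerTupleCost cost = function.getAnnotation(PerTupleCost.class);
    return cost != null ? cost.value() : DEFAULT_UDF_COST;
  }

  /** Cost of a predicate: the sum of the costs of the functions it contains. */
  static double costOfPredicate(List<Class<?>> functions) {
    double total = 0.0;
    for (Class<?> function : functions) {
      total += costOf(function);
    }
    return total;
  }
}

// Hypothetical function classes used only to exercise the sketch.
@PerTupleCost(1.0)
final class CheapComparison {}

@PerTupleCost(50.0)
final class ExpensiveRegexUdf {}

final class UnannotatedUdf {}
```

Under these assumptions, a predicate that combines `CheapComparison` and `ExpensiveRegexUdf` costs 51.0 per tuple, while `UnannotatedUdf` alone falls back to 10.0. Summation is just one simple choice; a real heuristic might weight nested calls or argument sizes differently.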

      Attachments

        1. HIVE-21857.01.patch
          12 kB
          jcamachorodriguez
        2. HIVE-21857.02.patch
          958 kB
          jcamachorodriguez
        3. HIVE-21857.03.patch
          2.28 MB
          jcamachorodriguez
        4. HIVE-21857.04.patch
          4.23 MB
          jcamachorodriguez
        5. HIVE-21857.05.patch
          4.24 MB
          jcamachorodriguez
        6. HIVE-21857.06.patch
          4.24 MB
          jcamachorodriguez
        7. HIVE-21857.07.patch
          4.24 MB
          jcamachorodriguez
        8. HIVE-21857.08.patch
          4.25 MB
          jcamachorodriguez

            People

              jcamacho Jesús Camacho Rodríguez
              Votes: 0
              Watchers: 6

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m