Apache Drill / DRILL-7187

Improve selectivity estimates for range predicates when using histogram


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.17.0
    • Component/s: None
    • Labels:

      Description

      Two selectivity estimation improvements are needed:

      1. For range predicates on the same column, we need to collect all such predicates into one group and do a single histogram lookup for the group.
      For instance:

       WHERE a > 10 AND b < 20 AND c = 100 AND a <= 50 AND b < 50
      

      Currently, Drill treats each conjunct independently and multiplies the individual selectivities. However, that does not give accurate estimates, because predicates on the same column are not independent of each other. Here, we want to group the predicates on 'a' together and do a single lookup for the combined range, and similarly for 'b'.
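      The grouping idea can be sketched in Python as follows. This is an illustration only, not Drill's implementation: the function names, the `(column, op, value)` tuple representation of conjuncts, the equi-depth histogram layout, and the linear interpolation within a bucket are all assumptions made for this example. Strict and inclusive bounds are treated alike, which is a simplification for continuous-valued columns.

```python
from collections import defaultdict


def group_range_predicates(conjuncts):
    """Collapse range conjuncts into one (low, high) interval per column.

    conjuncts: list of (column, op, value) tuples (hypothetical representation).
    Equality and other non-range predicates are skipped here.
    """
    bounds = defaultdict(lambda: [float("-inf"), float("inf")])
    for col, op, val in conjuncts:
        if op in (">", ">="):
            bounds[col][0] = max(bounds[col][0], val)  # tightest lower bound
        elif op in ("<", "<="):
            bounds[col][1] = min(bounds[col][1], val)  # tightest upper bound
    return {col: tuple(b) for col, b in bounds.items()}


def range_selectivity(boundaries, low, high):
    """Estimate the fraction of histogram rows falling in [low, high].

    boundaries: sorted bucket edges of an assumed equi-depth histogram, so
    every bucket holds the same share of rows. Partially covered buckets are
    interpolated linearly.
    """
    num_buckets = len(boundaries) - 1
    covered = 0.0
    for start, end in zip(boundaries, boundaries[1:]):
        lo, hi = max(low, start), min(high, end)
        if end > start and hi > lo:
            covered += (hi - lo) / (end - start)
    return covered / num_buckets


# The WHERE clause from the description, as tuples.
conjuncts = [("a", ">", 10), ("b", "<", 20), ("c", "=", 100),
             ("a", "<=", 50), ("b", "<", 50)]
grouped = group_range_predicates(conjuncts)  # one interval each for 'a' and 'b'

# Made-up histogram for column 'a' with four equal-height buckets.
hist_a = [0, 25, 50, 75, 100]
grouped_sel = range_selectivity(hist_a, *grouped["a"])
independent_sel = (range_selectivity(hist_a, 10, float("inf"))
                   * range_selectivity(hist_a, float("-inf"), 50))
print(grouped_sel, independent_sel)
```

      With this made-up histogram, the single grouped lookup for 'a' yields roughly 0.4, while multiplying the two independent estimates yields roughly 0.45, which shows how the independence assumption can skew the estimate.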

      2. NULLs are not maintained by the histogram. However, when doing the selectivity calculations, the histogram should use totalRowCount as the denominator rather than the non-null count, since rows with a NULL value never satisfy a range predicate.
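      A minimal sketch of the NULL adjustment, under the assumption that the histogram lookup returns a fraction relative to the non-null rows only. The function and parameter names are hypothetical and do not come from the Drill code base.

```python
def selectivity_with_nulls(histogram_fraction, non_null_count, total_row_count):
    """Rescale a histogram-based fraction so the denominator is all rows.

    histogram_fraction: share of NON-NULL rows matching the range predicate.
    non_null_count:     number of rows the histogram was built from.
    total_row_count:    all rows in the table, including NULLs.
    """
    if total_row_count <= 0:
        return 0.0
    matching_rows = histogram_fraction * non_null_count
    return matching_rows / total_row_count


# Made-up numbers: 1000 rows, 200 of them NULL, and the histogram reports
# that 40% of the non-null rows fall in the requested range.
sel = selectivity_with_nulls(0.4, 800, 1000)
print(sel)
```

      Using the non-null count as the denominator would report 0.4 here, overestimating the output row count once that selectivity is multiplied by the total row count.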


              People

              • Assignee:
                amansinha100 Aman Sinha
                Reporter:
                amansinha100 Aman Sinha
                Reviewer:
                Gautam Parai
