  1. IMPALA
  2. IMPALA-8042

Better selectivity estimate for BETWEEN


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Impala 3.1.0
    • Fix Version/s: Impala 4.5.0, Impala 4.4.1
    • Component/s: Frontend
    • Labels: None

    Description

      The analyzer rewrites a BETWEEN expression into a pair of inequalities. IMPALA-8037 explains that the planner then groups all such non-equality conditions together and assigns the group a selectivity of 0.1. IMPALA-8031 explains that the analyzer should handle inequalities better.

      BETWEEN is a special case that should inform the final estimate. If we assume a selectivity of s for a single inequality, then BETWEEN should be something like s/2. The intuition is that if c >= x includes, say, ⅓ of values, and c <= y also includes ⅓ of values, then c BETWEEN x AND y should select a narrower set of values, say ⅙.

      Ramakrishnan and Gehrke (http://pages.cs.wisc.edu/~dbbook/openAccess/Minibase/optimizer/costformula.html) recommend 0.4 for BETWEEN, 0.3 for an inequality, and 0.3^2 = 0.09 for the general expression x <= c AND c <= y. Note the discrepancy between the compound inequality case and the BETWEEN case, likely reflecting the additional information we obtain when the user chooses to use BETWEEN.
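
      The figures above can be compared with a few lines of arithmetic. This is only an illustrative sketch: the constant and function names are hypothetical, and the values are the ones quoted in this description, not Impala's actual defaults.

      ```python
      from fractions import Fraction

      # Assumed constants, taken from the figures quoted in this description.
      INEQUALITY_SEL = Fraction(3, 10)  # a single inequality such as c >= x
      BETWEEN_SEL = Fraction(4, 10)     # an explicit c BETWEEN x AND y


      def compound_inequality_selectivity(s=INEQUALITY_SEL):
          """Treat x <= c AND c <= y as two independent inequalities: s * s."""
          return s * s


      def halved_between_selectivity(s):
          """The s/2 heuristic: BETWEEN is narrower than one inequality."""
          return s / 2


      if __name__ == "__main__":
          print("compound inequality:", compound_inequality_selectivity())
          print("explicit BETWEEN:   ", BETWEEN_SEL)
          print("s/2 with s = 1/3:   ", halved_between_selectivity(Fraction(1, 3)))
      ```

      Exact fractions are used so that the 0.09 versus 0.4 gap is not blurred by floating-point rounding.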

      To implement a special BETWEEN selectivity in Impala, we must remember the selectivity of BETWEEN during the rewrite to a compound inequality.
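
      One way to carry that information through the rewrite is to tag the resulting compound predicate with its origin. The sketch below is a simplified model written in Python for brevity; it is not Impala's frontend code, and every class, function, and constant name in it is hypothetical.

      ```python
      from dataclasses import dataclass

      # Assumed defaults for illustration only (values quoted in this description).
      DEFAULT_INEQUALITY_SEL = 0.3
      DEFAULT_BETWEEN_SEL = 0.4


      @dataclass(frozen=True)
      class Inequality:
          column: str
          op: str        # ">=" or "<="
          bound: object


      @dataclass(frozen=True)
      class Conjunction:
          left: Inequality
          right: Inequality
          # Remembers that this pair came from a BETWEEN rewrite.
          from_between: bool = False


      def rewrite_between(column, low, high):
          """Rewrite c BETWEEN low AND high into c >= low AND c <= high,
          keeping a marker so the estimator can still recognize it."""
          return Conjunction(
              Inequality(column, ">=", low),
              Inequality(column, "<=", high),
              from_between=True,
          )


      def estimate_selectivity(pred):
          """Use the BETWEEN estimate when the marker is set; otherwise
          treat the two inequalities as independent."""
          if pred.from_between:
              return DEFAULT_BETWEEN_SEL
          return DEFAULT_INEQUALITY_SEL * DEFAULT_INEQUALITY_SEL
      ```

      With this shape, a hand-written c >= x AND c <= y and a rewritten BETWEEN produce structurally identical inequalities but different estimates, which is the discrepancy described above.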

          People

            Assignee: Riza Suminto (rizaon)
            Reporter: Paul Rogers (Paul.Rogers)
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved:
