IMPALA-8042

Better selectivity estimate for BETWEEN


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Impala 3.1.0
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels: None

    Description

      The analyzer rewrites a BETWEEN expression into a pair of inequalities. IMPALA-8037 explains that the planner then groups all such non-equality conditions together and assigns the group a selectivity of 0.1. IMPALA-8031 explains that the analyzer should handle inequalities better.
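
A minimal Python model of the current behavior described above may help. It is a sketch, not Impala's frontend code: `Pred`, `rewrite_between`, and `non_equality_group_selectivity` are hypothetical names, and the planner's grouping is reduced to the single rule summarized from IMPALA-8037, namely that the non-equality conjuncts together receive a selectivity of 0.1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pred:
    """A simple comparison predicate <col> <op> <value> (hypothetical model)."""
    col: str
    op: str  # one of "=", "<", "<=", ">", ">="
    value: object

def rewrite_between(col, lo, hi):
    """Rewrite `col BETWEEN lo AND hi` as `col >= lo AND col <= hi`."""
    return [Pred(col, ">=", lo), Pred(col, "<=", hi)]

# Single selectivity applied to the whole group of non-equality conjuncts,
# as summarized from IMPALA-8037 (hypothetical constant name).
NON_EQUALITY_GROUP_SELECTIVITY = 0.1

def non_equality_group_selectivity(preds):
    """Factor applied to all non-equality conjuncts taken together.

    Equality predicates are estimated separately and are ignored here, so a
    list containing no non-equality predicates contributes a factor of 1.0.
    """
    if any(p.op != "=" for p in preds):
        return NON_EQUALITY_GROUP_SELECTIVITY
    return 1.0

between = rewrite_between("c", 10, 20)
single = [Pred("c", ">", 5)]
print(non_equality_group_selectivity(between))  # 0.1
print(non_equality_group_selectivity(single))   # 0.1
```

Under this model, the rewritten BETWEEN and a lone `c > 5` receive the same 0.1 factor, so the estimate cannot tell a bounded range from an open-ended one.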

      BETWEEN is a special case that carries information about the final result. If we assume a selectivity of s for a single inequality, then BETWEEN should be something like s/2. The intuition is that if c >= x includes, say, ⅓ of the values, and c <= y also includes ⅓ of the values, then c BETWEEN x AND y should cover a narrower set of values, say ⅙.
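
The s/2 heuristic is simple arithmetic; the sketch below uses exact fractions so the ⅓ to ⅙ example can be checked directly. `between_selectivity` is a hypothetical helper, not an Impala function.

```python
from fractions import Fraction

def between_selectivity(s):
    """Heuristic from this issue: if one inequality keeps a fraction s of
    the rows, estimate c BETWEEN x AND y as s / 2."""
    return s / 2

s = Fraction(1, 3)             # c >= x keeps about a third of the rows
print(between_selectivity(s))  # 1/6
```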

      Ramakrishnan and Gehrke (http://pages.cs.wisc.edu/~dbbook/openAccess/Minibase/optimizer/costformula.html) recommend 0.4 for BETWEEN, 0.3 for an inequality, and 0.3^2 = 0.09 for the general expression x <= c AND c <= y. Note the discrepancy between the compound-inequality case and the BETWEEN case, which likely reflects the additional information we obtain when the user chooses to use BETWEEN.
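
To compare the cited defaults side by side, the following sketch encodes them as exact fractions. The constant and function names are hypothetical, and estimating x <= c AND c <= y as s * s assumes the two inequalities are independent, which is how the 0.3^2 figure arises.

```python
from fractions import Fraction

# Default selectivities as cited above from Ramakrishnan and Gehrke.
BETWEEN_DEFAULT = Fraction(4, 10)
INEQUALITY_DEFAULT = Fraction(3, 10)

def compound_inequality_selectivity(s=INEQUALITY_DEFAULT):
    """Estimate x <= c AND c <= y as two independent inequalities: s * s."""
    return s * s

print(compound_inequality_selectivity())                    # 9/100
print(BETWEEN_DEFAULT / compound_inequality_selectivity())  # 40/9
```

With these defaults, the BETWEEN estimate (0.4) is 40/9, roughly 4.4 times, the compound-inequality estimate (0.09) for what is logically the same predicate.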

      To implement a special BETWEEN selectivity in Impala, we must remember the BETWEEN selectivity during the rewrite to a compound inequality, so that the planner can still apply it to the resulting pair of inequalities.
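
One way to remember the BETWEEN selectivity through the rewrite is to tag both resulting inequalities with a shared identifier and the original selectivity. The sketch below is hypothetical throughout: `between_id` and `between_selectivity` are invented fields, not Impala's; it assumes every predicate passed in is an inequality; and it uses the Ramakrishnan and Gehrke defaults (0.4 for BETWEEN, 0.3 per inequality) with an independence assumption for everything else.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Pred:
    col: str
    op: str
    value: object
    # Hypothetical fields, set only on the two halves produced from one BETWEEN.
    between_id: Optional[int] = None
    between_selectivity: Optional[float] = None

def rewrite_between(col, lo, hi, between_id, selectivity=0.4):
    """Rewrite BETWEEN into two inequalities that remember its selectivity."""
    return [Pred(col, ">=", lo, between_id, selectivity),
            Pred(col, "<=", hi, between_id, selectivity)]

def inequality_conjunct_selectivity(preds, default_ineq=0.3):
    """Multiply per-predicate selectivities (independence assumption), but
    apply a remembered BETWEEN selectivity once for each complete pair."""
    halves = Counter(p.between_id for p in preds if p.between_id is not None)
    applied = set()
    result = 1.0
    for p in preds:
        if p.between_id is not None and halves[p.between_id] == 2:
            if p.between_id not in applied:
                applied.add(p.between_id)
                result *= p.between_selectivity
        else:
            # Plain inequality, or an orphaned half of a BETWEEN.
            result *= default_ineq
    return result

pair = rewrite_between("c", 10, 20, between_id=1)
print(inequality_conjunct_selectivity(pair))      # 0.4
print(inequality_conjunct_selectivity(pair[:1]))  # 0.3
```

The pair count guards against applying the BETWEEN estimate when only one half of the rewritten predicate reaches the estimator; in that case the half is treated as an ordinary inequality.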

          People

            Assignee: Unassigned
            Reporter: Paul Rogers
