Apache Drill, DRILL-4743

HashJoin's not fully parallelized in query plan


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: 1.8.0
    • None

    Description

      The underlying problem is that filter selectivity is underestimated for queries with complicated predicates, e.g. deeply nested AND/OR predicates. The underestimate leads to under-parallelization of the major fragment that performs the join.

      Properly resolving this problem requires table/column statistics to estimate the selectivity correctly. However, when statistics are absent, or when existing statistics are insufficient to produce a correct selectivity estimate, the options described below serve as a workaround.

      For now, the fix is to provide options that control the lower and upper bounds for filter selectivity. Each option accepts a value between 0 and 1, and the minimum selectivity must always be less than or equal to the maximum selectivity. The user can set the following options:

      planner.filter.min_selectivity_estimate_factor 
      planner.filter.max_selectivity_estimate_factor 
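
The effect of the two bounds can be sketched as a simple clamp. This is a minimal illustration in Python, not Drill's planner code: the function name `clamp_selectivity` and its parameters are hypothetical, and the only behavior taken from the description above is that both factors lie between 0 and 1 with the minimum no greater than the maximum.

```python
def clamp_selectivity(estimate, min_factor=0.0, max_factor=1.0):
    """Bound a selectivity estimate to the range [min_factor, max_factor].

    Hypothetical helper: min_factor and max_factor stand in for the values of
    planner.filter.min_selectivity_estimate_factor and
    planner.filter.max_selectivity_estimate_factor.
    """
    # Enforce the constraints stated in the issue description.
    if not 0.0 <= min_factor <= max_factor <= 1.0:
        raise ValueError("require 0 <= min_factor <= max_factor <= 1")
    return max(min_factor, min(estimate, max_factor))


# A very low estimate for a deeply nested predicate is raised to the floor.
print(clamp_selectivity(0.001, min_factor=0.5))  # 0.5
# An estimate already inside the bounds is left unchanged.
print(clamp_selectivity(0.3, min_factor=0.1, max_factor=0.6))  # 0.3
```

Raising the minimum factor is the lever that matters for this issue: it prevents an overly small estimate from reaching the join's major fragment.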
      

      When using 'explain plan including all attributes for <query>', the plan should show the estimated ROWCOUNT of FILTER operators capped according to these options. The estimated ROWCOUNT of operators downstream is not directly controlled by these options, but it may change as a result of dependencies between operators. The FILTER operator only operates on the input of its immediate upstream operator (e.g. SCAN, AGG). If two different filters are present in the same plan, their selectivities might differ, because each is based on the ROWCOUNT of its own immediate upstream operator.
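
As a rough illustration of the last point, the sketch below applies the same bounds to two filters that sit above different upstream operators. It assumes the filter's estimated ROWCOUNT is the upstream ROWCOUNT multiplied by the bounded selectivity; that formula, the function name, and all numbers are assumptions chosen for illustration, not values taken from Drill.

```python
def capped_filter_rowcount(input_rows, raw_selectivity, min_factor, max_factor):
    """Estimate a filter's output rows from its immediate upstream row count.

    Hypothetical model: the raw selectivity is bounded to
    [min_factor, max_factor] and then applied to the upstream ROWCOUNT.
    """
    selectivity = max(min_factor, min(raw_selectivity, max_factor))
    return input_rows * selectivity


# Hypothetical session settings for the min and max selectivity options.
MIN_FACTOR, MAX_FACTOR = 0.25, 1.0

# Filter directly above a large SCAN: the tiny raw estimate is raised to 0.25.
filter_over_scan = capped_filter_rowcount(1_000_000, 0.001, MIN_FACTOR, MAX_FACTOR)

# Filter above an AGG with far fewer rows: 0.5 is already within the bounds.
filter_over_agg = capped_filter_rowcount(10_000, 0.5, MIN_FACTOR, MAX_FACTOR)

print(filter_over_scan)  # 250000.0
print(filter_over_agg)   # 5000.0
```

Both filters use the same option values, yet their estimated row counts differ because each one depends only on its own upstream operator.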

          People

            rhou Robert Hou
            gparai Gautam Parai
            Robert Hou Robert Hou