[DRILL-4771] Drill should avoid doing the same join twice if count(distinct) exists - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.2.0
Fix Version/s: 1.9.0
Component/s: None
Labels: None

Description

When a query has one distinct aggregate and one or more non-distinct aggregates, the planner need not produce the join-based plan; it can generate multi-phase aggregates instead. Another approach would be to use grouping sets. However, Drill does not support grouping sets and instead relies on the join-based plan, which performs the same join twice (see the plan below):

select emp.empno, count(*), avg(distinct dept.deptno) 
from sales.emp emp inner join sales.dept dept 
on emp.deptno = dept.deptno 
group by emp.empno

LogicalProject(EMPNO=[$0], EXPR$1=[$1], EXPR$2=[$3])
  LogicalJoin(condition=[IS NOT DISTINCT FROM($0, $2)], joinType=[inner])
    LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
      LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
        LogicalJoin(condition=[=($7, $9)], joinType=[inner])
          LogicalTableScan(table=[[CATALOG, SALES, EMP]])
          LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
    LogicalAggregate(group=[{0}], EXPR$2=[AVG($1)])
      LogicalAggregate(group=[{0, 1}])
        LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
          LogicalJoin(condition=[=($7, $9)], joinType=[inner])
            LogicalTableScan(table=[[CATALOG, SALES, EMP]])
            LogicalTableScan(table=[[CATALOG, SALES, DEPT]])

The more efficient form performs the join only once and should look like this:

select emp.empno, count(*), avg(distinct dept.deptno) 
from sales.emp emp inner join sales.dept dept 
on emp.deptno = dept.deptno 
group by emp.empno

LogicalAggregate(group=[{0}], EXPR$1=[SUM($2)], EXPR$2=[AVG($1)])
  LogicalAggregate(group=[{0, 1}], EXPR$1=[COUNT()])
    LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
      LogicalJoin(condition=[=($7, $9)], joinType=[inner])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
        LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
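
The two-phase plan above can be sanity-checked as a SQL rewrite. The sketch below is illustrative only and is not taken from the Drill patch: it uses Python's built-in sqlite3 module rather than Drill, and the simplified emp/dept tables, the sample rows, and the helper name run_both are all assumptions. The inner query mirrors the lower LogicalAggregate (group by empno and deptno, with COUNT), and the outer query mirrors the upper one (SUM of the partial counts, AVG over the now-distinct deptno values).

```python
import sqlite3

# Hypothetical sample data (not from the ticket). Employee 1 appears in
# dept 10 twice and dept 20 once; employee 3 has no matching dept row.
EMP_ROWS = [(1, 10), (1, 10), (1, 20), (2, 30), (3, 99)]
DEPT_ROWS = [(10, "A"), (20, "B"), (30, "C")]

ORIGINAL = """
    SELECT emp.empno, COUNT(*), AVG(DISTINCT dept.deptno)
    FROM emp INNER JOIN dept ON emp.deptno = dept.deptno
    GROUP BY emp.empno
    ORDER BY emp.empno
"""

# Phase 1 groups by (empno, deptno) and counts rows per group.
# Phase 2 sums the partial counts and averages the distinct deptno values.
TWO_PHASE = """
    SELECT empno, SUM(cnt), AVG(deptno)
    FROM (
        SELECT emp.empno AS empno, dept.deptno AS deptno, COUNT(*) AS cnt
        FROM emp INNER JOIN dept ON emp.deptno = dept.deptno
        GROUP BY emp.empno, dept.deptno
    )
    GROUP BY empno
    ORDER BY empno
"""


def run_both(emp_rows, dept_rows):
    """Run the original and the two-phase query on an in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE emp (empno INTEGER, deptno INTEGER)")
        conn.execute("CREATE TABLE dept (deptno INTEGER, dname TEXT)")
        conn.executemany("INSERT INTO emp VALUES (?, ?)", emp_rows)
        conn.executemany("INSERT INTO dept VALUES (?, ?)", dept_rows)
        original = conn.execute(ORIGINAL).fetchall()
        two_phase = conn.execute(TWO_PHASE).fetchall()
    finally:
        conn.close()
    return original, two_phase


if __name__ == "__main__":
    original, two_phase = run_both(EMP_ROWS, DEPT_ROWS)
    print(original)
    print(two_phase)
```

On this sample data both queries return the same rows, which is the property the rewrite relies on: COUNT(*) can be recovered by summing per-(empno, deptno) counts, and grouping on deptno in the first phase already removes the duplicates that DISTINCT would remove.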

People

Assignee: Gautam Parai

Reporter: Gautam Parai

Reviewer: Khurram Faraaz

Votes: 0

Watchers: 5

Dates

Created: 08/Jul/16 21:10

Updated: 04/Oct/16 00:05

Resolved: 20/Sep/16 01:50