[IMPALA-1270] Consider adding distinct aggregation to subqueries as perf optimization - ASF JIRA

Details

Type: Task
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: Impala 2.0
Fix Version/s: Impala 4.0.0
Component/s: Frontend
Labels:
- planner

Epic Link: Subquery Support
Target Version: Product Backlog

Description

We should consider other rewrites for EXISTS subqueries. For TPC-H Q4, another possible rewrite is an inner join plus a distinct aggregation:

select
  o_orderpriority,
  count(distinct l_orderkey) as order_count
from lineitem l
inner join orders o
  on (o.o_orderkey = l.l_orderkey and
      l.l_commitdate < l.l_receiptdate)
where
  o_orderdate >= '1993-07-01' and
  o_orderdate < '1993-10-01'
group by
  o_orderpriority
order by
  o_orderpriority

This can run much faster because the planner has more flexibility in how it executes the inner join. With the current plan, partitioning lineitem is what hurts performance the most.
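As a sanity check on the proposed rewrite, the sketch below compares three forms of the query on a tiny, made-up data set: the standard EXISTS form of TPC-H Q4, the inner join + count(distinct) form from this description, and a join against a de-duplicated subquery (one reading of the "distinct aggregation in the subquery" idea in the title). It uses Python's built-in sqlite3 rather than Impala, and the table contents are hypothetical, so it only illustrates that the forms return the same rows; it says nothing about Impala's plans or performance. The equivalence relies on o_orderkey being unique in orders.

```python
import sqlite3

# Toy, in-memory stand-in for the TPC-H tables (hypothetical rows, not real data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table orders (o_orderkey integer primary key,
                     o_orderdate text, o_orderpriority text);
create table lineitem (l_orderkey integer,
                       l_commitdate text, l_receiptdate text);
insert into orders values
  (1, '1993-07-15', '1-URGENT'),
  (2, '1993-08-01', '1-URGENT'),
  (3, '1993-09-10', '2-HIGH'),
  (4, '1993-11-01', '2-HIGH');
insert into lineitem values
  (1, '1993-08-01', '1993-08-05'),  -- received after commit date
  (1, '1993-08-02', '1993-08-09'),  -- second late line for order 1
  (2, '1993-08-20', '1993-08-10'),  -- received before commit date
  (3, '1993-09-20', '1993-09-25'),  -- received after commit date
  (4, '1993-11-10', '1993-11-20');  -- late, but order is outside the date range
""")

DATE_FILTER = "o_orderdate >= '1993-07-01' and o_orderdate < '1993-10-01'"

# Original shape of Q4: correlated EXISTS subquery.
exists_sql = f"""
select o_orderpriority, count(*) as order_count
from orders o
where {DATE_FILTER}
  and exists (select 1 from lineitem l
              where l.l_orderkey = o.o_orderkey
                and l.l_commitdate < l.l_receiptdate)
group by o_orderpriority
order by o_orderpriority
"""

# Rewrite from the description: inner join + count(distinct).
join_distinct_sql = f"""
select o_orderpriority, count(distinct l_orderkey) as order_count
from lineitem l
inner join orders o
  on (o.o_orderkey = l.l_orderkey and l.l_commitdate < l.l_receiptdate)
where {DATE_FILTER}
group by o_orderpriority
order by o_orderpriority
"""

# Variant: de-duplicate the subquery output first, then do a plain join.
distinct_subquery_sql = f"""
select o_orderpriority, count(*) as order_count
from orders o
inner join (select distinct l_orderkey from lineitem
            where l_commitdate < l_receiptdate) l
  on o.o_orderkey = l.l_orderkey
where {DATE_FILTER}
group by o_orderpriority
order by o_orderpriority
"""

exists_rows = conn.execute(exists_sql).fetchall()
join_rows = conn.execute(join_distinct_sql).fetchall()
subquery_rows = conn.execute(distinct_subquery_sql).fetchall()
print(exists_rows, join_rows, subquery_rows)
```

Order 1 has two qualifying line items, so a plain join without the distinct would count it twice; both rewrites avoid that.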

Issue Links

is duplicated by

IMPALA-1728 sub-query with duplicate values used IN conditional operator should discard the duplicate values before applying the operator

Resolved

People

Assignee: Tim Armstrong

Reporter: Nong Li

Votes: 0

Watchers: 4

Dates

Created: 18/Sep/14 20:12

Updated: 15/Dec/20 19:57

Resolved: 15/Jul/20 18:00