[TAJO-1010] Improve multiple DISTINCT aggregation. - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.9.0
Component/s: Planner/Optimizer
Labels: None

Description

Currently, Tajo provides a three-stage optimization for DISTINCT aggregation queries, but it applies only when every DISTINCT aggregate uses the same single column, as in the following query:

Query1

select a.flag, count(distinct a.id) as cnt, sum(distinct a.id) as total
from table1 a
group by a.flag

If you use two or more columns in DISTINCT aggregates, the optimized distinct aggregation cannot be applied, as in the following query:

Query2

select a.flag, count(distinct a.id) as cnt, sum(distinct a.id) as total
, count(distinct a.name) as cnt2, count(distinct a.code) as cnt3
from table1 a
group by a.flag

In this case, query performance may be poor. We therefore need to improve multiple DISTINCT aggregation: the three-stage optimization should also support DISTINCT aggregates over multiple columns.
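To make the idea of staged aggregation over several DISTINCT columns concrete, here is a minimal Python sketch. It is not Tajo's implementation; it only illustrates one common technique, under assumptions: each input row is expanded into one tagged record per DISTINCT column, duplicates are removed per partition, the partial results are merged, and the surviving unique values are folded per group. The function name `multi_distinct`, the `partitions` parameter, and the sample rows are all hypothetical.

```python
from collections import defaultdict

def multi_distinct(rows, group_key, distinct_cols, partitions=2):
    """Illustrative staged DISTINCT aggregation (hypothetical helper)."""
    # Simulate data spread over several workers (assumption: round-robin split).
    chunks = [rows[i::partitions] for i in range(partitions)]

    # Stage 1: each partition expands a row into one tagged record per
    # DISTINCT column and removes duplicates locally. NULLs (None) are
    # skipped, as SQL DISTINCT aggregates ignore them.
    local = [
        {(r[group_key], col, r[col])
         for r in chunk for col in distinct_cols if r[col] is not None}
        for chunk in chunks
    ]

    # Stage 2: merge the partial sets so each (group, column, value)
    # triple survives exactly once across all partitions.
    merged = set().union(*local)

    # Stage 3: collect the unique values per group and column; the final
    # aggregates (count, sum, ...) are computed from these sets.
    result = defaultdict(lambda: defaultdict(set))
    for group, col, value in merged:
        result[group][col].add(value)
    return result

# Sample data shaped like Query2 (hypothetical values).
rows = [
    {"flag": "A", "id": 1, "name": "x", "code": "c1"},
    {"flag": "A", "id": 1, "name": "y", "code": "c1"},
    {"flag": "A", "id": 2, "name": "x", "code": "c2"},
    {"flag": "B", "id": 3, "name": "z", "code": "c1"},
]
out = multi_distinct(rows, "flag", ["id", "name", "code"])
for flag in sorted(out):
    v = out[flag]
    # flag, count(distinct id), sum(distinct id),
    # count(distinct name), count(distinct code)
    print(flag, len(v["id"]), sum(v["id"]), len(v["name"]), len(v["code"]))
```

Tagging each value with its column name lets a single pass deduplicate all DISTINCT columns at once, rather than running one deduplication per column.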

Issue Links

is related to: TAJO-601 Improve distinct aggregation query processing (Resolved)

People

Assignee: JaeHwa Jung

Reporter: JaeHwa Jung

Votes: 0

Watchers: 3

Dates

Created: 18/Aug/14 09:52

Updated: 20/Nov/14 12:18

Resolved: 08/Oct/14 02:38