  1. Tajo
  2. TAJO-1010

Improve multiple DISTINCT aggregation.

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: Planner/Optimizer
    • Labels:
      None

      Description

      Currently, Tajo provides a three-stage execution plan for optimizing DISTINCT aggregation queries. However, this optimization supports only one distinct column, as in the following query:

      Query1
      select a.flag, count(distinct a.id) as cnt, sum(distinct a.id) as total
      from table1 a
      group by a.flag
      

      If a query applies DISTINCT aggregation to two or more columns, the optimized distinct aggregation cannot be applied, as in the following query:

      Query2
      select a.flag, count(distinct a.id) as cnt, sum(distinct a.id) as total
      , count(distinct a.name) as cnt2, count(distinct a.code) as cnt3
      from table1 a
      group by a.flag
      

      In this case, the query may perform poorly. We therefore need to improve multiple DISTINCT aggregation. Specifically, the three-stage optimization should also support DISTINCT aggregation over multiple columns.
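      To illustrate the idea, here is a minimal Python sketch of one generic way to evaluate several DISTINCT aggregates in three stages. It is not Tajo's actual planner code: the sample rows, the partitioning, and the function names (stage1_expand_and_dedupe, stage2_merge, stage3_aggregate) are all assumptions made for this example. Each row is expanded into one key per distinct column, duplicates are removed locally and then across partitions, and the final stage computes every aggregate from the deduplicated keys.

```python
from collections import defaultdict

# Hypothetical sample rows for table1: (flag, id, name, code)
ROWS = [
    ("x", 1, "ann", "c1"),
    ("x", 1, "bob", "c1"),
    ("x", 2, "ann", "c2"),
    ("y", 3, "cat", "c1"),
    ("y", 3, "cat", "c1"),
]

# Distinct column name -> position in the row tuple (assumed layout)
COLUMNS = {"id": 1, "name": 2, "code": 3}


def stage1_expand_and_dedupe(partition):
    """Stage 1 (per input partition): emit one (flag, column, value) key
    per distinct column and drop local duplicates. NULLs are skipped,
    matching how SQL DISTINCT aggregates ignore NULL values."""
    return {
        (row[0], col, row[idx])
        for row in partition
        for col, idx in COLUMNS.items()
        if row[idx] is not None
    }


def stage2_merge(partials):
    """Stage 2 (after a shuffle on the full key): remove duplicates
    that appeared in more than one partition."""
    merged = set()
    for partial in partials:
        merged |= partial
    return merged


def stage3_aggregate(keys):
    """Stage 3 (grouped by flag): compute all DISTINCT aggregates of
    Query2 from the already deduplicated keys."""
    values = defaultdict(lambda: defaultdict(list))
    for flag, col, value in keys:
        values[flag][col].append(value)
    return {
        flag: {
            "cnt": len(cols["id"]),     # count(distinct a.id)
            "total": sum(cols["id"]),   # sum(distinct a.id)
            "cnt2": len(cols["name"]),  # count(distinct a.name)
            "cnt3": len(cols["code"]),  # count(distinct a.code)
        }
        for flag, cols in values.items()
    }


# Two simulated partitions; id 1 of group "x" occurs in both,
# so stage 2 is needed to deduplicate it.
partitions = [ROWS[0::2], ROWS[1::2]]
result = stage3_aggregate(
    stage2_merge(stage1_expand_and_dedupe(p) for p in partitions)
)
```

      Tagging each value with its column name lets all distinct columns share a single deduplication pass instead of requiring one plan per column. A real engine would additionally hash-partition the keys between stages, which this sketch omits.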

              People

              • Assignee:
                blrunner Jaehwa Jung
                Reporter:
                blrunner Jaehwa Jung
              • Votes:
                0
                Watchers:
                3
