Tajo (Retired) / TAJO-1010

Improve multiple DISTINCT aggregation.

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: Planner/Optimizer
    • Labels: None

    Description

      Currently, Tajo provides a three-stage plan for optimizing DISTINCT aggregation queries, but it supports only one distinct column, as in the following query:

      Query1
      select a.flag, count(distinct a.id) as cnt, sum(distinct a.id) as total
      from table1 a
      group by a.flag
      

      If a query applies DISTINCT aggregation to two or more columns, the optimized distinct aggregation cannot be applied, as in the following query:

      Query2
      select a.flag, count(distinct a.id) as cnt, sum(distinct a.id) as total
      , count(distinct a.name) as cnt2, count(distinct a.code) as cnt3
      from table1 a
      group by a.flag
      

      In this case, the query may perform poorly because it falls back to the non-optimized plan. We therefore need to improve multiple DISTINCT aggregation; specifically, the three-stage plan should also support DISTINCT aggregation over multiple columns.
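
The idea of a staged plan for several distinct columns can be illustrated with a small simulation. This is only a sketch of one common technique (expand each row into one key per distinct column, deduplicate locally, deduplicate globally, then aggregate), not Tajo's actual planner code. The sample rows, the function names, and the partitioning scheme are all hypothetical, and NULL handling is ignored.

```python
from collections import defaultdict

# Hypothetical sample rows for table1: (flag, id, name, code).
ROWS = [
    ("A", 1, "x", "c1"),
    ("A", 1, "y", "c1"),
    ("A", 2, "x", "c2"),
    ("B", 3, "z", "c1"),
    ("B", 3, "z", "c1"),
]

# Distinct columns referenced by Query2, mapped to their position in a row.
DISTINCT_COLS = {"id": 1, "name": 2, "code": 3}


def stage1_expand_and_dedupe(partition):
    """Stage 1 (per partition): emit one (flag, column, value) key per
    distinct column and drop local duplicates before any shuffle."""
    return {
        (row[0], col, row[idx])
        for row in partition
        for col, idx in DISTINCT_COLS.items()
    }


def stage2_global_dedupe(partials):
    """Stage 2: merge partial sets so each (flag, column, value) survives once."""
    merged = set()
    for partial in partials:
        merged |= partial
    return merged


def stage3_aggregate(keys):
    """Stage 3: compute every DISTINCT aggregate per group from unique keys."""
    values = defaultdict(lambda: defaultdict(list))
    for flag, col, value in keys:
        values[flag][col].append(value)
    return {
        flag: {
            "cnt": len(cols["id"]),      # count(distinct a.id)
            "total": sum(cols["id"]),    # sum(distinct a.id)
            "cnt2": len(cols["name"]),   # count(distinct a.name)
            "cnt3": len(cols["code"]),   # count(distinct a.code)
        }
        for flag, cols in values.items()
    }


def multi_distinct(rows, num_partitions=2):
    """Run the three stages over rows split into simulated partitions."""
    partitions = [rows[i::num_partitions] for i in range(num_partitions)]
    partials = [stage1_expand_and_dedupe(p) for p in partitions]
    return stage3_aggregate(stage2_global_dedupe(partials))


if __name__ == "__main__":
    for flag, aggs in sorted(multi_distinct(ROWS).items()):
        print(flag, aggs)
```

Because every distinct column is tagged with its own name in the key, all of the DISTINCT aggregates share the same three stages, and the result does not depend on how the rows are partitioned.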

            People

              Assignee: JaeHwa Jung (blrunner)
              Reporter: JaeHwa Jung (blrunner)