  1. Phoenix
  2. PHOENIX-1006

Do not sort group by rows without order by


Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 3.0.0
    • None
    • None

    Description

      Consider a SQL query like the one below, which generates 55,000 groups:

      SELECT count(1) AS count,
             SUM(int_column) AS sum_column,
             MAX(int_column) AS max_column2,
             MIN(int_column) AS min_column,
             AVG(int_column) AS avg_column
      FROM table1
      WHERE int_column IS NOT NULL
      GROUP BY int_column2
      ORDER BY int_column DESC
      LIMIT 200;
      

      AggregatePlan shows that for a GROUP BY query the result iterator is set to MergeSortRowKeyResultIterator, and MergeSortRowKeyResultIterator needs an OrderedResultIterator. As a result, whether or not the GROUP BY query has an ORDER BY, the rows are ALWAYS sorted first, which is unnecessary.

      To improve this, we could modify the code so that a GROUP BY without ORDER BY does not trigger the order-by iterator, and sort the result within each group on the client side instead.
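      The idea of skipping the sort when there is no ORDER BY can be illustrated with a minimal, self-contained sketch. It is not taken from the attached patch and does not use Phoenix classes: GroupByMergeSketch, Partial, and merge are hypothetical names, and the hash-based merge of per-region partial aggregates is an assumption chosen for illustration, not necessarily what the patch does. The point is only that combining groups on the client does not require ordered input, so the sort can be made conditional.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names are Phoenix APIs.
final class GroupByMergeSketch {

    /** Partial aggregate for one group, as a single region might return it (assumed shape). */
    static final class Partial {
        final int groupKey;
        final long count;
        final long sum;

        Partial(int groupKey, long count, long sum) {
            this.groupKey = groupKey;
            this.count = count;
            this.sum = sum;
        }

        /** Combines two partial aggregates that belong to the same group. */
        Partial plus(Partial other) {
            return new Partial(groupKey, count + other.count, sum + other.sum);
        }
    }

    /**
     * Merges per-region partial aggregates on the client.
     * Pass orderBy == null for a GROUP BY without ORDER BY: no sort is performed.
     */
    static List<Partial> merge(List<List<Partial>> perRegion, Comparator<Partial> orderBy) {
        // Hash-based merge: regions may return groups in any order.
        Map<Integer, Partial> byGroup = new LinkedHashMap<>();
        for (List<Partial> region : perRegion) {
            for (Partial p : region) {
                byGroup.merge(p.groupKey, p, Partial::plus);
            }
        }
        List<Partial> result = new ArrayList<>(byGroup.values());
        if (orderBy != null) {
            result.sort(orderBy); // sort only when the query actually asks for it
        }
        return result;
    }
}
```

      With a null comparator the groups come back in first-seen order and no comparison is ever made; with a comparator, the sort runs once over the merged groups rather than over every per-region row.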

      On the other hand, in the GROUP BY plus ORDER BY case, the sort on the RegionServer side is currently triggered sequentially, which causes poor performance, especially with a large number of regions. We should improve this by fetching an element from each scanner earlier, so that the sorts are triggered and run in parallel.
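      The parallel-trigger idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the patch itself: a plain java.util.Iterator stands in for a region scanner, each iterator is assumed to yield rows already in sorted order, and ParallelPrimeSketch, Head, and mergeSorted are hypothetical names. The first element of every scanner is requested concurrently (which, in the scenario described above, is what would trigger each server-side sort), and only then does the sequential k-way merge start.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; Iterator<T> stands in for a region scanner.
final class ParallelPrimeSketch {

    /** A scanner's current first row plus the iterator that yields the rest. */
    private static final class Head<T> {
        final T value;
        final Iterator<T> rest;

        Head(T value, Iterator<T> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    /** Merges already-sorted scanners, fetching each scanner's first row in parallel. */
    static <T> List<T> mergeSorted(List<Iterator<T>> scanners, Comparator<? super T> order) {
        List<T> out = new ArrayList<>();
        if (scanners.isEmpty()) {
            return out;
        }
        ExecutorService pool = Executors.newFixedThreadPool(scanners.size());
        try {
            // Submit all first-row fetches before waiting on any of them.
            List<Future<Head<T>>> pending = new ArrayList<>();
            for (Iterator<T> scanner : scanners) {
                Callable<Head<T>> task =
                    () -> scanner.hasNext() ? new Head<T>(scanner.next(), scanner) : null;
                pending.add(pool.submit(task));
            }
            Comparator<Head<T>> byValue = (a, b) -> order.compare(a.value, b.value);
            PriorityQueue<Head<T>> heap = new PriorityQueue<>(byValue);
            for (Future<Head<T>> f : pending) {
                Head<T> head = f.get();
                if (head != null) {
                    heap.add(head); // empty scanners contribute nothing
                }
            }
            // Standard k-way merge over the primed scanners.
            while (!heap.isEmpty()) {
                Head<T> head = heap.poll();
                out.add(head.value);
                if (head.rest.hasNext()) {
                    heap.add(new Head<T>(head.rest.next(), head.rest));
                }
            }
            return out;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while priming scanners", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("scanner failed while fetching its first row", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

      In this sketch only the first fetch is parallel; subsequent rows are pulled one at a time by the merge. If the expensive work happens on the first fetch, as the description suggests, the wait is bounded by the slowest scanner rather than by the sum over all scanners.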

      For more details, please refer to the attached patch. Any comments or suggestions are highly appreciated.

      Attachments

        1. PHOENIX-1006.patch
          14 kB
          jay wong
        2. PHOENIX-1006v2.patch
          15 kB
          jay wong

          People

            Unassigned
            jay wong (jaywong)
            Votes: 1
            Watchers: 6
