PHOENIX-1006

Do not sort group by rows without order by


    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:


      Consider a SQL query like the one below, which produces 55,000 groups:

      SELECT count(1) as count,
             SUM(int_column) as sum_column,
             MAX(int_column) as max_column2,
             MIN(int_column) as min_column,
             AVG(int_column) as avg_column
      FROM table1
      WHERE int_column IS NOT NULL
      GROUP BY int_column2
      ORDER BY int_column DESC
      LIMIT 200;

      In AggregatePlan we can see that the result iterator for a GROUP BY is set to MergeSortRowKeyResultIterator, and that MergeSortRowKeyResultIterator needs an OrderedResultIterator. As a result, a GROUP BY query is ALWAYS sorted first, whether or not it has an ORDER BY, which is unnecessary.

      To improve this, we could change the code so that the ORDER BY iterator is not triggered when a GROUP BY has no ORDER BY, and sort the result within each group on the client side instead.
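
      To make the idea concrete, here is a minimal Java sketch of choosing the client-side iterator based on whether an ORDER BY is present. It is not Phoenix code: GroupByIteratorChoiceSketch, Head, concat, mergeSorted and resultIterator are hypothetical names, plain java.util.Iterator stands in for the per-region scanners, and combining partial aggregates for groups that span regions is left out.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Hypothetical sketch, not the Phoenix API.
final class GroupByIteratorChoiceSketch {

    /** Pairs the current head row of one scanner with the rest of that scanner. */
    static final class Head<T> {
        final T row;
        final Iterator<T> rest;
        Head(T row, Iterator<T> rest) { this.row = row; this.rest = rest; }
    }

    /** No ORDER BY: return rows scanner by scanner, with no comparisons at all. */
    static <T> Iterator<T> concat(List<Iterator<T>> scanners) {
        final Iterator<Iterator<T>> outer = scanners.iterator();
        return new Iterator<T>() {
            private Iterator<T> current = null;
            public boolean hasNext() {
                // Skip over exhausted scanners until one has a row.
                while ((current == null || !current.hasNext()) && outer.hasNext()) {
                    current = outer.next();
                }
                return current != null && current.hasNext();
            }
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                return current.next();
            }
        };
    }

    /** ORDER BY present: k-way merge of scanners that are each already sorted. */
    static <T> Iterator<T> mergeSorted(List<Iterator<T>> scanners, final Comparator<T> orderBy) {
        Comparator<Head<T>> byRow = (a, b) -> orderBy.compare(a.row, b.row);
        final PriorityQueue<Head<T>> heap = new PriorityQueue<>(byRow);
        for (Iterator<T> s : scanners) {
            if (s.hasNext()) heap.add(new Head<>(s.next(), s));
        }
        return new Iterator<T>() {
            public boolean hasNext() { return !heap.isEmpty(); }
            public T next() {
                if (heap.isEmpty()) throw new NoSuchElementException();
                Head<T> h = heap.poll();
                // Refill the heap from the scanner that just yielded a row.
                if (h.rest.hasNext()) heap.add(new Head<>(h.rest.next(), h.rest));
                return h.row;
            }
        };
    }

    /** Assumption: a null comparator means the query has no ORDER BY. */
    static <T> Iterator<T> resultIterator(List<Iterator<T>> scanners, Comparator<T> orderBy) {
        return orderBy == null ? concat(scanners) : mergeSorted(scanners, orderBy);
    }
}
```

      With a null comparator the rows come back in scanner order and no ordering work is done; the merge path is only taken when the query actually asks for an order.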

      On the other hand, in the GROUP BY plus ORDER BY case, the sorts on the RegionServer side are currently triggered sequentially, which causes poor performance, especially with a large number of regions. We should improve this by fetching an element from each scanner earlier, so that the sorts are triggered and run in parallel.
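
      The "fetch an element from each scanner earlier" idea can be sketched roughly as follows. This is an illustration under assumptions, not the attached patch: ParallelPrimeSketch, Primed and primeAll are hypothetical names, and a java.util.Iterator whose first hasNext()/next() call is expensive stands in for a region scanner whose first fetch triggers the server-side sort.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical sketch, not the Phoenix API.
final class ParallelPrimeSketch {

    /** Wraps a scanner and eagerly fetches its first row in the constructor. */
    static final class Primed<T> implements Iterator<T> {
        private final Iterator<T> source;
        private T head;
        private boolean hasHead;

        Primed(Iterator<T> source) {
            this.source = source;
            // Assumption: this first fetch is what triggers the expensive sort.
            this.hasHead = source.hasNext();
            this.head = hasHead ? source.next() : null;
        }

        public boolean hasNext() { return hasHead; }

        public T next() {
            if (!hasHead) throw new NoSuchElementException();
            T result = head;
            hasHead = source.hasNext();
            head = hasHead ? source.next() : null;
            return result;
        }
    }

    /** Primes every scanner on the pool so the first fetches overlap in time. */
    static <T> List<Iterator<T>> primeAll(List<Iterator<T>> scanners, ExecutorService pool) {
        List<Future<Iterator<T>>> futures = new ArrayList<>();
        for (Iterator<T> scanner : scanners) {
            Callable<Iterator<T>> task = () -> new Primed<>(scanner);
            futures.add(pool.submit(task));
        }
        List<Iterator<T>> primed = new ArrayList<>();
        for (Future<Iterator<T>> future : futures) {
            try {
                // Results are collected in the original scanner order.
                primed.add(future.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
        return primed;
    }
}
```

      The primed iterators can then be handed to whatever performs the merge. By the time the merge asks each one for its first row, that row has already been fetched, so the per-region sorts no longer run one after another.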

      For more details, please refer to the attached patch. Any comments or suggestions will be highly appreciated.




            • Assignee:
              jaywong jay wong


              • Created:
