Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-3181

Support limit per group in Window

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • core
    • None

    Description

      We have a lot of queries like the following to retrieve top N tuples per group:

      SELECT x, y FROM
           (SELECT x, y, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) 
           AS rn FROM t1) t2 WHERE rn <= 3;
      

      The performance is not good if each group has a lot more tuples than wanted, because we will retrieve and sort all the tuples, instead of just doing a top-N heap sort.

      In order to do optimization for this kind of query, we need to extend window to support limit, if and only if there is only 1 window function, and it is row_number(). We also need a substitute rule to push the limit into window. Of course, we also need to modify executor to support this optimization (can be later).

      Filter (rn <= 3)
        +- Window (window#0={Partition by x order by y ROW_NUMBER()})
      

      to

      Filter (rn <= 3)
        +- Window (window#0={Partition by x order by y limit 3 ROW_NUMBER()})
      

      Thoughts? Objections?

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              hyuan Haisheng Yuan
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated: