Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-11679

Inconsistent push down of limit with unpartitioned row_number()

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • Frontend
    • None
    • ghx-label-9

    Description

      In case of unpartitioned row_number() having a <= predicate on row number and limit means the same, so these two queries should lead to an equivalent plan:

      a:
      select * from (select l_orderkey, row_number() OVER (ORDER by l_orderkey) as rnum from tpch_parquet.lineitem) s
      where rnum <= 10000;

      b:
      select * from (select l_orderkey, row_number() OVER (ORDER by l_orderkey) as rnum from
      tpch_parquet.lineitem) s
      limit 10000;

      Currently a. will use to a top-n node while b. will use a sort node.
      For rnum <= 1000 a. will also use a top-n node

      Meanwhile if there is also a rnum > X clause (essentially an OFFSET), then limit has lower bounds for using top-n:

      c:
      select * from (select l_orderkey, row_number() OVER (ORDER by l_orderkey) as rnum fromtpch_parquet.lineitem) s
      where rnum > 900 and rnum <= 1000

      d:
      select * from (select l_orderkey, row_number() OVER (ORDER by l_orderkey) as rnum from tpch_parquet.lineitem) s
      where rnum > 900 limit 1000

      c. will use a top-n node while d. will use a sort node

      Besides not using the more optimal top-n (for low limits) another problem is that the analyitic-eval-node will process all rows, even when all further rows will be dropped by the predicate on row_number(). This is problematic as it runs on a single node/thread.

      A solution could be to recognize < and > predicates on unpartitioned row_number() as limit and offset.

      Attachments

        Activity

          People

            Unassigned Unassigned
            csringhofer Csaba Ringhofer
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: