Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-6553

Fix TopN for unnest operator

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.14.0
    • None

    Description

      Plan for the query with unnest is chosen non-optimally:

      select customer.c_custkey, customer.c_name, t.o.o_orderkey,t.o.o_totalprice
      from dfs.`lateraljoin/multipleFiles` customer,
      unnest(customer.c_orders) t(o)
      order by customer.c_custkey, t.o.o_orderkey, t.o.o_totalprice
      limit 50
      

      Plan:

      00-00    Screen
      00-01      ProjectAllowDup(c_custkey=[$0], c_name=[$1], EXPR$2=[$2], EXPR$3=[$3])
      00-02        SelectionVectorRemover
      00-03          Limit(fetch=[50])
      00-04            SelectionVectorRemover
      00-05              Sort(sort0=[$0], sort1=[$2], sort2=[$3], dir0=[ASC], dir1=[ASC], dir2=[ASC])
      00-06                Project(c_custkey=[$2], c_name=[$3], EXPR$2=[ITEM($4, 'o_orderkey')], EXPR$3=[ITEM($4, 'o_totalprice')])
      00-07                  LateralJoin(correlation=[$cor0], joinType=[inner], requiredColumns=[{1}])
      00-09                    Project(T0¦¦**=[$0], c_orders=[$1], c_custkey=[$2], c_name=[$3])
      00-11                      Scan(groupscan=[EasyGroupScan [selectionRoot=file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles, numFiles=2, columns=[`**`], files=[file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles/cust_order_10_2.json, file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles/cust_order_10_1.json]]])
      00-08                    Project(c_orders0=[$0])
      00-10                      Unnest [srcOp=00-07] 
      

      A similar query, but with flatten:

      select f.c_custkey, f.c_name, f.o.o_orderkey, f.o.o_totalprice from (select c_custkey, c_name, flatten(c_orders) as o from dfs.`lateraljoin/multipleFiles` customer) f order by f.c_custkey, f.o.o_orderkey, f.o.o_totalprice limit 50
      

      has plan:

      00-00    Screen
      00-01      Project(c_custkey=[$0], c_name=[$1], EXPR$2=[$2], EXPR$3=[$3])
      00-02        SelectionVectorRemover
      00-03          Limit(fetch=[50])
      00-04            SelectionVectorRemover
      00-05              TopN(limit=[50])
      00-06                Project(c_custkey=[$0], c_name=[$1], EXPR$2=[ITEM($2, 'o_orderkey')], EXPR$3=[ITEM($2, 'o_totalprice')])
      00-07                  Flatten(flattenField=[$2])
      00-08                    Project(c_custkey=[$0], c_name=[$1], o=[$2])
      00-09                      Scan(groupscan=[EasyGroupScan [selectionRoot=file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles, numFiles=2, columns=[`c_custkey`, `c_name`, `c_orders`], files=[file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles/cust_order_10_2.json, file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles/cust_order_10_1.json]]])
      

      The main difference is that for the case of unnest, a project wasn't pushed to the scan and Limit with Sort weren't converted to TopN.
      The first problem is tracked by DRILL-6545 and this Jira aims to fix the problem with TopN

      Attachments

        Issue Links

          Activity

            People

              volodymyr Vova Vysotskyi
              volodymyr Vova Vysotskyi
              Hanumath Rao Maduri Hanumath Rao Maduri
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: