  1. Apache Drill
  2. DRILL-6896

Extraneous columns being projected past a join

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.15.0
    • Fix Version/s: Future
    • Component/s: None
    • Labels: None

      Description

      Robert Hou noted that TPCH13 ran slower on Drill 1.15 than on Drill 1.14. Analysis revealed that 1.15 projects an extra column, and that the slowdown comes from this column being pushed across an exchange unnecessarily.

      Here is a simplified query, written by Aman Sinha, that exhibits the same problem.

      In the first plan (on 1.15.0), o_custkey and o_comment are both extraneous projections above the join.
      In the second plan (on 1.14.0), there is also an extraneous projection: o_custkey, but not o_comment.

      On 1.15.0:
      -------------

      explain plan without implementation for
      select
        c.c_custkey
      from
        cp.`tpch/customer.parquet` c
        left outer join cp.`tpch/orders.parquet` o
          on c.c_custkey = o.o_custkey
          and o.o_comment not like '%special%requests%';
      
      DrillScreenRel
        DrillProjectRel(c_custkey=[$0])
          DrillProjectRel(c_custkey=[$2], o_custkey=[$0], o_comment=[$1])
            DrillJoinRel(condition=[=($2, $0)], joinType=[right])
              DrillFilterRel(condition=[NOT(LIKE($1, '%special%requests%'))])
                DrillScanRel(table=[[cp, tpch/orders.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`o_custkey`, `o_comment`]]])
              DrillScanRel(table=[[cp, tpch/customer.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]], selectionRoot=classpath:/tpch/customer.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`c_custkey`]]])
      

      On 1.14.0:
      -------------

      DrillScreenRel
        DrillProjectRel(c_custkey=[$0])
          DrillProjectRel(c_custkey=[$1], o_custkey=[$0])
            DrillJoinRel(condition=[=($1, $0)], joinType=[right])
              DrillProjectRel(o_custkey=[$0])
                DrillFilterRel(condition=[NOT(LIKE($1, '%special%requests%'))])
                  DrillScanRel(table=[[cp, tpch/orders.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`o_custkey`, `o_comment`]]])
              DrillScanRel(table=[[cp, tpch/customer.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]], selectionRoot=classpath:/tpch/customer.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`c_custkey`]]])
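
      The extraneous columns can be read off the plans mechanically by comparing the two stacked DrillProjectRel lines: any output of the lower project whose position is never referenced by the upper project is unused. The following is a minimal sketch of that check, not part of Drill. The helper names (project_items, unused_outputs) are hypothetical, and the regular expression assumes the project lists contain only plain column references of the form name=[$n], as they do in the plans above.

```python
import re

# Hypothetical helper pattern (not a Drill API): one "name=[$idx]" entry
# in the argument list of a DrillProjectRel line.
PROJECT_ITEM = re.compile(r"(\w+)=\[\$(\d+)\]")


def project_items(line):
    """Return [(output_name, input_index), ...] for a DrillProjectRel line."""
    return [(name, int(idx)) for name, idx in PROJECT_ITEM.findall(line)]


def unused_outputs(parent_line, child_line):
    """Names the child project emits that the parent project never references."""
    used = {idx for _, idx in project_items(parent_line)}
    child = project_items(child_line)
    return [name for pos, (name, _) in enumerate(child) if pos not in used]


# Project lines copied from the two plans in this issue.
parent = "DrillProjectRel(c_custkey=[$0])"
plan_1_15 = "DrillProjectRel(c_custkey=[$2], o_custkey=[$0], o_comment=[$1])"
plan_1_14 = "DrillProjectRel(c_custkey=[$1], o_custkey=[$0])"

print(unused_outputs(parent, plan_1_15))  # ['o_custkey', 'o_comment']
print(unused_outputs(parent, plan_1_14))  # ['o_custkey']
```

      The result matches the observation above: 1.15.0 carries both o_custkey and o_comment past the join, while 1.14.0 carries only o_custkey.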
      

              People

              • Assignee: Aman Sinha (amansinha100)
              • Reporter: Karthikeyan Manivannan (karthikm)
              • Votes: 0
              • Watchers: 3