Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-6219

Filter pushdown doesn't work with star operator if there is a subquery with it's own filter

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.13.0
    • None
    • None
    • None

    Description

      Data set:
      The data is generated used the attached file: DRILL_6118_data_source.csv
      Data gen commands:

      create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` where columns[0] in (1, 3);
      create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` where columns[0]=2;
      create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` where columns[0]>3;
      

      Steps:

      1. Execute the following query:
      select * from (select * from dfs.drillTestDir.`DRILL_6118_parquet_partitioned_by_folders` where c1>2) where c1>3

      Expected result:
      Filrers "c1>3" and "c1>2" should both be pushed down so only the data from the folder "d3" should be scanned.

      Actual result:
      The data from the folders "d1" and "d3" are being scanned so as only filter "c1>2" is pushed down

      Physical plan:

      00-00    Screen : rowType = RecordType(DYNAMIC_STAR **): rowcount = 10.0, cumulative cost = {201.0 rows, 581.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 105545
      00-01      Project(**=[$0]) : rowType = RecordType(DYNAMIC_STAR **): rowcount = 10.0, cumulative cost = {200.0 rows, 580.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 105544
      00-02        SelectionVectorRemover : rowType = RecordType(DYNAMIC_STAR T25¦¦**): rowcount = 10.0, cumulative cost = {190.0 rows, 570.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 105543
      00-03          Filter(condition=[>(ITEM($0, 'c1'), 3)]) : rowType = RecordType(DYNAMIC_STAR T25¦¦**): rowcount = 10.0, cumulative cost = {180.0 rows, 560.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 105542
      00-04            Project(T25¦¦**=[$0]) : rowType = RecordType(DYNAMIC_STAR T25¦¦**): rowcount = 20.0, cumulative cost = {160.0 rows, 440.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 105541
      00-05              SelectionVectorRemover : rowType = RecordType(DYNAMIC_STAR T25¦¦**, ANY c1): rowcount = 20.0, cumulative cost = {140.0 rows, 420.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 105540
      00-06                Filter(condition=[>($1, 2)]) : rowType = RecordType(DYNAMIC_STAR T25¦¦**, ANY c1): rowcount = 20.0, cumulative cost = {120.0 rows, 400.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 105539
      00-07                  Project(T25¦¦**=[$0], c1=[$1]) : rowType = RecordType(DYNAMIC_STAR T25¦¦**, ANY c1): rowcount = 40.0, cumulative cost = {80.0 rows, 160.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 105538
      00-08                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/DRILL_6118_parquet_partitioned_by_folders/d1/0_0_0.parquet], ReadEntryWithPath [path=/drill/testdata/DRILL_6118_parquet_partitioned_by_folders/d3/0_0_0.parquet]], selectionRoot=maprfs:/drill/testdata/DRILL_6118_parquet_partitioned_by_folders, numFiles=2, numRowGroups=2, usedMetadataFile=false, columns=[`**`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY c1): rowcount = 40.0, cumulative cost = {40.0 rows, 80.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 105537
      

      Note: Works fine if select column names instead of "*"

      Attachments

        1. DRILL_6118_data_source.csv
          2 kB
          Anton Gozhiy

        Issue Links

          Activity

            People

              Unassigned Unassigned
              angozhiy Anton Gozhiy
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated: