  1. Apache Drill
  2. DRILL-3418

Partition Pruning: We are over-pruning and this leads to wrong results

Details

    Description

      git.commit.id.abbrev=c199860

      The plan below shows that we are over-pruning: only 2 files are scanned.

      explain plan for select * from `existing_partition_pruning/lineitem_hierarchical_intstring` where (dir0=1993 or dir1='jun') and (dir0=1991 or dir1='aug' or columns[0] > 5000);
      00-00    Screen
      00-01      Project(*=[$0])
      00-02        Project(T17¦¦*=[$0])
      00-03          SelectionVectorRemover
      00-04            Filter(condition=[AND(OR(=($1, 1993), =($2, 'jun')), OR(=($1, 1991), =($2, 'aug'), >(ITEM($3, 0), 5000)))])
      00-05              Project(T17¦¦*=[$0], dir0=[$1], dir1=[$2], columns=[$3])
      00-06                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitem_hierarchical_intstring/0_0_26.parquet], ReadEntryWithPath [path=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitem_hierarchical_intstring/0_0_7.parquet]], selectionRoot=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitem_hierarchical_intstring, numFiles=2, columns=[`*`]]])
      

      The table has 7 year partitions (dir0), and each year has 12 month partitions (dir1).
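
To make the expected pruning concrete, here is a minimal sketch (not Drill's actual pruning code) that enumerates which partitions could contain matching rows. The year range 1991 to 1997 and the three-letter month names are assumptions; the report only states 7 years by 12 months and mentions 1991, 1993, 'jun' and 'aug'. The key point is that `columns[0] > 5000` depends on row data, so a pruner has to treat it as possibly true.

```python
from itertools import product

# Assumed layout (hypothetical): the report only says 7 years x 12 months.
YEARS = range(1991, 1998)
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def may_match(dir0, dir1, row_predicate_possible):
    """Return True if partition (dir0, dir1) could hold a matching row.

    row_predicate_possible stands in for columns[0] > 5000, which cannot be
    decided from the directory names alone.
    """
    first = dir0 == 1993 or dir1 == "jun"
    second = dir0 == 1991 or dir1 == "aug" or row_predicate_possible
    return first and second


def surviving_partitions(row_predicate_possible):
    """List the (year, month) partitions that must still be scanned."""
    return [(y, m) for y, m in product(YEARS, MONTHS)
            if may_match(y, m, row_predicate_possible)]


# Safe pruning: the row-level predicate may be true in any partition.
safe = surviving_partitions(True)
# Unsafe pruning: the row-level predicate is wrongly treated as false.
unsafe = surviving_partitions(False)

print(len(safe), len(unsafe))
```

Under these assumptions, safe pruning keeps 18 partitions (all 12 months of 1993 plus 'jun' of the other 6 years). Treating the row-level predicate as false keeps only 2, which is consistent with numFiles=2 in the plan above, though the report itself does not confirm that this is the cause.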

      I ran a count query with the same filter both with and without partitioning in place, and the results differ.

      Without partitions:

      select count(*) from tbl_nopartitions where (dir0=1993 or dir1='jun') and (dir0=1991 or dir1='aug' or columns[0] > 5000);
      +---------+
      | EXPR$0  |
      +---------+
      | 14515   |
      +---------+
      1 row selected (0.903 seconds)
      

      With partitions:

      select count(*) from `existing_partition_pruning/lineitem_hierarchical_intstring` where (dir0=1993 or dir1='jun') and (dir0=1991 or dir1='aug' or columns[0] > 5000);
      +---------+
      | EXPR$0  |
      +---------+
      | 1800    |
      +---------+
      1 row selected (0.49 seconds)
      

      The data set is larger than 10 MB, so it cannot be uploaded here.

      Attachments

        1. DRILL-3418.patch
          5 kB
          Steven Phillips
        2. DRILL-3418_2015-06-29_15:08:57.patch
          6 kB
          Steven Phillips
        3. DRILL-3418_2015-06-29_17:08:52.patch
          7 kB
          Steven Phillips

          People

            sphillips Steven Phillips
            amansinha100 Aman Sinha