Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Description
git.commit.id.abbrev=5a34d81
I used the query below to create a partitioned data set:
create table `lineitem` partition by (l_moddate) as select l.*, l_shipdate - extract(day from l_shipdate) + 1 l_moddate from cp.`tpch/lineitem.parquet` l;
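For readers unfamiliar with the partition expression: `l_shipdate - extract(day from l_shipdate) + 1` maps each ship date to the first day of its month, so every partition value is a month start such as 1994-07-01. A minimal Python sketch of the same arithmetic follows; the function name `first_of_month` is made up for illustration and is not part of Drill.

```python
from datetime import date, timedelta


def first_of_month(d: date) -> date:
    """Mirror of: l_shipdate - extract(day from l_shipdate) + 1.

    Subtracting the day-of-month lands on the last day of the previous
    month; adding one day gives the first day of the original month.
    """
    return d - timedelta(days=d.day) + timedelta(days=1)


# A ship date in mid-July 1994 falls into the 1994-07-01 partition.
print(first_of_month(date(1994, 7, 19)))
```

This is why the filter `l_moddate = date '1994-07-01'` in the queries below should match exactly one partition.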
The plan for the following query scans only one file:
explain plan for select * from `lineitem` where l_moddate = date '1994-07-01';
00-00 Screen
00-01 Project(*=[$0])
00-02 Project(*=[$0])
00-03 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/ctas_auto_partition/tpch_single_partition/lineitem/0_0_31.parquet]], selectionRoot=/drill/testdata/ctas_auto_partition/tpch_single_partition/lineitem, numFiles=1, columns=[`*`]]])
However, the plan for the following query indicates a full table scan:
explain plan for select count(*) from `tpch_single_partition/lineitem` where l_moddate = date '1994-07-01';
00-00 Screen
00-01 StreamAgg(group=[{}], EXPR$0=[COUNT()])
00-02 Project($f0=[0])
00-03 SelectionVectorRemover
00-04 Filter(condition=[=($0, 1994-07-01)])
00-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/ctas_auto_partition/tpch_single_partition/lineitem]], selectionRoot=/drill/testdata/ctas_auto_partition/tpch_single_partition/lineitem, numFiles=1, columns=[`l_moddate`]]])
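The difference between the two plans can be checked mechanically. In the pruned plan the Filter operator is gone and the scan entry points at a single .parquet file. In the unpruned plan the Filter remains and the scan entry is the table directory. Below is a rough Python sketch of such a check. The helper `scan_summary` is hypothetical and not part of Drill, and the heuristic is an assumption drawn only from the two plans in this report.

```python
import re


def scan_summary(plan: str) -> dict:
    """Summarize a Drill text plan (heuristic, based on this report only)."""
    # Collect every 'path=...' value from the ReadEntryWithPath entries.
    paths = re.findall(r"path=([^\]\s,]+)", plan)
    return {
        # A leftover Filter suggests the predicate was not pruned away.
        "has_filter": "Filter(" in plan,
        "paths": paths,
        # True when every scanned entry is an individual parquet file.
        "files_only": bool(paths) and all(p.endswith(".parquet") for p in paths),
    }


pruned = (
    "00-03 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath "
    "[path=/drill/testdata/ctas_auto_partition/tpch_single_partition/"
    "lineitem/0_0_31.parquet]], numFiles=1, columns=[`*`]]])"
)
unpruned = (
    "00-04 Filter(condition=[=($0, 1994-07-01)]) "
    "00-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath "
    "[path=maprfs:///drill/testdata/ctas_auto_partition/"
    "tpch_single_partition/lineitem]], numFiles=1, columns=[`l_moddate`]]])"
)

print(scan_summary(pruned))
print(scan_summary(unpruned))
```

Note that `numFiles=1` appears in both plans, so that field alone does not distinguish the pruned case from the full scan here.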
Issue Links
- is duplicated by: DRILL-3379 Passing references when cloning ParquetGroupScan causes incorrect planning (Resolved)