Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 0.6.0
- None
Description
code base
#Fri Sep 12 14:08:02 PDT 2014
git.commit.id.abbrev=9e16466
I have a Parquet file (TPC-DS data) with a column that contains null values. The total count for the column is:
0: jdbc:drill:schema=dfs> select count(ss_quantity) from `tpcds/p1/store_sales.parquet`;
------------
EXPR$0 |
------------
2880404 |
------------
The count excluding nulls is:
0: jdbc:drill:schema=dfs> select count(ss_quantity) from `tpcds/p1/store_sales.parquet` where ss_quantity is not null;
------------
EXPR$0 |
------------
2750408 |
------------
But the count of null values is zero:
0: jdbc:drill:schema=dfs> select count(ss_quantity) from `tpcds/p1/store_sales.parquet` where ss_quantity is null;
------------
EXPR$0 |
------------
0 |
------------
Here is the physical plan for this query:
0: jdbc:drill:schema=dfs> explain plan for select count(ss_quantity) from `tpcds/p1/store_sales.parquet` where ss_quantity is null;
----------------------+
text | json |
----------------------+
00-00 Screen
00-01 StreamAgg(group=[{}], EXPR$0=[COUNT($0)])
00-02 Filter(condition=[IS NULL($0)])
00-03 ProducerConsumer
00-04 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/user/root/mondrian/tpcds/p1/store_sales.parquet]], selectionRoot=/user/root/mondrian/tpcds/p1/store_sales.parquet, columns=[SchemaPath [`ss_quantity`]]]]) |
{ "head" : Unknown macro: { "version" }
, ], Unknown macro: { "default" }
, Unknown macro: { "psv" }
}, , , { "pop" : "filter", "@id" : 2, "child" : 3, "expr" : "isnull(`ss_quantity`) ", "initialAllocation" : 1000000, "maxAllocation" : 10000000000, "cost" : 720101.0 }, Unknown macro: { "pop" }
, { "pop" : "screen", "@id" : 0, "child" : 1, "initialAllocation" : 1000000, "maxAllocation" : 10000000000, "cost" : 72010.1 } ] |
----------------------+
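For reference, here is a minimal sketch of how the three counts above behave under standard SQL semantics. It uses Python's built-in sqlite3 module rather than Drill, and the in-memory `store_sales` table and its five values are hypothetical stand-ins for the TPC-DS data, so it says nothing about Drill's internals. It only illustrates that `COUNT(column)` skips nulls, while `COUNT(*)` counts rows.

```python
import sqlite3

# Hypothetical in-memory table standing in for store_sales.parquet;
# the five values below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store_sales (ss_quantity INTEGER)")
conn.executemany(
    "INSERT INTO store_sales (ss_quantity) VALUES (?)",
    [(5,), (None,), (12,), (None,), (7,)],
)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# COUNT(*) counts rows, whether or not ss_quantity is null.
total_rows = scalar("SELECT COUNT(*) FROM store_sales")

# COUNT(column) counts only rows where the column is not null.
non_null = scalar("SELECT COUNT(ss_quantity) FROM store_sales")

# Same shape as the third query in this report: every row that passes
# the filter has a null ss_quantity, so COUNT(ss_quantity) sees nothing.
count_col_where_null = scalar(
    "SELECT COUNT(ss_quantity) FROM store_sales WHERE ss_quantity IS NULL"
)

# Counting the null rows themselves needs COUNT(*).
null_rows = scalar("SELECT COUNT(*) FROM store_sales WHERE ss_quantity IS NULL")

print(total_rows, non_null, count_col_where_null, null_rows)  # prints: 5 3 0 2
```

In this sketch the unfiltered `COUNT(ss_quantity)` already equals the `IS NOT NULL` count, which is the relationship one would expect between the first two results in this report.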
Attachments
Issue Links
- is duplicated by
  DRILL-1362 Count(nullable-column) is incorrectly pushed into group scan operator (Resolved)