Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- Impala 2.10.0
Description
The issue can be reproduced as follows:
> create table part_tbl_parq (a integer) partitioned by (p STRING) stored as parquet;
> insert into part_tbl_parq partition(p="val100") values(100);
> select * from part_tbl_parq;
+-----+--------+
| a   | p      |
+-----+--------+
| 100 | val100 |
+-----+--------+
> select p, count(a) from part_tbl_parq group by p;
+--------+----------+
| p      | count(a) |
+--------+----------+
| val100 | 1        |
+--------+----------+
> select p, count(*) from part_tbl_parq group by p;
+---+----------+
| p | count(*) |
+---+----------+
|   | 0        |
+---+----------+
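The result of the last select is incorrect: it should match the count(a) query and return a single row with p = val100 and a count of 1, but it returns an empty partition value and a count of 0.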
The problem does not happen if the table is partitioned by an INT column:
> create table part_tbl_parq2 (a integer) partitioned by (p integer) stored as parquet;
> insert into part_tbl_parq2 partition(p=100) values(100);
> select p, count(*) from part_tbl_parq2 group by p;
+-----+----------+
| p   | count(*) |
+-----+----------+
| 100 | 1        |
+-----+----------+
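For comparison, the expected results can be checked against another SQL engine. The following is only a sketch using Python's built-in sqlite3 module, not Impala: SQLite has no partitioned tables or Parquet storage, so the partition column p is modelled as an ordinary column, and the helper name grouped_counts and the table name part_tbl are hypothetical. It illustrates that, with no NULLs in column a, count(a) and count(*) grouped by p should agree for both a STRING-like and an INT partition key.

```python
import sqlite3


def grouped_counts(partition_type, partition_value):
    """Return (count(a) rows, count(*) rows), grouped by p, for a one-row table."""
    conn = sqlite3.connect(":memory:")
    # Assumption: p is an ordinary column here, standing in for the partition key.
    conn.execute(f"create table part_tbl (a integer, p {partition_type})")
    conn.execute(
        "insert into part_tbl (a, p) values (?, ?)", (100, partition_value)
    )
    count_a = conn.execute(
        "select p, count(a) from part_tbl group by p"
    ).fetchall()
    count_star = conn.execute(
        "select p, count(*) from part_tbl group by p"
    ).fetchall()
    conn.close()
    return count_a, count_star


# STRING-like key (the failing case in Impala) and INT key (the working case).
string_case = grouped_counts("text", "val100")
int_case = grouped_counts("integer", 100)

print(string_case)  # ([('val100', 1)], [('val100', 1)])
print(int_case)     # ([(100, 1)], [(100, 1)])
```

In both cases the two queries return the same single row, which is the behavior the count(*) query on part_tbl_parq should have shown.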
Issue Links
- is broken by: IMPALA-5036 Improve COUNT(*) performance of Parquet scans. (Resolved)