Details
- Bug
- Status: Resolved
- Critical
- Resolution: Duplicate
- Impala 2.2
- None
- None
Description
Expected behaviour (note the SMALLINT partition):
[localhost:21000] > CREATE TABLE working (a String) PARTITIONED BY (b SMALLINT);
[localhost:21000] > INSERT INTO working (a, b) VALUES ("A",1);
[localhost:21000] > INSERT INTO working (a, b) VALUES ("B",1);
[localhost:21000] > INSERT INTO working (a, b) VALUES ("C",2);
[localhost:21000] > COMPUTE STATS working;
[localhost:21000] > SHOW TABLE STATS working;
Query: show TABLE STATS working
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+
| b     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+
| 1     | 2     | 2      | 4B   | NOT CACHED   | NOT CACHED        | TEXT   | false             |
| 2     | 1     | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | false             |
| Total | 3     | 3      | 6B   | 0B           |                   |        |                   |
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+
Now the same steps with a TINYINT partition type:
[localhost:21000] > CREATE TABLE broken (a String) PARTITIONED BY (b TINYINT);
[localhost:21000] > INSERT INTO broken (a, b) VALUES ("A",1);
[localhost:21000] > INSERT INTO broken (a, b) VALUES ("B",1);
[localhost:21000] > INSERT INTO broken (a, b) VALUES ("C",2);
[localhost:21000] > COMPUTE STATS broken;
[localhost:21000] > SHOW TABLE STATS broken;
Query: show TABLE STATS broken
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+
| b     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+
| 1     | 0     | 2      | 4B   | NOT CACHED   | NOT CACHED        | TEXT   | false             |
| 2     | 0     | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | false             |
| Total | 3     | 3      | 6B   | 0B           |                   |        |                   |
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+
Notice that every partition reports #Rows = 0 after COMPUTE STATS, even though the Total row still shows the correct count of 3. Incorrect per-partition row counts can negatively impact query planning.
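To spot this symptom across many tables, the SHOW TABLE STATS output can be checked mechanically. The following is a minimal Python sketch, not part of Impala: the helper names `parse_table_stats` and `suspicious_partitions` are hypothetical, and it assumes the ASCII table layout impala-shell printed above, with the partition column first and `#Rows` / `#Files` columns present.

```python
# Sketch only: flags partitions that report 0 rows although they contain files.
# Assumes the impala-shell ASCII table layout shown in this report.

BROKEN_OUTPUT = """
Query: show TABLE STATS broken
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+
| b     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+
| 1     | 0     | 2      | 4B   | NOT CACHED   | NOT CACHED        | TEXT   | false             |
| 2     | 0     | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | false             |
| Total | 3     | 3      | 6B   | 0B           |                   |        |                   |
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+
"""


def parse_table_stats(text):
    """Parse SHOW TABLE STATS shell output into (header, list of row dicts)."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, prompts, and the "Query:" echo
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first pipe-delimited line is the header row
        else:
            rows.append(dict(zip(header, cells)))
    return header, rows


def suspicious_partitions(text):
    """Return partition values with #Rows == 0 but #Files > 0."""
    header, rows = parse_table_stats(text)
    if not header:
        return []
    partition_col = header[0]  # assumption: single partition column, listed first
    return [
        row[partition_col]
        for row in rows
        if row[partition_col] != "Total"
        and int(row["#Rows"]) == 0
        and int(row["#Files"]) > 0
    ]


if __name__ == "__main__":
    print(suspicious_partitions(BROKEN_OUTPUT))  # prints ['1', '2']
```

A partition with files but zero rows is not always a bug (the files could be empty), so treat a hit as a prompt to compare against `SELECT b, COUNT(*) FROM broken GROUP BY b;` rather than as proof.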