Details
Bug
Status: Resolved
Blocker
Resolution: Duplicate
Impala 2.7.0
None
Description
Impala does not refresh boolean partitions correctly if there are multiple directories corresponding to a "true" or "false" partition key value.
The root cause of this issue is HIVE-6590. Impala can generally handle the strange metadata state caused by HIVE-6590, except when doing a refresh <table> or refresh <partition>.
Reproduction
In Hive:
CREATE TABLE tbl (i INT) PARTITIONED BY (b BOOLEAN);
INSERT OVERWRITE TABLE tbl PARTITION(b=false) VALUES(1);
INSERT OVERWRITE TABLE tbl PARTITION(b=FALSE) VALUES(2);
INSERT OVERWRITE TABLE tbl PARTITION(b=true) VALUES(10);
In Impala:
invalidate metadata tbl;
show files in tbl;
+------------------------------------------------------------+------+-----------+
| Path                                                       | Size | Partition |
+------------------------------------------------------------+------+-----------+
| hdfs://localhost:20500/test-warehouse/tbl/b=false/000000_0 | 2B   | b=FALSE   |
| hdfs://localhost:20500/test-warehouse/tbl/b=FALSE/000000_0 | 2B   | b=FALSE   |
| hdfs://localhost:20500/test-warehouse/tbl/b=true/000000_0  | 3B   | b=TRUE    |
+------------------------------------------------------------+------+-----------+
refresh tbl;
show files in tbl;
+------------------------------------------------------------+------+-----------+
| Path                                                       | Size | Partition |
+------------------------------------------------------------+------+-----------+
| hdfs://localhost:20500/test-warehouse/tbl/b=false/000000_0 | 2B   | b=FALSE   |
| hdfs://localhost:20500/test-warehouse/tbl/b=FALSE/000000_0 | 2B   | b=FALSE   |
| hdfs://localhost:20500/test-warehouse/tbl/b=false/000000_0 | 2B   | b=FALSE   |
| hdfs://localhost:20500/test-warehouse/tbl/b=true/000000_0  | 3B   | b=TRUE    |
+------------------------------------------------------------+------+-----------+
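Notice that after the refresh the file under b=false is listed twice. Queries also return wrong results: the three inserted rows (1, 2, 10) should sum to 13, but the following query returns 14: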
select sum(i) from tbl;
+--------+
| sum(i) |
+--------+
| 14     |
+--------+
A similar problem occurs with refresh <partition>.
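The wrong sum of 14 is consistent with the duplicated b=false file being scanned twice. The following is a minimal sketch of that arithmetic only, not of Impala's actual scan logic; the names listing_after_refresh, scan_sum, and dedupe_by_path are hypothetical, and the row values are taken from the Hive INSERT statements in the reproduction.

```python
# File listing after `refresh tbl`, modelled as (path, rows) pairs.
# Row values come from the INSERT statements in the reproduction.
listing_after_refresh = [
    ("tbl/b=false/000000_0", [1]),
    ("tbl/b=FALSE/000000_0", [2]),
    ("tbl/b=false/000000_0", [1]),   # duplicate entry that appears after the refresh
    ("tbl/b=true/000000_0", [10]),
]


def scan_sum(listing):
    """Sum every row of every listed file, counting a file once per listing entry."""
    return sum(value for _, rows in listing for value in rows)


def dedupe_by_path(listing):
    """Keep only the first listing entry for each distinct file path."""
    seen = set()
    unique = []
    for path, rows in listing:
        if path not in seen:
            seen.add(path)
            unique.append((path, rows))
    return unique


print(scan_sum(listing_after_refresh))                  # duplicated listing
print(scan_sum(dedupe_by_path(listing_after_refresh)))  # each file counted once
```

With the duplicated listing the sum is 14, matching the query output; with each file counted once it is 13.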
Workaround
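- Run invalidate metadata <table> to fix the table metadata.
- Ensure that each boolean partition value has only a single corresponding HDFS directory, i.e., avoid triggering HIVE-6590, for example by always writing boolean partition values with the same capitalization.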
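To check whether a table is already affected, one option is to look for partition directories that differ only in the capitalization of a boolean partition value. The sketch below is one possible way to do that, assuming the list of partition directory paths has already been obtained (for example from an HDFS directory listing). The function name find_duplicate_boolean_dirs and its parameters are hypothetical and not part of Impala or Hive.

```python
import re
from collections import defaultdict

# Matches a single "key=value" path component, e.g. "b=FALSE".
_PARTITION_COMPONENT = re.compile(r"^([^=/]+)=([^/]+)$")


def find_duplicate_boolean_dirs(partition_dirs, boolean_keys):
    """Group partition directories that differ only in the case of a boolean value.

    partition_dirs: iterable of directory paths such as ".../tbl/b=false".
    boolean_keys:   set of partition column names that have type BOOLEAN.
    Returns a dict mapping the case-normalized path to the sorted list of
    actual directories, restricted to groups with more than one directory.
    """
    groups = defaultdict(set)
    for path in partition_dirs:
        normalized = []
        for component in path.rstrip("/").split("/"):
            match = _PARTITION_COMPONENT.match(component)
            if match and match.group(1) in boolean_keys:
                # Lower-case only the value of boolean partition columns.
                normalized.append(f"{match.group(1)}={match.group(2).lower()}")
            else:
                normalized.append(component)
        groups["/".join(normalized)].add(path)
    return {key: sorted(dirs) for key, dirs in groups.items() if len(dirs) > 1}


# Directories from the reproduction above (host prefix omitted for brevity).
dirs = [
    "/test-warehouse/tbl/b=false",
    "/test-warehouse/tbl/b=FALSE",
    "/test-warehouse/tbl/b=true",
]
for key, duplicates in find_duplicate_boolean_dirs(dirs, {"b"}).items():
    print(key, "->", duplicates)
```

For the reproduction table this reports the b=false and b=FALSE directories as one group, which are the directories that would need to be merged to satisfy the second workaround.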