Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
Description
The following steps can give inconsistent results.
// Create a partitioned table
create table test(a int) partitioned by (b int);

// Create two partitions b=1/b=2 mapped to the same HDFS location.
insert into test partition(b=1) values (1);
alter table test add partition (b=2) location 'hdfs://localhost:20500/test-warehouse/test/b=1/';

[localhost:21000] > show partitions test;
Query: show partitions test
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------+
| b     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                       |
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------+
| 1     | -1    | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/test/b=1 |
| 2     | -1    | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/test/b=1 |
| Total | -1    | 2      | 4B   | 0B           |                   |        |                   |                                                |
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------+

// Insert new data into one of the partitions
insert into test partition(b=1) values (2);

// Newly added file is reflected only in the added partition files.
show files in test;
Query: show files in test
+----------------------------------------------------------------------------------------------------+------+-----------+
| Path                                                                                               | Size | Partition |
+----------------------------------------------------------------------------------------------------+------+-----------+
| hdfs://localhost:20500/test-warehouse/test/b=1/2e44cd49e8c3d30d-572fc97800000000_627280230_data.0. | 2B   | b=1       |
| hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B   | b=1       |
| hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B   | b=2       |
+----------------------------------------------------------------------------------------------------+------+-----------+

invalidate metadata test;
show files in test;

// After invalidation, the newly added file now shows up in both the partitions.
Query: show files in test
+----------------------------------------------------------------------------------------------------+------+-----------+
| Path                                                                                               | Size | Partition |
+----------------------------------------------------------------------------------------------------+------+-----------+
| hdfs://localhost:20500/test-warehouse/test/b=1/2e44cd49e8c3d30d-572fc97800000000_627280230_data.0. | 2B   | b=1       |
| hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B   | b=1       |
| hdfs://localhost:20500/test-warehouse/test/b=1/2e44cd49e8c3d30d-572fc97800000000_627280230_data.0. | 2B   | b=2       |
| hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B   | b=2       |
+----------------------------------------------------------------------------------------------------+------+-----------+
So, depending on whether the user invalidates the table, they can see different results for the same data. The bug is in the following code.
private FileMetadataLoadStats resetAndLoadFileMetadata(
    Path partDir, List<HdfsPartition> partitions) throws IOException {
  FileMetadataLoadStats loadStats = new FileMetadataLoadStats(partDir);
  ....
  ....
  ....
  for (HdfsPartition partition: partitions) partition.setFileDescriptors(newFileDescs);  <======
We update the added file's metadata only for the partition that received the insert, in a copy-on-write fashion. Instead, we should update the source descriptors so that the change is also reflected in the other partitions that share the same location.
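The difference between the two behaviours can be illustrated with a minimal, self-contained sketch. It is not Impala's catalog code and not a proposed patch: the `Partition` class and the `groupByLocation`, `refreshOnlyTarget`, and `refreshLocation` helpers are hypothetical names introduced only to model partitions that share a directory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of partitions and their file lists.
// None of these types are Impala's real classes.
class SharedLocationSketch {
  static class Partition {
    final String name;
    final String location;
    List<String> fileDescriptors = new ArrayList<>();

    Partition(String name, String location) {
      this.name = name;
      this.location = location;
    }
  }

  // Groups partitions by directory so that partitions mapped to the
  // same location can be refreshed together.
  static Map<String, List<Partition>> groupByLocation(List<Partition> partitions) {
    Map<String, List<Partition>> byLocation = new HashMap<>();
    for (Partition p : partitions) {
      byLocation.computeIfAbsent(p.location, k -> new ArrayList<>()).add(p);
    }
    return byLocation;
  }

  // Models the reported behaviour: only the partition that received the
  // insert gets the new file list; other partitions keep a stale one.
  static void refreshOnlyTarget(Partition target, List<String> filesOnDisk) {
    target.fileDescriptors = new ArrayList<>(filesOnDisk);
  }

  // Models the suggested behaviour: every partition that shares the
  // directory gets the refreshed file list.
  static void refreshLocation(Map<String, List<Partition>> byLocation,
      String location, List<String> filesOnDisk) {
    List<Partition> sharing =
        byLocation.getOrDefault(location, Collections.<Partition>emptyList());
    for (Partition p : sharing) {
      p.fileDescriptors = new ArrayList<>(filesOnDisk);
    }
  }

  public static void main(String[] args) {
    Partition b1 = new Partition("b=1", "/test-warehouse/test/b=1");
    Partition b2 = new Partition("b=2", "/test-warehouse/test/b=1");
    List<Partition> all = new ArrayList<>();
    all.add(b1);
    all.add(b2);

    List<String> filesOnDisk = new ArrayList<>();
    filesOnDisk.add("old_data.0");
    filesOnDisk.add("new_data.0");

    refreshOnlyTarget(b1, filesOnDisk);
    System.out.println("target only:     b=1 " + b1.fileDescriptors
        + ", b=2 " + b2.fileDescriptors);

    refreshLocation(groupByLocation(all), b1.location, filesOnDisk);
    System.out.println("shared location: b=1 " + b1.fileDescriptors
        + ", b=2 " + b2.fileDescriptors);
  }
}
```

In this model, refreshing by location rather than by target partition keeps the per-partition file lists consistent without requiring an `invalidate metadata`. Whether the real fix should group by location or share a single descriptor list is left open here.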