Description
After executing the rename command in impala-shell, you will lose the table column stats, as the underlining table name stored in the table column stats table in HMS are not updated.
Please see the following test case:
-- Create a table, load with some data, initially with no stats [10.17.74.162:21000] > CREATE TABLE test (a int); Query: create TABLE test (a int) Fetched 0 row(s) in 0.14s [10.17.74.162:21000] > [10.17.74.162:21000] > INSERT OVERWRITE TABLE test VALUES (1), (2), (3); Query: insert OVERWRITE TABLE test VALUES (1), (2), (3) Inserted 3 row(s) in 4.84s [10.17.74.162:21000] > [10.17.74.162:21000] > SHOW TABLE STATS test; Query: show TABLE STATS test +-------+--------+------+--------------+--------+-------------------+ | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats | +-------+--------+------+--------------+--------+-------------------+ | -1 | 1 | 6B | NOT CACHED | TEXT | false | +-------+--------+------+--------------+--------+-------------------+ Fetched 1 row(s) in 0.07s [10.17.74.162:21000] > [10.17.74.162:21000] > SHOW COLUMN STATS test; Query: show COLUMN STATS test +--------+------+------------------+--------+----------+----------+ | Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size | +--------+------+------------------+--------+----------+----------+ | a | INT | -1 | -1 | 4 | 4 | +--------+------+------------------+--------+----------+----------+ Fetched 1 row(s) in 0.01s -- now compute the stats and show stats [10.17.74.162:21000] > COMPUTE STATS test; Query: compute STATS test +-----------------------------------------+ | summary | +-----------------------------------------+ | Updated 1 partition(s) and 1 column(s). | +-----------------------------------------+ Fetched 1 row(s) in 2.24s [10.17.74.162:21000] > [10.17.74.162:21000] > SHOW TABLE STATS test; Query: show TABLE STATS test +-------+--------+------+--------------+--------+-------------------+ | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats | +-------+--------+------+--------------+--------+-------------------+ | 3 | 1 | 6B | NOT CACHED | TEXT | false | +-------+--------+------+--------------+--------+-------------------+ Fetched 1 row(s) in 0.01s [10.17.74.162:21000] > [10.17.74.162:21000] > SHOW COLUMN STATS test; Query: show COLUMN STATS test +--------+------+------------------+--------+----------+----------+ | Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size | +--------+------+------------------+--------+----------+----------+ | a | INT | 3 | -1 | 4 | 4 | +--------+------+------------------+--------+----------+----------+ Fetched 1 row(s) in 0.01s -- Finally rename the table you will see that column stats are lost [10.17.74.162:21000] > ALTER TABLE test RENAME TO test_bak; Query: alter TABLE test RENAME TO test_bak [10.17.74.162:21000] > [10.17.74.162:21000] > SHOW TABLE STATS test_bak; Query: show TABLE STATS test_bak +-------+--------+------+--------------+--------+-------------------+ | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats | +-------+--------+------+--------------+--------+-------------------+ | 3 | 1 | 6B | NOT CACHED | TEXT | false | +-------+--------+------+--------------+--------+-------------------+ Fetched 1 row(s) in 5.39s [10.17.74.162:21000] > [10.17.74.162:21000] > SHOW COLUMN STATS test_bak; Query: show COLUMN STATS test_bak +--------+------+------------------+--------+----------+----------+ | Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size | +--------+------+------------------+--------+----------+----------+ | a | INT | -1 | -1 | 4 | 4 | +--------+------+------------------+--------+----------+----------+ Fetched 1 row(s) in 0.02s [10.17.74.162:21000] > [10.17.74.162:21000] > COMPUTE STATS test_bak; Query: compute STATS test_bak +-----------------------------------------+ | summary | +-----------------------------------------+ | Updated 1 partition(s) and 1 column(s). | +-----------------------------------------+ Fetched 1 row(s) in 0.42s [10.17.74.162:21000] > SHOW TABLE STATS test_bak; Query: show TABLE STATS test_bak +-------+--------+------+--------------+--------+-------------------+ | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats | +-------+--------+------+--------------+--------+-------------------+ | 3 | 1 | 6B | NOT CACHED | TEXT | false | +-------+--------+------+--------------+--------+-------------------+ Fetched 1 row(s) in 0.01s [10.17.74.162:21000] > show COLUMN STATS test_bak; Query: show COLUMN STATS test_bak +--------+------+------------------+--------+----------+----------+ | Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size | +--------+------+------------------+--------+----------+----------+ | a | INT | 3 | -1 | 4 | 4 | +--------+------+------------------+--------+----------+----------+ Fetched 1 row(s) in 0.01s
I also checked the "TAB_COL_STATS" table in HMS DB, see the result:
-- After initial COMPUTE STATS hive=> SELECT "TABLE_NAME", "COLUMN_NAME", "COLUMN_TYPE", "TBL_ID", "NUM_DISTINCTS" FROM "TAB_COL_STATS" WHERE "TABLE_NAME" = 'test' AND "DB_NAME" = 'case_66505'; TABLE_NAME | COLUMN_NAME | COLUMN_TYPE | TBL_ID | NUM_DISTINCTS ------------+-------------+-------------+--------+--------------- test | a | INT | 34859 | 3 (1 row) -- After RENAME, "test" is still there hive=> SELECT "TABLE_NAME", "COLUMN_NAME", "COLUMN_TYPE", "TBL_ID", "NUM_DISTINCTS" FROM "TAB_COL_STATS" WHERE "TABLE_NAME" = 'test' AND "DB_NAME" = 'case_66505'; TABLE_NAME | COLUMN_NAME | COLUMN_TYPE | TBL_ID | NUM_DISTINCTS ------------+-------------+-------------+--------+--------------- test | a | INT | 34859 | 3 (1 row) -- After RENAME, can't find "test_bak" hive=> SELECT "TABLE_NAME", "COLUMN_NAME", "COLUMN_TYPE", "TBL_ID", "NUM_DISTINCTS" FROM "TAB_COL_STATS" WHERE "TABLE_NAME" = 'test_bak' AND "DB_NAME" = 'case_66505'; TABLE_NAME | COLUMN_NAME | COLUMN_TYPE | TBL_ID | NUM_DISTINCTS ------------+-------------+-------------+--------+--------------- (0 rows) -- After RENAME and COMPUTE STATS on "test_bak" hive=> SELECT "TABLE_NAME", "COLUMN_NAME", "COLUMN_TYPE", "TBL_ID", "NUM_DISTINCTS" FROM "TAB_COL_STATS" WHERE "TABLE_NAME" = 'test_bak' AND "DB_NAME" = 'case_66505'; TABLE_NAME | COLUMN_NAME | COLUMN_TYPE | TBL_ID | NUM_DISTINCTS ------------+-------------+-------------+--------+--------------- test_bak | a | INT | 34859 | 3 (1 row) -- In the end you end up with two tables in the stats, one does not exists hive=> SELECT "TABLE_NAME", "COLUMN_NAME", "COLUMN_TYPE", "TBL_ID", "NUM_DISTINCTS" FROM "TAB_COL_STATS" WHERE "DB_NAME" = 'case_66505' AND "TABLE_NAME" in ('test', 'test_bak'); TABLE_NAME | COLUMN_NAME | COLUMN_TYPE | TBL_ID | NUM_DISTINCTS ------------+-------------+-------------+--------+--------------- test | a | INT | 34859 | 3 test_bak | a | INT | 34859 | 3 (2 rows)