Description
Steps to replicate the issue:
Step 1: Create a partitioned table

CREATE TABLE impala_partition_test1 (a INT) PARTITIONED BY (b STRING);
ALTER TABLE impala_partition_test1 ADD PARTITION (b="part1");
ALTER TABLE impala_partition_test1 ADD PARTITION (b="part2");
ALTER TABLE impala_partition_test1 ADD PARTITION (b="part3");
ALTER TABLE impala_partition_test1 ADD PARTITION (b="part4");
Step 2: Populate the partitions

for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part1/test_${i}; done
for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part2/test_${i}; done
for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part3/test_${i}; done
for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part4/test_${i}; done
Step 3: Compute incremental stats

compute incremental stats impala_partition_test1;
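To confirm the incremental stats were actually written before inspecting the metastore, the standard Impala statements below can be used (the exact output columns vary by Impala version):

```sql
-- Per-partition row counts and an "Incremental stats" column should now be populated
show table stats impala_partition_test1;

-- Column-level stats computed for the table
show column stats impala_partition_test1;
```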
Step 4: Run the query below against the HMS backing database:

select A.TBL_NAME, B.PART_NAME, C.PARAM_KEY,
       sum(length(C.PARAM_KEY) + length(C.PARAM_VALUE))
from TBLS A
join PARTITIONS B on A.TBL_ID = B.TBL_ID
join PARTITION_PARAMS C on C.PART_ID = B.PART_ID
where C.PARAM_KEY like "%impala_intermediate_stats%"
group by A.TBL_NAME, B.PART_NAME, C.PARAM_KEY;

You will notice the impala_intermediate_stats_* parameters stored for every partition, along with the space each entry consumes.
Step 5: After you drop the stats (drop stats impala_partition_test1;), you will still notice that the impala_intermediate_stats_num_chunks entries are left behind in PARTITION_PARAMS, one per partition.
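A quick way to confirm which intermediate-stats keys survive the drop (a sketch against a MySQL-backed HMS; table names may differ for other backends):

```sql
-- After DROP STATS, the chunk payload keys should be gone, but
-- impala_intermediate_stats_num_chunks still returns one row per partition
select PARAM_KEY, count(*) as partitions_affected
from PARTITION_PARAMS
where PARAM_KEY like '%impala_intermediate_stats%'
group by PARAM_KEY;
```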
With a million partitions this leftover could contribute roughly 37 MB, I suppose, since the key name impala_intermediate_stats_num_chunks alone is 36 bytes per partition, plus its value. Requesting that impala_intermediate_stats_num_chunks also be removed when stats are dropped from a table.
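Until this is fixed in Impala, a manual workaround could look like the sketch below. This edits the HMS backing database directly (assuming a MySQL backend, hence the multi-table DELETE syntax), which is risky: stop the metastore or quiesce writers and take a backup first, and restrict it to tables whose stats have already been dropped.

```sql
-- Manual cleanup sketch: remove the orphaned num_chunks rows
-- for the test table only (the joins restrict the delete to
-- partitions of impala_partition_test1)
delete C
from PARTITION_PARAMS C
join PARTITIONS B on C.PART_ID = B.PART_ID
join TBLS A on B.TBL_ID = A.TBL_ID
where A.TBL_NAME = 'impala_partition_test1'
  and C.PARAM_KEY = 'impala_intermediate_stats_num_chunks';
```

After the cleanup, an `invalidate metadata impala_partition_test1;` in Impala would be needed so the catalog stops caching the stale parameters.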