Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-10208

Drop stats doesnt remove impala_intermediate_stats_num_chunks from PARTITION_PARAMS

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • None
    • None
    • Catalog
    • ghx-label-11

    Description

      Steps to replicate the issue:

      Step1:

      CREATE TABLE impala_partition_test1 (                                          
         a INT                                                                               
       )                                                                                     
       PARTITIONED BY (                                                                      
         b STRING                                                                            
       ); alter table impala_partition_test1 add partition(b="part1");
       alter table impala_partition_test1 add partition(b="part2");
       alter table impala_partition_test1 add partition(b="part3");
       alter table impala_partition_test1 add partition(b="part4");
      

      Step2: Populating the partitions

      for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data  && hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part1/test_${i}; done
       for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data  && hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part2/test_${i}; done
       for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data  && hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part3/test_${i}; done
       for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data  && hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part4/test_${i}; done
      

      Step3: Run compute incremental stats impala_partition_test1;

      Step4: In HMS DB when you run the below query

      select A.TBL_NAME, B.PART_NAME, C.PARAM_KEY, sum(length(C.PARAM_KEY) + length(C.PARAM_VALUE)) from TBLS A join PARTITIONS B join PARTITION_PARAMS C on A.TBL_ID = B.TBL_ID and C.PART_ID=B.PART_ID and C.PARAM_KEY like "%impala_intermediate_stats%" group by A.TBL_NAME,B.PART_NAME,C.PARAM_KEY;	
      

      You will be noticing

      Step5: After you drop the stats [drop stats impala_partition_test1 ] you still be noticing impala_intermediate_stats_num_chunks left unremoved.

      When you have million partitions this could contribute to 37mb I suppose. Requesting you to remove impala_intermediate_stats_num_chunks while we drop stats from table.

      Attachments

        1. image-2020-10-02-10-39-18-642.png
          564 kB
          Venkat Sambath
        2. image-2020-10-02-10-38-48-144.png
          456 kB
          Venkat Sambath
        3. image-2020-10-02-10-38-16-153.png
          544 kB
          Venkat Sambath

        Activity

          People

            chufucun Fucun Chu
            Venkat Sambath Venkat Sambath
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated: