IMPALA-13470: Stats loaded twice for Iceberg tables

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Frontend

    Description

      When Impala loads an Iceberg table, the table stats appear to be fetched from HMS twice.
      The HMS logs below were captured while loading an Iceberg table in Impala:

      2024-10-07 19:09:52,926 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-190]: 194: get_table : tbl=hive.yri_kf_csi.calls
      2024-10-07 19:09:52,926 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-190]: ugi=impala/v2h0332.sjc.cloudera.com@SJC.CLOUDERA.COM    ip=172.20.33.80    cmd=get_table : tbl=hive.yri_kf_csi.calls    
      2024-10-07 19:09:52,930 INFO  org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: [TThreadPoolServer WorkerProcess-190]: Starting translation for processor Impala4.0.0.7.3.1.0-141@v2h0332.sjc.cloudera.com on list 1
      2024-10-07 19:09:52,930 INFO  org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: [TThreadPoolServer WorkerProcess-190]: Table calls,#bucket=0,isBucketed:false,tableType=EXTERNAL_TABLE,tableCapabilities=null
      2024-10-07 19:09:52,931 INFO  org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: [TThreadPoolServer WorkerProcess-190]: Transformer return list of 1
      2024-10-07 19:09:52,936 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-2]: 7: get_all_write_event_info
      2024-10-07 19:09:52,936 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-2]: ugi=impala/v2h0306.sjc.cloudera.com@SJC.CLOUDERA.COM    ip=172.20.33.54    cmd=get_all_write_event_info    
      2024-10-07 19:09:52,958 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_config_value: name=hive.exec.default.partition.name defaultValue=__HIVE_DEFAULT_PARTITION__
      2024-10-07 19:09:52,958 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-8]: ugi=impala/v2h0332.sjc.cloudera.com@SJC.CLOUDERA.COM    ip=172.20.33.80    cmd=get_config_value: name=hive.exec.default.partition.name defaultValue=__HIVE_DEFAULT_PARTITION__    
      2024-10-07 19:09:52,963 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_table_statistics_req: table=hive.yri_kf_csi.calls
      2024-10-07 19:09:52,964 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-8]: ugi=impala/v2h0332.sjc.cloudera.com@SJC.CLOUDERA.COM    ip=172.20.33.80    cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls    
      2024-10-07 19:09:52,971 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_primary_keys : tbl=hive.yri_kf_csi.calls
      2024-10-07 19:09:52,971 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-8]: ugi=impala/v2h0332.sjc.cloudera.com@SJC.CLOUDERA.COM    ip=172.20.33.80    cmd=get_primary_keys : tbl=hive.yri_kf_csi.calls    
      2024-10-07 19:09:52,972 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_foreign_keys : parentdb=null parenttbl=null foreigndb=yri_kf_csi foreigntbl=calls
      2024-10-07 19:09:52,972 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-8]: ugi=impala/v2h0332.sjc.cloudera.com@SJC.CLOUDERA.COM    ip=172.20.33.80    cmd=get_foreign_keys : parentdb=null parenttbl=null foreigndb=yri_kf_csi foreigntbl=calls    
      2024-10-07 19:09:52,991 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_table_statistics_req: table=hive.yri_kf_csi.calls
      2024-10-07 19:09:52,991 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-8]: ugi=impala/v2h0332.sjc.cloudera.com@SJC.CLOUDERA.COM    ip=172.20.33.80    cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls    
      2024-10-07 19:09:52,998 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: alter_table: hive.yri_kf_csi.calls newtbl=calls
      2024-10-07 19:09:52,998 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-8]: ugi=impala/v2h0332.sjc.cloudera.com@SJC.CLOUDERA.COM    ip=172.20.33.80    cmd=alter_table: hive.yri_kf_csi.calls newtbl=calls    
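
A duplicate like this can be spotted mechanically by counting HMS calls in the log. The following is a minimal sketch, assuming the log line layout shown above; the helper names (`count_hms_calls`, `repeated_calls`) and the regular expression are illustrative and not part of Impala or Hive.

```python
import re
from collections import Counter

# Matches the non-audit HiveMetaStore lines, for example:
#   "... HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_primary_keys : tbl=..."
# The ".audit" lines are skipped on purpose, otherwise every call would count twice.
CALL_RE = re.compile(
    r"HiveMetaStore: \[(?P<thread>[^\]]+)\]: \d+: "
    r"(?P<call>\w+)(?:\s*:\s*(?P<args>.*))?$"
)


def count_hms_calls(lines):
    """Count (call name, arguments) pairs found in HMS log lines."""
    counts = Counter()
    for line in lines:
        match = CALL_RE.search(line.rstrip())
        if match:
            args = (match.group("args") or "").strip()
            counts[(match.group("call"), args)] += 1
    return counts


def repeated_calls(lines):
    """Return only the calls that occur more than once with identical arguments."""
    return {key: n for key, n in count_hms_calls(lines).items() if n > 1}


# A few lines copied from the log above.
SAMPLE = [
    "2024-10-07 19:09:52,926 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-190]: 194: get_table : tbl=hive.yri_kf_csi.calls",
    "2024-10-07 19:09:52,963 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_table_statistics_req: table=hive.yri_kf_csi.calls",
    "2024-10-07 19:09:52,964 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-8]: ugi=impala/v2h0332.sjc.cloudera.com@SJC.CLOUDERA.COM    ip=172.20.33.80    cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls    ",
    "2024-10-07 19:09:52,971 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_primary_keys : tbl=hive.yri_kf_csi.calls",
    "2024-10-07 19:09:52,991 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_table_statistics_req: table=hive.yri_kf_csi.calls",
]

if __name__ == "__main__":
    print(repeated_calls(SAMPLE))
    # {('get_table_statistics_req', 'table=hive.yri_kf_csi.calls'): 2}
```

Counting on the (call, arguments) pair rather than the call name alone keeps requests for different tables from being reported as duplicates.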
       

      get_table_statistics_req() appears twice in the log on the same HMS worker thread (at 19:09:52,963 and 19:09:52,991). It seems to be called once in HdfsTable.load() and once in IcebergTable.load().
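
For illustration only, the suspected pattern can be modelled with a toy example: an outer table delegates to an inner table's load, which already fetches stats, and then fetches them again itself. Every class and method name below is a hypothetical stand-in. This is not Impala's code, and whether the real HdfsTable.load() and IcebergTable.load() interact this way is an assumption based on the observation above.

```python
class FakeMetastoreClient:
    """Hypothetical stand-in for an HMS client; it records every stats request."""

    def __init__(self):
        self.calls = []

    def get_table_statistics(self, table_name):
        self.calls.append(("get_table_statistics_req", table_name))
        return {"table": table_name}


class InnerTable:
    """Toy table whose load() fetches stats once."""

    def __init__(self, name):
        self.name = name
        self.stats = None

    def load(self, client):
        self.stats = client.get_table_statistics(self.name)


class WrapperTable:
    """Toy table that delegates to an inner table and then loads stats again."""

    def __init__(self, name, reuse_inner_stats=False):
        self.name = name
        self.inner = InnerTable(name)
        self.reuse_inner_stats = reuse_inner_stats
        self.stats = None

    def load(self, client):
        self.inner.load(client)  # first stats request happens in here
        if self.reuse_inner_stats:
            # One possible way to avoid the duplicate: reuse what was just loaded.
            self.stats = self.inner.stats
        else:
            # Redundant second request for the same table.
            self.stats = client.get_table_statistics(self.name)


if __name__ == "__main__":
    for reuse in (False, True):
        client = FakeMetastoreClient()
        WrapperTable("yri_kf_csi.calls", reuse_inner_stats=reuse).load(client)
        print(reuse, len(client.calls))
```

In this toy model the first variant issues two stats requests per load and the second issues one, while both end up with the same stats object contents.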

       

          People

            Assignee: Unassigned
            Reporter: Gabor Kaszab (gaborkaszab)