Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
When we load an Iceberg table, apparently the table stats are loaded twice from HMS.
These are the HMS logs when we load an Iceberg table in Impala:
2024-10-07 19:09:52,926 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-190]: 194: get_table : tbl=hive.yri_kf_csi.calls
2024-10-07 19:09:52,926 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-190]: ugi=impala/v2h0332.sjc.cloudera.com@SJC.CLOUDERA.COM ip=172.20.33.80 cmd=get_table : tbl=hive.yri_kf_csi.calls
2024-10-07 19:09:52,930 INFO org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: [TThreadPoolServer WorkerProcess-190]: Starting translation for processor Impala4.0.0.7.3.1.0-141@v2h0332.sjc.cloudera.com on list 1
2024-10-07 19:09:52,930 INFO org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: [TThreadPoolServer WorkerProcess-190]: Table calls,#bucket=0,isBucketed:false,tableType=EXTERNAL_TABLE,tableCapabilities=null
2024-10-07 19:09:52,931 INFO org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: [TThreadPoolServer WorkerProcess-190]: Transformer return list of 1
2024-10-07 19:09:52,936 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-2]: 7: get_all_write_event_info
2024-10-07 19:09:52,936 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-2]: ugi=impala/v2h0306.sjc.cloudera.com@SJC.CLOUDERA.COM ip=172.20.33.54 cmd=get_all_write_event_info
2024-10-07 19:09:52,958 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_config_value: name=hive.exec.default.partition.name defaultValue=__HIVE_DEFAULT_PARTITION__
2024-10-07 19:09:52,958 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-8]: ugi=impala/v2h0332.sjc.cloudera.com@SJC.CLOUDERA.COM ip=172.20.33.80 cmd=get_config_value: name=hive.exec.default.partition.name defaultValue=__HIVE_DEFAULT_PARTITION__
2024-10-07 19:09:52,963 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_table_statistics_req: table=hive.yri_kf_csi.calls
2024-10-07 19:09:52,964 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-8]: ugi=impala/v2h0332.sjc.cloudera.com@SJC.CLOUDERA.COM ip=172.20.33.80 cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls
2024-10-07 19:09:52,971 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_primary_keys : tbl=hive.yri_kf_csi.calls
2024-10-07 19:09:52,971 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-8]: ugi=impala/v2h0332.sjc.cloudera.com@SJC.CLOUDERA.COM ip=172.20.33.80 cmd=get_primary_keys : tbl=hive.yri_kf_csi.calls
2024-10-07 19:09:52,972 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_foreign_keys : parentdb=null parenttbl=null foreigndb=yri_kf_csi foreigntbl=calls
2024-10-07 19:09:52,972 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-8]: ugi=impala/v2h0332.sjc.cloudera.com@SJC.CLOUDERA.COM ip=172.20.33.80 cmd=get_foreign_keys : parentdb=null parenttbl=null foreigndb=yri_kf_csi foreigntbl=calls
2024-10-07 19:09:52,991 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: get_table_statistics_req: table=hive.yri_kf_csi.calls
2024-10-07 19:09:52,991 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-8]: ugi=impala/v2h0332.sjc.cloudera.com@SJC.CLOUDERA.COM ip=172.20.33.80 cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls
2024-10-07 19:09:52,998 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [TThreadPoolServer WorkerProcess-8]: 9: alter_table: hive.yri_kf_csi.calls newtbl=calls
2024-10-07 19:09:52,998 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [TThreadPoolServer WorkerProcess-8]: ugi=impala/v2h0332.sjc.cloudera.com@SJC.CLOUDERA.COM ip=172.20.33.80 cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls
get_table_statistics_req() seems to be called twice: once in HdfsTable.load() and once in IcebergTable.load().
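To check for this kind of duplication in other table loads, one option is to count HMS API calls in the audit log. The following is a minimal sketch, assuming the audit line format shown in the log above; the function names count_hms_calls and repeated_calls are hypothetical helpers, not part of Impala or Hive, and the sample lines are copied from the log in this report.

```python
import re
from collections import Counter

# Extracts the API name after "cmd=" from HiveMetaStore.audit entries.
# The pattern is an assumption based on the log format quoted in this report.
AUDIT_CMD = re.compile(r"HiveMetaStore\.audit: .*?\bcmd=(\w+)")


def count_hms_calls(log_lines):
    """Count how often each HMS API name appears in the audit entries."""
    counts = Counter()
    for line in log_lines:
        match = AUDIT_CMD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def repeated_calls(log_lines):
    """Return only the HMS APIs that were invoked more than once."""
    return {cmd: n for cmd, n in count_hms_calls(log_lines).items() if n > 1}


# Audit entries copied from the log above (a subset, for brevity).
PREFIX = ("INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: "
          "[TThreadPoolServer WorkerProcess-8]: "
          "ugi=impala/v2h0332.sjc.cloudera.com@SJC.CLOUDERA.COM ip=172.20.33.80 ")
sample = [
    "2024-10-07 19:09:52,964 " + PREFIX
    + "cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls",
    "2024-10-07 19:09:52,971 " + PREFIX
    + "cmd=get_primary_keys : tbl=hive.yri_kf_csi.calls",
    "2024-10-07 19:09:52,972 " + PREFIX
    + "cmd=get_foreign_keys : parentdb=null parenttbl=null "
      "foreigndb=yri_kf_csi foreigntbl=calls",
    "2024-10-07 19:09:52,991 " + PREFIX
    + "cmd=get_table_statistics_req: table=hive.yri_kf_csi.calls",
]

print(repeated_calls(sample))
```

For the sample entries, only get_table_statistics_req is reported as repeated, which matches the observation above. The sketch counts calls across the whole input, so a real log would probably need to be filtered to a single table load first.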