Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Not A Bug
Affects Version: Impala 2.0
Description
Several users have reported that, after upgrading to Impala 2.0 and Hive 0.13.1, queries fail on any table on which they have run COMPUTE STATS.
Once COMPUTE STATS has been run on a table, queries against that table fail with the following error:
AnalysisException: Failed to load metadata for table: default.stats_test
CAUSED BY: TableLoadingException: Failed to load metadata for table: stats_test
CAUSED BY: TTransportException: null
The Hive metastore log has the following error:
2014-10-24 09:37:02,862 INFO metastore.HiveMetaStore (HiveMetaStore.java:logInfo(632)) - 183: source:/172.24.16.192 get_table_statistics_req: db=default table=stats_test
2014-10-24 09:37:02,862 INFO HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(314)) - ugi=hive ip=/172.24.16.192 cmd=source:/172.24.16.192 get_table_statistics_req: db=default table=stats_test
2014-10-24 09:37:02,873 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(251)) - Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Cannot write a TUnion with no set value!
    at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:240)
    at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:213)
    at org.apache.thrift.TUnion.write(TUnion.java:152)
    at org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj$ColumnStatisticsObjStandardScheme.write(ColumnStatisticsObj.java:550)
    at org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj$ColumnStatisticsObjStandardScheme.write(ColumnStatisticsObj.java:488)
    at org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj.write(ColumnStatisticsObj.java:414)
    at org.apache.hadoop.hive.metastore.api.TableStatsResult$TableStatsResultStandardScheme.write(TableStatsResult.java:388)
    at org.apache.hadoop.hive.metastore.api.TableStatsResult$TableStatsResultStandardScheme.write(TableStatsResult.java:338)
    at org.apache.hadoop.hive.metastore.api.TableStatsResult.write(TableStatsResult.java:288)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_statistics_req_result$get_table_statistics_req_resultStandardScheme.write(ThriftHiveMetastore.java)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_statistics_req_result$get_table_statistics_req_resultStandardScheme.write(ThriftHiveMetastore.java)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_statistics_req_result.write(ThriftHiveMetastore.java)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hadoop.hive.metastore.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:48)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
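To find which tables are affected on a given cluster, it may help to scan the metastore log for this error and pair it with the statistics request logged just before it. The following is a rough sketch rather than a supported tool. The function name `tables_with_tunion_errors` and the regular expression are illustrative assumptions based only on the log excerpt above. Attributing an error to the most recent request is a heuristic that can misattribute on a busy, multi-threaded metastore.

```python
import re

# Request line format assumed from the log excerpt in this report.
REQUEST_RE = re.compile(r"get_table_statistics_req: db=(\S+) table=(\S+)")
# Exact exception text from the metastore stack trace.
ERROR_MARKER = "Cannot write a TUnion with no set value!"


def tables_with_tunion_errors(log_text):
    """Return (db, table) pairs whose latest stats request precedes a TUnion write error.

    Heuristic: the error is attributed to the most recently logged
    get_table_statistics_req, which may be wrong under concurrent load.
    """
    hits = []
    last_request = None
    for line in log_text.splitlines():
        match = REQUEST_RE.search(line)
        if match:
            last_request = (match.group(1), match.group(2))
        if ERROR_MARKER in line and last_request is not None:
            if last_request not in hits:
                hits.append(last_request)
    return hits


# Abbreviated copy of the log lines quoted in this report.
sample = """\
2014-10-24 09:37:02,862 INFO metastore.HiveMetaStore (HiveMetaStore.java:logInfo(632)) - 183: source:/172.24.16.192 get_table_statistics_req: db=default table=stats_test
2014-10-24 09:37:02,873 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(251)) - Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Cannot write a TUnion with no set value!
"""

print(tables_with_tunion_errors(sample))  # prints [('default', 'stats_test')]
```

Run against a full metastore log, this gives a candidate list of tables to check in Impala; each should still be confirmed by hand.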
Additional notes from the list:
Dropping and re-creating the tables and partitions in Hive restores access (if you're fortunate enough to have external tables), but running COMPUTE STATS on the re-created table renders it inaccessible again. Table metadata and data remain accessible in Hive even while the table is inaccessible in Impala. We can compute stats in Hive without issue, and Impala appears to use them, since the 'Warning: Missing relevant stats...' message disappears from query profiles.