Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: Impala 4.0.0, Impala 3.4.0, Impala 3.4.1, Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, Impala 4.1.2, Impala 4.3.0
Description
Since IMPALA-8240, we allow the event-processor to retry on MetastoreNotificationFetchException. However, there are several places where we haven't converted HMS failures in fetching events into MetastoreNotificationFetchException:
1. getNextMetastoreEvents() throws IllegalStateException if it fails to create a MetaStoreClient.
E1024 05:00:58.458434 258 MetastoreEventsProcessor.java:888] Unexpected exception received while processing event
Java exception follows:
java.lang.IllegalStateException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
    at org.apache.impala.catalog.MetaStoreClientPool$MetaStoreClient.<init>(MetaStoreClientPool.java:105)
    at org.apache.impala.catalog.MetaStoreClientPool$MetaStoreClient.<init>(MetaStoreClientPool.java:78)
    at org.apache.impala.catalog.MetaStoreClientPool.getClient(MetaStoreClientPool.java:205)
    at org.apache.impala.catalog.Catalog.getMetaStoreClient(Catalog.java:397)
    at org.apache.impala.catalog.events.MetastoreEventsProcessor.getNextMetastoreEvents(MetastoreEventsProcessor.java:802)
    at org.apache.impala.catalog.events.MetastoreEventsProcessor.getNextMetastoreEvents(MetastoreEventsProcessor.java:848)
    at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:869)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
    at org.apache.hadoop.hive.metastore.utils.JavaUtils.newInstance(JavaUtils.java:86)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:98)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:151)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:122)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:115)
    at org.apache.impala.catalog.MetaStoreClientPool$MetaStoreClient.<init>(MetaStoreClientPool.java:99)
    ... 13 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedConstructorAccessor948.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hive.metastore.utils.JavaUtils.newInstance(JavaUtils.java:84)
    ... 18 more
Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: Peer indicated failure: Failure to initialize security context
    at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:171)
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:244)
    at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:39)
    at org.apache.hadoop.hive.metastore.security.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:51)
    at org.apache.hadoop.hive.metastore.security.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:48)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at org.apache.hadoop.hive.metastore.security.TUGIAssumingTransport.open(TUGIAssumingTransport.java:48)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:758)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:271)
    at sun.reflect.GeneratedConstructorAccessor948.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hive.metastore.utils.JavaUtils.newInstance(JavaUtils.java:84)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:98)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:151)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:122)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:115)
    at org.apache.impala.catalog.MetaStoreClientPool$MetaStoreClient.<init>(MetaStoreClientPool.java:99)
    at org.apache.impala.catalog.MetaStoreClientPool$MetaStoreClient.<init>(MetaStoreClientPool.java:78)
    at org.apache.impala.catalog.MetaStoreClientPool.getClient(MetaStoreClientPool.java:205)
    at org.apache.impala.catalog.Catalog.getMetaStoreClient(Catalog.java:397)
    at org.apache.impala.catalog.events.MetastoreEventsProcessor.getNextMetastoreEvents(MetastoreEventsProcessor.java:802)
    at org.apache.impala.catalog.events.MetastoreEventsProcessor.getNextMetastoreEvents(MetastoreEventsProcessor.java:848)
    at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:869)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:829)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:271)
    ... 22 more
2. processEvents() doesn't handle the failures of getCurrentEventId() as MetastoreNotificationFetchExceptions. Instead, getCurrentEventId() throws CatalogException:
E1114 16:01:11.121475 28921 MetastoreEventsProcessor.java:942] Unexpected exception received while processing event
Java exception follows:
org.apache.impala.catalog.CatalogException: Unable to fetch the current notification event id. Check if metastore service is accessible
    at org.apache.impala.catalog.events.MetastoreEventsProcessor.getCurrentEventId(MetastoreEventsProcessor.java:744)
    at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:922)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
E1114 16:01:11.136973 28921 MetastoreEventsProcessor.java:1190] Notification event is null
W1114 16:01:11.137122 28921 MetastoreEventsProcessor.java:913] Event processing is skipped since status is ERROR. Last synced event id is 8406252
The event-processor should distinguish these HMS errors and not go into the ERROR state, so that it can keep retrying until the connection to HMS is back to normal.
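The requested behavior can be sketched roughly as below. This is a minimal, self-contained illustration and not Impala's actual code: `EventPollSketch`, `FetchException`, `Status`, and `pollOnce` are hypothetical names standing in for the event-processor, MetastoreNotificationFetchException, its status, and processEvents(). A fetch failure leaves the status and the last synced event id untouched so that the next scheduled round retries; any other failure moves the processor to ERROR, after which processing is skipped.

```java
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical stand-in for the event-processor; not the real MetastoreEventsProcessor.
final class EventPollSketch {
  enum Status { ACTIVE, ERROR }

  /** Stand-in for MetastoreNotificationFetchException: fetching failed, retry later. */
  static final class FetchException extends RuntimeException {
    FetchException(String message, Throwable cause) { super(message, cause); }
  }

  private Status status = Status.ACTIVE;
  private long lastSyncedEventId;

  EventPollSketch(long lastSyncedEventId) { this.lastSyncedEventId = lastSyncedEventId; }

  Status getStatus() { return status; }
  long getLastSyncedEventId() { return lastSyncedEventId; }

  /** Runs one polling round; `fetcher` returns the ids of new events in order. */
  void pollOnce(Callable<List<Long>> fetcher) {
    if (status == Status.ERROR) return;  // processing is skipped once in ERROR
    try {
      for (long eventId : fetcher.call()) {
        lastSyncedEventId = eventId;     // "process" the event
      }
    } catch (FetchException e) {
      // Retriable: keep status and lastSyncedEventId so the next round tries again.
    } catch (Exception e) {
      // Anything else is treated as unexpected and stops event processing.
      status = Status.ERROR;
    }
  }
}
```

With this shape, the two failures listed above would only stop event processing if they reached `pollOnce` as something other than the fetch exception, which is why the conversion at the point of failure matters.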
Issue Links
- relates to IMPALA-8240 Event processor should keep trying if metastore is unavailable (Resolved)
Commit 5af8fef199b60fb7725971b419596a36e48b1eec in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5af8fef19 ]
IMPALA-12561: Event-processor shouldn't go into ERROR state for failures in fetching events

Any failures in fetching HMS events should be retriable. Event-processor
should not go into the ERROR state which can only be recovered by a
global INVALIDATE METADATA command.
This patch deals with the failure in creating a new MetaStoreClient
by throwing a MetastoreClientInstantiationException instead of an
IllegalStateException. Previously the IllegalStateException could fail
the process of fetching HMS events. Now callers can catch the
MetastoreClientInstantiationException and convert it into
MetastoreNotificationFetchException if the process is retriable. So the
event-processor can retry in the next round. There are still other
callers of Catalog#getMetaStoreClient() that don't catch the new
exception since their work can't be easily retried.
Also makes sure MetastoreEventsProcessor.getCurrentEventId() only throws
MetastoreNotificationFetchException. Previously it throws
CatalogException which will fail the event-processor. Note that
CatalogException is used for errors in accessing objects in the Catalog,
e.g. table not found. We shouldn't throw it when fetching HMS events
fails.
Tests:
thrown as expected. To mimic HMS connection failures, use a
customized MetastoreClientPool that uses wrong HMS port.
class previously only runs in exhaustive jobs due to long running
time. Optimize the test to only restart HMS. Adds a new option,
-if_not_running, for run-hive-server.sh to avoid unnecessary
restarts.
Change-Id: I775684d473fdbfb9f0531234f59a6239bd0873e3
Reviewed-on: http://gerrit.cloudera.org:8080/20707
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
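The exception conversion described in the commit message can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: the two exception names come from the commit message, but their superclasses and constructors here are assumed, and `FetchExceptionSketch`, `Client`, and `fetchCurrentEventId` are hypothetical stand-ins for the real catalog classes.

```java
import java.util.function.Supplier;

// Hypothetical sketch; the nested classes only mirror the names used in the commit message.
final class FetchExceptionSketch {
  /** Stand-in: thrown when a metastore client cannot be created (assumed unchecked here). */
  static class MetastoreClientInstantiationException extends RuntimeException {
    MetastoreClientInstantiationException(String message, Throwable cause) { super(message, cause); }
  }

  /** Stand-in: signals a retriable failure while fetching HMS events (assumed unchecked here). */
  static class MetastoreNotificationFetchException extends RuntimeException {
    MetastoreNotificationFetchException(String message, Throwable cause) { super(message, cause); }
  }

  /** Hypothetical minimal client exposing only what this sketch needs. */
  interface Client extends AutoCloseable {
    long currentNotificationEventId() throws Exception;
    @Override void close();
  }

  /** Every failure, including client creation, surfaces as the retriable fetch exception. */
  static long fetchCurrentEventId(Supplier<Client> clientPool) {
    try (Client client = clientPool.get()) {
      return client.currentNotificationEventId();
    } catch (MetastoreClientInstantiationException e) {
      // Client creation failed, e.g. HMS is unreachable: retriable for this caller.
      throw new MetastoreNotificationFetchException("Unable to create a metastore client", e);
    } catch (Exception e) {
      // The call to HMS itself failed: also retriable, so do not leak another exception type.
      throw new MetastoreNotificationFetchException(
          "Unable to fetch the current notification event id", e);
    }
  }
}
```

Callers whose work cannot be retried would simply not catch the instantiation exception, which matches the commit message's note that some callers of Catalog#getMetaStoreClient() are left unchanged.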