Details
Brainstorming
Status: Open
Minor
Resolution: Unresolved
0.98.14
None
None
Description
While investigating unexpected NoServerForRegionException errors, we found that the root cause was a KeeperException that got lost along the way, which resulted in a misleading top-level error.
The underlying exception with partial stacktrace is this:
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/meta-region-server
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1289)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:684)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.blockUntilAvailable(ZKUtil.java:2032)
    at org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.blockUntilAvailable(MetaRegionTracker.java:203)
    at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:58)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateMeta(HConnectionManager.java:1209)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1175)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1301)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1178)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1135)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:976)
Here is some additional information:
- The exception first gets caught here
- It gets logged and rethrown from here
- It gets caught again, logged and rethrown here
- This finally gets caught and rethrown as InterruptedException here
When the exception is rethrown as InterruptedException, the cause is lost, so the code catching it can't (and currently doesn't) determine what actually went wrong. Perhaps the original exception should be preserved and passed on to the caller, so that it is still available when the NoServerForRegionException is finally thrown here. Alternatively, a more meaningful exception could be thrown instead of a misleading NoServerForRegionException, especially in cases where the failure indicates a more permanent condition.
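To illustrate the first suggestion, here is a minimal sketch of how a cause can be carried through an InterruptedException. It is not the actual HBase code: the class, the method names, the messages, and the AuthFailedStandIn exception are all hypothetical stand-ins, chosen so the example runs without ZooKeeper or HBase on the classpath. The one fact it relies on is that InterruptedException has no constructor that accepts a cause, so the cause has to be attached with Throwable.initCause().

```java
import java.io.IOException;

public class CausePreservationSketch {

    /** Hypothetical stand-in for KeeperException.AuthFailedException. */
    static class AuthFailedStandIn extends Exception {
        AuthFailedStandIn(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for the ZooKeeper read that fails. */
    static byte[] readMetaLocation() throws AuthFailedStandIn {
        throw new AuthFailedStandIn(
            "KeeperErrorCode = AuthFailed for /hbase/meta-region-server");
    }

    /** Lossy variant: the original exception is dropped on rethrow. */
    static byte[] blockUntilAvailableLossy() throws InterruptedException {
        try {
            return readMetaLocation();
        } catch (AuthFailedStandIn e) {
            // The cause is discarded here, so callers cannot see it.
            throw new InterruptedException("wait for meta location failed");
        }
    }

    /** Preserving variant: the original exception is attached as the cause. */
    static byte[] blockUntilAvailablePreserving() throws InterruptedException {
        try {
            return readMetaLocation();
        } catch (AuthFailedStandIn e) {
            InterruptedException ie = new InterruptedException(e.getMessage());
            // InterruptedException has no (String, Throwable) constructor,
            // so the cause is set explicitly.
            ie.initCause(e);
            throw ie;
        }
    }

    /** Hypothetical caller that wraps the failure but keeps the chain intact. */
    static byte[] locateMeta() throws IOException {
        try {
            return blockUntilAvailablePreserving();
        } catch (InterruptedException e) {
            throw new IOException("Unable to locate meta region", e);
        }
    }

    /** Walks the cause chain down to the innermost exception. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }
}
```

With the cause chain intact, the code that eventually reports the failure could inspect rootCause(e) and either include it in the top-level exception or choose a more specific exception type for permanent conditions such as an authentication failure.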