Description
All master threads are blocked waiting on this call to return:
"MASTER_SERVER_OPERATIONS-c2020:16020-2" #189 prio=5 os_prio=0 tid=0x00007f4b0408b000 nid=0x7821 in Object.wait() [0x00007f4ada24d000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:168)
	- locked <0x000000041c374f50> (a java.util.concurrent.atomic.AtomicBoolean)
	at org.apache.hadoop.hbase.client.HTable.get(HTable.java:881)
	at org.apache.hadoop.hbase.MetaTableAccessor.get(MetaTableAccessor.java:208)
	at org.apache.hadoop.hbase.MetaTableAccessor.getRegionLocation(MetaTableAccessor.java:250)
	at org.apache.hadoop.hbase.MetaTableAccessor.getRegion(MetaTableAccessor.java:225)
	at org.apache.hadoop.hbase.master.RegionStates.serverOffline(RegionStates.java:634)
	- locked <0x000000041c1f0d80> (a org.apache.hadoop.hbase.master.RegionStates)
	at org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(AssignmentManager.java:3298)
	at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:226)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Master is stuck trying to find hbase:meta on the server that just crashed and that we just recovered:
Mon Feb 02 23:00:02 PST 2015, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68181: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0
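For illustration only: the trace shows the handler holding the RegionStates monitor ("locked <0x000000041c1f0d80>") while it waits inside RpcRetryingCaller.callWithRetries. The sketch below is a minimal, self-contained Java reproduction of that shape, not HBase code. Every class and method name in it is hypothetical, and Thread.sleep stands in for the meta lookup that times out. It only demonstrates that any other thread needing the same monitor stalls for the duration of the call.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names are HBase classes.
final class LockHeldDuringRpcSketch {

    // Stand-in for a state object whose methods all share one monitor.
    static final class States {
        // Holds the States monitor for the whole simulated remote call.
        synchronized void serverOffline(long simulatedRpcMillis, CountDownLatch lockHeld) {
            lockHeld.countDown(); // signal that the monitor is now held
            try {
                Thread.sleep(simulatedRpcMillis); // sleep does not release monitors
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // Any other synchronized method has to wait for the monitor.
        synchronized boolean isReady() {
            return true;
        }
    }

    // Returns how long a second thread was blocked on isReady(), in milliseconds.
    static long measureBlockedMillis(long simulatedRpcMillis) {
        States states = new States();
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread handler = new Thread(() -> states.serverOffline(simulatedRpcMillis, lockHeld));
        handler.start();
        try {
            lockHeld.await(); // wait until the handler owns the monitor
            long start = System.nanoTime();
            states.isReady(); // blocks until the handler leaves serverOffline()
            long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            handler.join();
            return elapsed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("blocked for ~" + measureBlockedMillis(200) + " ms");
    }
}
```

With callTimeout=60000 as in the log line above, each such wait would be on the order of a minute per attempt rather than the 200 ms used here.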
Will add more detail in a sec.