Details
-
Bug
-
Status: Closed
-
Blocker
-
Resolution: Fixed
-
0.20.4
-
None
-
Ubuntu 8.04.4 LTS, Hadoop 0.20.2, Amazon EC2 x-large cluster
-
Reviewed
Description
We have a 15-node (14RS+1Master) hbase cluster. We just recently upgraded from 0.20.3 to 0.20.4. This cluster does have colocated hadoop MR, but we mostly use another MR cluster to hit it. Upon start, the cluster runs the jobs fine for about an hour. Afterwards, an RS seems to have locked up. Doing a get for a row in region being served by that region server hangs (cannot even ctrl+c out of the hbase shell). Attached is the thread dump. Verified in UI that the affect server runs on 0.20.4 and not 0.20.3.