[HBASE-2545] Unresponsive region server, potential deadlock

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 0.20.4
Fix Version/s: 0.20.5, 0.90.0
Component/s: regionserver
Labels: None
Environment: Ubuntu 8.04.4 LTS, Hadoop 0.20.2, Amazon EC2 x-large cluster
Hadoop Flags: Reviewed

Description

We have a 15-node HBase cluster (14 region servers plus 1 master) that we recently upgraded from 0.20.3 to 0.20.4. The cluster has colocated Hadoop MapReduce, but we mostly run jobs against it from a separate MapReduce cluster. After startup, the cluster runs jobs fine for about an hour, and then one region server appears to lock up. A get for a row in a region served by that region server hangs, and Ctrl+C does not exit the HBase shell. The thread dump is attached. We verified in the UI that the affected server is running 0.20.4 and not 0.20.3.
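
For readers triaging a similar hang, the sketch below shows one way to ask a JVM whether it sees a lock-cycle deadlock, using the standard `java.lang.management.ThreadMXBean` API. It is not taken from this issue or from HBase: the class name `DeadlockCheck` and the method `findDeadlocks` are hypothetical, and the code only inspects the JVM it runs in, so checking a region server this way would assume the code runs inside that process or is adapted to a remote JMX connection. A hang caused by something other than a cycle of monitors or ownable synchronizers would not be reported, in which case a full thread dump like the one attached here is still needed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of HBase: reports deadlocked threads in the current JVM.
final class DeadlockCheck {

    /** Returns details of deadlocked threads, or an empty array if none are detected. */
    static ThreadInfo[] findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();

        // findDeadlockedThreads() also covers java.util.concurrent locks, but needs
        // synchronizer monitoring; otherwise fall back to monitor-only detection.
        long[] ids = synchronizers
                ? mx.findDeadlockedThreads()
                : mx.findMonitorDeadlockedThreads();
        if (ids == null) {
            return new ThreadInfo[0]; // both methods return null when nothing is found
        }
        return mx.getThreadInfo(ids, monitors, synchronizers);
    }

    public static void main(String[] args) {
        ThreadInfo[] deadlocked = findDeadlocks();
        if (deadlocked.length == 0) {
            System.out.println("No deadlock detected by the JVM.");
            return;
        }
        for (ThreadInfo info : deadlocked) {
            if (info == null) {
                continue; // thread exited between detection and lookup
            }
            System.out.println(info.getThreadName()
                    + " is waiting on " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
    }
}
```

An empty result does not rule out a hang; it only means the JVM found no cycle among the locks it tracks.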

Attachments

2545-trunk.txt (14/May/10 18:25, 5 kB, Michael Stack)
hbase-2545.txt (14/May/10 18:01, 5 kB, Todd Lipcon)
hbase-2545.txt (14/May/10 18:01, 0.8 kB, Todd Lipcon)
hbase-2545.txt (14/May/10 17:22, 0.8 kB, Todd Lipcon)
hbase-2545.txt (14/May/10 17:15, 0.3 kB, Todd Lipcon)
hbase-hadoop-regionserver-mi-prod-hbase05.ec2.biz360.com.out (14/May/10 05:28, 221 kB, Kris Jirapinyo)

People

Assignee: Todd Lipcon
Reporter: Kris Jirapinyo
Votes: 0
Watchers: 2

Dates

Created: 14/May/10 05:27
Updated: 12/Oct/12 06:15
Resolved: 14/May/10 18:52