Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.7.1, 3.0.0-alpha1
Fix Version/s: 2.8.0, 3.0.0-alpha1
Component/s: datanode
Labels: None
Target Version/s:
Hadoop Flags: Reviewed

I can see that the code paths through DataNode#refreshVolumes and BPOfferService#registrationSucceeded could cause a deadlock.
In practice this may be rare, since it requires a user to call refreshVolumes at the same moment the DN is registering with the NN, but the issue does appear possible.
Reason for the deadlock:
1) refreshVolumes is called with the DN lock held, and at the end it also triggers a block report. In the block report call, BPServiceActor#triggerBlockReport calls toString on bpos, which takes the bpos readLock.
Lock order: DN lock, then bpos lock.
2) BPOfferService#registrationSucceeded takes the bpos writeLock and then calls dn.bpRegistrationSucceeded, which is itself synchronized on the DN.
Lock order: bpos lock, then DN lock.
Because the two paths acquire the same pair of locks in opposite orders, each thread can end up holding one lock while waiting for the other, which is a deadlock.
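The lock-order inversion above can be reproduced in isolation. The following is a minimal sketch, not Hadoop code: `dnLock` is a hypothetical stand-in for the DataNode monitor (modeled as a `ReentrantLock` so the second acquisition can time out instead of hanging forever), and `bposLock` is a hypothetical `ReentrantReadWriteLock` standing in for the BPOfferService lock. Each thread takes its first lock, waits until the other thread holds its own first lock, then tries the second lock with a timeout.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderInversionSketch {

    /** Returns {refreshGotBposLock, registrationGotDnLock}; both false means the paths would deadlock. */
    public static boolean[] run() {
        // Hypothetical stand-ins for the DN monitor and the BPOfferService read/write lock.
        ReentrantLock dnLock = new ReentrantLock();
        ReentrantReadWriteLock bposLock = new ReentrantReadWriteLock();
        CyclicBarrier bothHoldFirstLock = new CyclicBarrier(2);
        CyclicBarrier bothTriedSecondLock = new CyclicBarrier(2);
        boolean[] result = new boolean[2];

        // Path 1: refreshVolumes (DN lock) -> triggerBlockReport -> bpos.toString() (bpos readLock)
        Thread refresh = new Thread(() -> {
            dnLock.lock();
            try {
                bothHoldFirstLock.await();
                result[0] = bposLock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
                if (result[0]) bposLock.readLock().unlock();
                // Keep holding the DN lock until the other thread has made its attempt.
                bothTriedSecondLock.await();
            } catch (Exception e) {
                throw new IllegalStateException(e);
            } finally {
                dnLock.unlock();
            }
        });

        // Path 2: registrationSucceeded (bpos writeLock) -> dn.bpRegistrationSucceeded (DN lock)
        Thread registration = new Thread(() -> {
            bposLock.writeLock().lock();
            try {
                bothHoldFirstLock.await();
                result[1] = dnLock.tryLock(200, TimeUnit.MILLISECONDS);
                if (result[1]) dnLock.unlock();
                // Keep holding the bpos write lock until the other thread has made its attempt.
                bothTriedSecondLock.await();
            } catch (Exception e) {
                throw new IllegalStateException(e);
            } finally {
                bposLock.writeLock().unlock();
            }
        });

        refresh.start();
        registration.start();
        joinAll(refresh, registration);
        return result;
    }

    private static void joinAll(Thread... threads) {
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("refresh got bpos lock: " + r[0] + ", registration got DN lock: " + r[1]);
    }
}
```

Both timed attempts fail, because each thread holds exactly the lock the other one needs. With untimed acquisitions, as in the synchronized call and the bpos readLock described above, the two threads would wait on each other indefinitely.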
I think a simple fix could be to move the triggerBlockReport call outside the DN lock; I don't think that call really needs to happen while the DN lock is held.
Thoughts?
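The proposed fix can be sketched under the same assumptions (hypothetical `dnLock` and `bposLock` stand-ins, not the actual Hadoop patch): the refreshVolumes path does its volume work under the DN lock, releases it, and only then triggers the block report that takes the bpos read lock. Since neither path now requests the bpos lock while holding the DN lock, the two paths can no longer form a cycle.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FixedLockOrderSketch {

    /** Returns {refreshGotBposLock, registrationGotDnLock}; both true means both paths completed. */
    public static boolean[] run() {
        ReentrantLock dnLock = new ReentrantLock();                      // hypothetical DN monitor stand-in
        ReentrantReadWriteLock bposLock = new ReentrantReadWriteLock();  // hypothetical bpos lock stand-in
        CountDownLatch start = new CountDownLatch(1);
        boolean[] result = new boolean[2];

        Thread refresh = new Thread(() -> {
            try {
                start.await();
                dnLock.lock();
                try {
                    // Hypothetical volume add/remove work happens here, under the DN lock.
                } finally {
                    dnLock.unlock();
                }
                // Block report is triggered only after the DN lock is released.
                result[0] = bposLock.readLock().tryLock(2, TimeUnit.SECONDS);
                if (result[0]) bposLock.readLock().unlock();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread registration = new Thread(() -> {
            try {
                start.await();
                bposLock.writeLock().lock();
                try {
                    // Unchanged order on this path: bpos write lock, then DN lock.
                    result[1] = dnLock.tryLock(2, TimeUnit.SECONDS);
                    if (result[1]) dnLock.unlock();
                } finally {
                    bposLock.writeLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        refresh.start();
        registration.start();
        start.countDown(); // release both threads at roughly the same time
        for (Thread t : new Thread[] {refresh, registration}) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("refresh got bpos lock: " + r[0] + ", registration got DN lock: " + r[1]);
    }
}
```

The registration path keeps its original bpos-then-DN order; removing the DN-then-bpos order from the refresh path is enough to break the cycle.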

is duplicated by: HDFS-9310 TestDataNodeHotSwapVolumes fails occasionally (Resolved)