Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
2.7.1, 3.0.0-alpha1
None
Reviewed
Description
I can see that the code paths through DataNode#refreshVolumes and BPOfferService#registrationSucceeded can deadlock each other.
In practice this is probably rare, because it requires a user to call refreshVolumes while the DN is registering with the NN, but the issue can still happen.
Reason for the deadlock:
1) refreshVolumes runs while holding the DN lock and, at the end, triggers a block report. In that call, BPServiceActor#triggerBlockReport calls toString on bpos, which takes the bpos read lock.
DN lock, then bpos lock.
2) BPOfferService#registrationSucceeded takes the bpos write lock and then calls dn.bpRegistrationSucceeded, which is synchronized on the DN.
bpos lock, then DN lock.
Because the two paths acquire the same two locks in opposite order, each thread can end up holding one lock while waiting for the other, which is a deadlock.
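The lock-order inversion above can be reproduced in isolation. The sketch below is a hypothetical model, not HDFS code: `dnLock` stands in for the DataNode monitor and `bposLock` for the BPOfferService read/write lock. A `ReentrantLock` replaces `synchronized` only so the demo can use `tryLock` with a timeout and terminate instead of hanging. Latches force the interleaving from the description: each thread takes its first lock, then tries for the other one.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderInversionDemo {

    /**
     * Forces the interleaving from the description and returns true if each thread
     * failed to get its second lock, i.e. the paths would deadlock without timeouts.
     */
    public static boolean bothPathsBlocked() {
        // Hypothetical stand-ins: dnLock models the DataNode monitor,
        // bposLock models the read/write lock inside BPOfferService.
        ReentrantLock dnLock = new ReentrantLock();
        ReentrantReadWriteLock bposLock = new ReentrantReadWriteLock();
        CountDownLatch firstLocksHeld = new CountDownLatch(2);
        CountDownLatch attemptsDone = new CountDownLatch(2);
        AtomicBoolean refreshGotBpos = new AtomicBoolean(false);
        AtomicBoolean registrationGotDn = new AtomicBoolean(false);

        // Path 1: refreshVolumes holds the DN lock, then
        // triggerBlockReport -> bpos.toString() asks for the bpos read lock.
        Thread refresh = new Thread(() -> {
            dnLock.lock();
            try {
                firstLocksHeld.countDown();
                firstLocksHeld.await();
                boolean got = bposLock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
                if (got) bposLock.readLock().unlock();
                refreshGotBpos.set(got);
                attemptsDone.countDown();
                attemptsDone.await(); // keep holding the DN lock until both attempts finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                dnLock.unlock();
            }
        });

        // Path 2: registrationSucceeded holds the bpos write lock, then
        // dn.bpRegistrationSucceeded asks for the DN lock.
        Thread register = new Thread(() -> {
            bposLock.writeLock().lock();
            try {
                firstLocksHeld.countDown();
                firstLocksHeld.await();
                boolean got = dnLock.tryLock(200, TimeUnit.MILLISECONDS);
                if (got) dnLock.unlock();
                registrationGotDn.set(got);
                attemptsDone.countDown();
                attemptsDone.await(); // keep holding the bpos lock until both attempts finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                bposLock.writeLock().unlock();
            }
        });

        refresh.start();
        register.start();
        try {
            refresh.join();
            register.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        // Neither thread could make progress while the other held its first lock.
        return !refreshGotBpos.get() && !registrationGotDn.get();
    }
}
```

With plain `lock()` calls instead of timed `tryLock`, both threads in this model would wait forever, which is the situation the description warns about.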
I think a simple fix could be to move the triggerBlockReport call outside the DN lock; I don't think that call really needs to be made while holding the DN lock.
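A minimal sketch of the proposed fix, assuming simplified hypothetical stand-ins for DataNode and BPOfferService (class and method names here are illustrative and not the committed patch). refreshVolumes does its reconfiguration inside the DN monitor and calls triggerBlockReport only after releasing it. The only remaining nesting is bpos lock, then DN lock, so there is no cycle. The driver runs both paths concurrently and checks that both finish.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RefreshVolumesFixSketch {

    // Hypothetical stand-in for BPOfferService, guarded by a read/write lock.
    static final class BpOfferService {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final DataNode dn;
        final AtomicInteger blockReports = new AtomicInteger();

        BpOfferService(DataNode dn) { this.dn = dn; }

        // Mirrors triggerBlockReport -> toString(): needs the bpos read lock.
        void triggerBlockReport() {
            lock.readLock().lock();
            try {
                blockReports.incrementAndGet();
            } finally {
                lock.readLock().unlock();
            }
        }

        // Mirrors registrationSucceeded: bpos write lock, then a synchronized DN call.
        void registrationSucceeded() {
            lock.writeLock().lock();
            try {
                dn.bpRegistrationSucceeded();
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

    // Hypothetical stand-in for DataNode; its monitor plays the role of the "DN lock".
    static final class DataNode {
        final BpOfferService bpos = new BpOfferService(this);
        private int volumeChanges;
        private int registrations;

        synchronized void bpRegistrationSucceeded() { registrations++; }

        // Proposed shape: reconfigure under the DN lock, trigger the report after releasing it.
        void refreshVolumes() {
            synchronized (this) {
                volumeChanges++;
            }
            bpos.triggerBlockReport(); // no longer nested inside the DN lock
        }

        synchronized int volumeChanges() { return volumeChanges; }
        synchronized int registrations() { return registrations; }
    }

    /** Runs both paths concurrently; returns true if both finish and every call took effect. */
    public static boolean runConcurrently(int iterations) {
        DataNode dn = new DataNode();
        Thread refresh = new Thread(() -> {
            for (int i = 0; i < iterations; i++) dn.refreshVolumes();
        });
        Thread register = new Thread(() -> {
            for (int i = 0; i < iterations; i++) dn.bpos.registrationSucceeded();
        });
        refresh.setDaemon(true);  // a hung thread must not keep the JVM alive
        register.setDaemon(true);
        refresh.start();
        register.start();
        try {
            refresh.join(10_000);
            register.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !refresh.isAlive() && !register.isAlive()
                && dn.volumeChanges() == iterations
                && dn.registrations() == iterations
                && dn.bpos.blockReports.get() == iterations;
    }
}
```

The design choice is simply to never call into bpos while holding the DN monitor, so every thread that needs both locks takes them in the same order (bpos first).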
Thoughts?
Attachments
Issue Links
- is duplicated by: HDFS-9310 TestDataNodeHotSwapVolumes fails occasionally (Resolved)