Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.0.4-alpha, 3.0.0-alpha1
- None
- None
Description
Here's a deadlock scenario that cropped up during pipeline recovery, debugged through jstacks. Todd tipped me off to this one.
- Pipeline fails, client initiates recovery. We have the old leftover DataXceiver, and a new one doing recovery.
- New DataXceiver calls recoverRbw, grabbing the FsDatasetImpl lock.
- Old DataXceiver is in BlockReceiver#computePartialChunkCrc, calls FsDatasetImpl#getTmpInputStreams and blocks on the FsDatasetImpl lock.
- New DataXceiver calls ReplicaInPipeline#stopWriter, interrupting the old DataXceiver and then joining on it.
- Boom, deadlock. New DX holds the FsDatasetImpl lock and is joining on the old DX, which is in turn waiting on the FsDatasetImpl lock.
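The sequence above can be reproduced in miniature. The following is only a sketch of the locking pattern, not the actual DataNode code: `RecoveryDeadlockSketch`, `datasetLock`, and `oldWriterSurvivesJoin` are hypothetical names, a plain monitor stands in for the FsDatasetImpl lock, and the join is bounded by a timeout so the demo terminates instead of hanging the way the real unbounded join does. The key point it illustrates is that interrupting a thread blocked on monitor entry does not unblock it.

```java
import java.util.concurrent.CountDownLatch;

public class RecoveryDeadlockSketch {
    // Hypothetical stand-in for the FsDatasetImpl monitor.
    private static final Object datasetLock = new Object();

    /**
     * Returns true if the old writer is still alive after the recovering
     * thread interrupted it and waited joinMillis while holding the lock.
     */
    public static boolean oldWriterSurvivesJoin(long joinMillis)
            throws InterruptedException {
        CountDownLatch aboutToLock = new CountDownLatch(1);

        // Plays the old DataXceiver: needs the dataset lock to proceed,
        // as in the getTmpInputStreams step of the description.
        Thread oldWriter = new Thread(() -> {
            aboutToLock.countDown();
            synchronized (datasetLock) {
                // Would open the partial-chunk streams here.
            }
        }, "old-DataXceiver");

        boolean stillAlive;
        // Plays the new DataXceiver: holds the lock, as in recoverRbw.
        synchronized (datasetLock) {
            oldWriter.start();
            aboutToLock.await();
            // As in stopWriter: interrupt, then join. The interrupt only sets
            // a flag; it cannot pull a thread out of monitor entry.
            oldWriter.interrupt();
            // Bounded join so this sketch terminates (assumption for the demo).
            oldWriter.join(joinMillis);
            stillAlive = oldWriter.isAlive();
        }

        // Lock released: the old writer can now acquire it and exit.
        oldWriter.join();
        return stillAlive;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("old writer stuck while lock held: "
                + oldWriterSurvivesJoin(500));
    }
}
```

With an unbounded `join()` inside the synchronized block, neither thread could ever make progress, which matches what the jstacks showed. Joining outside the lock, or bounding the join, is one way out of this pattern; whether that is the right fix for the DataNode is a separate question.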
Attachments
Issue Links
- duplicates HDFS-3655 Datanode recoverRbw could hang sometime (Resolved)