Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.4-alpha, 3.0.0-alpha1
    • Fix Version/s: None
    • Component/s: datanode
    • Labels:
      None

      Description

      Here's a deadlock scenario that cropped up during pipeline recovery, debugged through jstacks. Todd tipped me off to this one.

      1. Pipeline fails, client initiates recovery. We have the old leftover DataXceiver, and a new one doing recovery.
      2. New DataXceiver calls recoverRbw, grabbing the FsDatasetImpl lock.
      3. Old DataXceiver is in BlockReceiver#computePartialChunkCrc, calls FsDatasetImpl#getTmpInputStreams and blocks on the FsDatasetImpl lock.
      4. New DataXceiver calls ReplicaInPipeline#stopWriter, interrupting the old DataXceiver and then joining on it.
      5. Boom, deadlock. New DX holds the FsDatasetImpl lock and is joining on the old DX, which is in turn waiting on the FsDatasetImpl lock.
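
The five steps above can be reproduced in miniature. The following is only a sketch of the lock-then-join pattern, not the HDFS code: the class name, `DATASET_LOCK`, and `oldWriterStuckWhileLockHeld` are hypothetical stand-ins, and the join is given a timeout purely so the demo terminates (the join described in step 4 has no such escape). It relies on the fact that a Java thread blocked on monitor entry does not respond to `interrupt()`.

```java
import java.util.concurrent.CountDownLatch;

class PipelineRecoveryDeadlockSketch {
    // Hypothetical stand-in for the FsDatasetImpl monitor.
    private static final Object DATASET_LOCK = new Object();

    /**
     * Plays the "new DataXceiver" role on the calling thread.
     * Returns true if the "old DataXceiver" thread was still alive after
     * being interrupted and joined for joinTimeoutMs while the lock was held.
     * joinTimeoutMs must be positive; join(0) would wait forever.
     */
    static boolean oldWriterStuckWhileLockHeld(long joinTimeoutMs) {
        CountDownLatch lockHeld = new CountDownLatch(1);

        // Step 3: the old writer needs the dataset lock to make progress.
        Thread oldWriter = new Thread(() -> {
            try {
                lockHeld.await(); // make sure the recovery thread wins the lock
            } catch (InterruptedException e) {
                return;
            }
            synchronized (DATASET_LOCK) {
                // Stand-in for the work done under the lock.
            }
        }, "old-DataXceiver");
        oldWriter.setDaemon(true);
        oldWriter.start();

        try {
            boolean stuck;
            // Step 2: the recovery thread grabs the dataset lock.
            synchronized (DATASET_LOCK) {
                lockHeld.countDown();
                // Wait until the old writer is really parked on the monitor.
                while (oldWriter.getState() != Thread.State.BLOCKED) {
                    Thread.sleep(1);
                }
                // Step 4: interrupt, then join, all while holding the lock.
                oldWriter.interrupt();       // no effect on a monitor-blocked thread
                oldWriter.join(joinTimeoutMs); // bounded here only for the demo
                stuck = oldWriter.isAlive(); // Step 5: still waiting on the lock
            }
            // Lock released: the old writer can now acquire it and exit.
            oldWriter.join();
            return stuck;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("old writer stuck while lock held: "
                + oldWriterStuckWhileLockHeld(200));
    }
}
```

With an unbounded `join()` in place of the timed one, the calling thread would never leave the `synchronized` block, which is the cycle described in step 5. The sketch also suggests the general shape of a remedy (do not join on a thread while holding a lock that thread needs), though it says nothing about how the actual issue was resolved.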

              People

              • Assignee:
                andrew.wang Andrew Wang
                Reporter:
                andrew.wang Andrew Wang
              • Votes:
                0
                Watchers:
                14
