Hadoop HDFS / HDFS-4851

Deadlock in pipeline recovery

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.4-alpha, 3.0.0-alpha1
    • Fix Version/s: None
    • Component/s: datanode
    • Labels: None

    Description

      Here's a deadlock that came up during pipeline recovery, diagnosed from jstack dumps. Todd tipped me off to this one.

      1. The pipeline fails and the client initiates recovery. That leaves the old, leftover DataXceiver and a new DataXceiver doing the recovery.
      2. The new DataXceiver calls recoverRbw, acquiring the FsDatasetImpl lock.
      3. The old DataXceiver is in BlockReceiver#computePartialChunkCrc; it calls FsDatasetImpl#getTmpInputStreams and blocks on the FsDatasetImpl lock.
      4. The new DataXceiver calls ReplicaInPipeline#stopWriter, which interrupts the old DataXceiver and then joins on it.
      5. Deadlock: the new DataXceiver holds the FsDatasetImpl lock while joining on the old DataXceiver, which is in turn waiting for that same lock.
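
The lock cycle in the steps above can be reproduced in isolation with a small Java sketch. This is not Hadoop code and not the attached patch: `PipelineDeadlockSketch`, `DATASET_LOCK`, and both method names are hypothetical stand-ins, and the sketch assumes the FsDatasetImpl lock behaves like a plain Java monitor. A thread blocked entering a `synchronized` block does not react to `Thread.interrupt()`, so interrupting and joining while still holding the monitor cannot make progress. The sketch uses a bounded join so it terminates instead of hanging, and contrasts it with joining only after the monitor is released, which is one common way to break such a cycle.

```java
public class PipelineDeadlockSketch {
    // Hypothetical stand-in for the FsDatasetImpl monitor.
    private static final Object DATASET_LOCK = new Object();

    // Starts a thread standing in for the old DataXceiver. The caller must
    // already hold DATASET_LOCK, so the new thread blocks on monitor entry.
    private static Thread startBlockedWriter() throws InterruptedException {
        Thread writer = new Thread(() -> {
            synchronized (DATASET_LOCK) {
                // Stand-in for the work done under the dataset lock.
            }
        }, "old-DataXceiver");
        writer.start();
        // Wait until the writer is really parked on the monitor.
        while (writer.getState() != Thread.State.BLOCKED) {
            Thread.sleep(1);
        }
        return writer;
    }

    // Mirrors steps 2-5: interrupt and join while holding the lock.
    // Returns true if the writer stopped within joinMillis.
    // joinMillis must be > 0; join(0) waits forever and would truly deadlock.
    public static boolean joinWhileHoldingLock(long joinMillis) throws InterruptedException {
        Thread writer;
        boolean stopped;
        synchronized (DATASET_LOCK) {
            writer = startBlockedWriter();
            writer.interrupt();      // no effect on a thread blocked on a monitor
            writer.join(joinMillis); // waits on the Thread object; DATASET_LOCK stays held
            stopped = !writer.isAlive();
        }
        writer.join();               // lock released, so the writer can now finish
        return stopped;
    }

    // Alternative ordering: release the lock first, then interrupt and join.
    public static boolean joinAfterReleasingLock(long joinMillis) throws InterruptedException {
        Thread writer;
        synchronized (DATASET_LOCK) {
            writer = startBlockedWriter();
        }
        writer.interrupt();
        writer.join(joinMillis);
        return !writer.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("join under lock, writer stopped: " + joinWhileHoldingLock(200));
        System.out.println("join after release, writer stopped: " + joinAfterReleasingLock(2000));
    }
}
```

With an unbounded join in the first method, the two threads would wait on each other indefinitely, which matches the jstack picture described above.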

      Attachments

        1. hdfs-4851-1.patch
          1 kB
          Andrew Wang

            People

              andrew.wang Andrew Wang
              Votes: 0
              Watchers: 14
