  1. Hadoop HDFS
  2. HDFS-9137

DeadLock between DataNode#refreshVolumes and BPOfferService#registrationSucceeded


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1, 3.0.0-alpha1
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: datanode
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The code paths in DataNode#refreshVolumes and BPOfferService#registrationSucceeded can deadlock against each other.
      In practice this may be rare, since it requires a user to call refreshVolumes while the DN is registering with the NN, but it looks like it can happen.

      Reason for the deadlock:

      1) refreshVolumes is called while holding the DN lock, and at the end it also triggers a block report. In the block report call, BPServiceActor#triggerBlockReport calls toString on bpos, which takes the readLock on bpos.
      Lock order: DN lock, then bpos lock.

      2) BPOfferService#registrationSucceeded takes the writeLock on bpos and calls dn.bpRegistrationSucceeded, which is in turn a synchronized call on the DN.
      Lock order: bpos lock, then DN lock.
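
      The two lock orders above can be reproduced in isolation. The following is a minimal sketch, not the Hadoop code: LockInversionDemo and bothPathsBlocked are hypothetical names, and the DN monitor is modelled as a ReentrantLock (an assumption made only so that tryLock with a timeout can be used and the demo does not actually hang). Each thread takes its first lock, waits until the other thread holds its own first lock, and then tries for the second one.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-alone model of the two lock orders; not the real DataNode/BPOfferService. */
public class LockInversionDemo {

    /** Returns true if neither thread could obtain its second lock (the deadlock interleaving). */
    static boolean bothPathsBlocked() throws Exception {
        ReentrantLock dnLock = new ReentrantLock();                        // stands in for the DN monitor
        ReentrantReadWriteLock bposLock = new ReentrantReadWriteLock();    // stands in for the bpos lock
        CyclicBarrier barrier = new CyclicBarrier(2);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Path 1 (refreshVolumes): DN lock, then bpos read lock.
            Future<Boolean> refresh = pool.submit(() -> {
                dnLock.lock();
                try {
                    barrier.await(); // both threads now hold their first lock
                    boolean got = bposLock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
                    if (got) {
                        bposLock.readLock().unlock();
                    }
                    barrier.await(); // keep the DN lock until the other thread has tried too
                    return got;
                } finally {
                    dnLock.unlock();
                }
            });
            // Path 2 (registrationSucceeded): bpos write lock, then DN lock.
            Future<Boolean> register = pool.submit(() -> {
                bposLock.writeLock().lock();
                try {
                    barrier.await();
                    boolean got = dnLock.tryLock(200, TimeUnit.MILLISECONDS);
                    if (got) {
                        dnLock.unlock();
                    }
                    barrier.await();
                    return got;
                } finally {
                    bposLock.writeLock().unlock();
                }
            });
            return !refresh.get() && !register.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("both paths blocked: " + bothPathsBlocked());
    }
}
```

      With plain lock() and synchronized instead of the timed tryLock, the same interleaving would leave both threads waiting on each other indefinitely.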

      So this can clearly create a deadlock.
      I think a simple fix could be to move the triggerBlockReport call outside of the DN lock; I feel that call may not really need to be inside the DN lock.
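
      The proposed change can be illustrated with a simplified model. This is a sketch under assumptions rather than the attached patch: LockOrderSketch, dnLock, bposLock and runBothPaths are hypothetical names, and the method bodies are placeholders. The point is only that once the block report is triggered after the DN lock is released, the refreshVolumes path never holds both locks at once, so the unchanged registration path cannot form a cycle with it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical simplified model of the two code paths; not the actual Hadoop classes. */
public class LockOrderSketch {
    // Stand-in for the DataNode monitor (the "DN lock").
    private final Object dnLock = new Object();
    // Stand-in for the BPOfferService read/write lock (the "bpos lock").
    private final ReentrantReadWriteLock bposLock = new ReentrantReadWriteLock();

    /** Path 1 with the proposed change: volume work under the DN lock, block report after it. */
    void refreshVolumes() {
        synchronized (dnLock) {
            // ... add or remove volumes while holding the DN lock ...
        }
        // The DN lock has been released, so only the bpos lock is taken from here on.
        triggerBlockReport();
    }

    /** Models the toString() call on bpos, which takes the bpos read lock. */
    void triggerBlockReport() {
        bposLock.readLock().lock();
        try {
            // ... read bpos state ...
        } finally {
            bposLock.readLock().unlock();
        }
    }

    /** Path 2, unchanged: bpos write lock first, then the DN lock. */
    void registrationSucceeded() {
        bposLock.writeLock().lock();
        try {
            synchronized (dnLock) {
                // ... models dn.bpRegistrationSucceeded ...
            }
        } finally {
            bposLock.writeLock().unlock();
        }
    }

    /** Runs both paths concurrently; returns true if both threads finish within the timeout. */
    static boolean runBothPaths(int iterations, long timeoutSeconds) throws InterruptedException {
        LockOrderSketch dn = new LockOrderSketch();
        CountDownLatch done = new CountDownLatch(2);
        Thread refresher = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                dn.refreshVolumes();
            }
            done.countDown();
        });
        Thread registrar = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                dn.registrationSucceeded();
            }
            done.countDown();
        });
        // Daemon threads so the JVM can still exit if the threads were ever to get stuck.
        refresher.setDaemon(true);
        registrar.setDaemon(true);
        refresher.start();
        registrar.start();
        return done.await(timeoutSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("both paths completed: " + runBothPaths(10_000, 10));
    }
}
```

      Whether the block report can safely run outside the DN lock in the real DataNode depends on what state it reads, which this model does not capture.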

      Thoughts?

      Attachments

        1. HDFSS-9137.02.patch
          2 kB
          Uma Maheswara Rao G
        2. HDFS-9137.01-WithPreservingRootExceptions.patch
          2 kB
          Uma Maheswara Rao G
        3. HDFS-9137.00.patch
          2 kB
          Uma Maheswara Rao G


            People

              Assignee: Uma Maheswara Rao G (umamaheswararao)
              Reporter: Uma Maheswara Rao G (umamaheswararao)
              Votes: 0
              Watchers: 13
