HDFS-9137: DeadLock between DataNode#refreshVolumes and BPOfferService#registrationSucceeded

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1, 3.0.0-alpha1
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: datanode
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The code paths through DataNode#refreshVolumes and BPOfferService#registrationSucceeded can deadlock.
      In practice this should be rare, since it requires a user to call refreshVolumes while the DN is registering with the NN. But it looks like the issue can happen.

      Reason for the deadlock:

      1) refreshVolumes is called while holding the DN lock and, at the end, it also triggers a block report. In the block report call, BPServiceActor#triggerBlockReport calls toString on bpos, which takes the readLock on bpos.
      Lock order: DN lock, then bpos lock.

      2) BPOfferService#registrationSucceeded takes the writeLock on bpos and then calls dn.bpRegistrationSucceeded, which is a synchronized call on the DN.
      Lock order: bpos lock, then DN lock.
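
      To make the two orderings concrete, here is a minimal standalone sketch. It is not DataNode code: dnLock and bposLock are hypothetical stand-ins for the DN monitor and the bpos read/write lock, and timed tryLock calls are used so the sketch reports the blocked state instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderSketch {
    // Hypothetical stand-ins (assumptions, not Hadoop classes):
    // dnLock models the DataNode monitor, bposLock the bpos read/write lock.
    static final Lock dnLock = new ReentrantLock();
    static final ReentrantReadWriteLock bposLock = new ReentrantReadWriteLock();

    // Takes 'first', waits until the peer also holds its first lock, then
    // tries 'second' with a timeout. Returns true if 'second' was unavailable.
    static boolean blockedOnSecond(Lock first, Lock second,
                                   CountDownLatch bothHoldFirst,
                                   CountDownLatch bothTried)
            throws InterruptedException {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            boolean acquired = second.tryLock(100, TimeUnit.MILLISECONDS);
            if (acquired) {
                second.unlock();
            }
            // Keep 'first' held until the peer has finished its own attempt.
            bothTried.countDown();
            bothTried.await();
            return !acquired;
        } finally {
            first.unlock();
        }
    }

    // Runs both paths concurrently; true means each one was stuck waiting
    // for the lock the other one held.
    static boolean inversionDetected() throws Exception {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Path 1 (refreshVolumes): DN lock, then bpos read lock.
            Future<Boolean> refreshPath = pool.submit(() ->
                blockedOnSecond(dnLock, bposLock.readLock(), bothHoldFirst, bothTried));
            // Path 2 (registrationSucceeded): bpos write lock, then DN lock.
            Future<Boolean> registerPath = pool.submit(() ->
                blockedOnSecond(bposLock.writeLock(), dnLock, bothHoldFirst, bothTried));
            return refreshPath.get() && registerPath.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("each path blocked on the other's lock: " + inversionDetected());
    }
}
```

      With plain lock() calls in place of the timed tryLock, both threads in this sketch would wait forever.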

      So the two paths take the same pair of locks in opposite order, which can clearly create a deadlock.
      I think a simple fix could be to move the triggerBlockReport call outside of the DN lock. I feel that call may not really be needed inside the DN lock.
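
      A rough sketch of that idea, under stated assumptions: the class below is a hypothetical stand-in, not the real DataNode and not the attached patch. It only shows the shape of the change, with the report triggered after the synchronized block instead of inside it, and it records whether the DN monitor is held when the report runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RefreshOutsideLockSketch {
    // Hypothetical stand-in for the bpos read/write lock.
    private final ReentrantReadWriteLock bposLock = new ReentrantReadWriteLock();

    // Records, per report, whether the caller still held the DN monitor.
    final List<Boolean> dnLockHeldDuringReport = new ArrayList<>();

    // Stand-in for triggering a block report: it takes the bpos read lock,
    // as the toString call on bpos does in the description above.
    private void triggerBlockReport() {
        bposLock.readLock().lock();
        try {
            dnLockHeldDuringReport.add(Thread.holdsLock(this));
        } finally {
            bposLock.readLock().unlock();
        }
    }

    // Current shape: the report is triggered inside the DN monitor,
    // so the order is DN lock, then bpos lock.
    synchronized void refreshVolumesInsideLock() {
        // ... volume changes would go here ...
        triggerBlockReport();
    }

    // Proposed shape: only the volume changes are synchronized; the report
    // is triggered after the DN monitor has been released.
    void refreshVolumesOutsideLock() {
        synchronized (this) {
            // ... volume changes would go here ...
        }
        triggerBlockReport();
    }

    public static void main(String[] args) {
        RefreshOutsideLockSketch dn = new RefreshOutsideLockSketch();
        dn.refreshVolumesInsideLock();
        dn.refreshVolumesOutsideLock();
        System.out.println(dn.dnLockHeldDuringReport); // prints [true, false]
    }
}
```

      In the second variant the refresh path never holds the DN lock while waiting for the bpos lock, so the DN-then-bpos ordering from step 1 no longer occurs on that path.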

      Thoughts?

      Attachments

        1. HDFS-9137.00.patch
          2 kB
          Uma Maheswara Rao G
        2. HDFS-9137.01-WithPreservingRootExceptions.patch
          2 kB
          Uma Maheswara Rao G
        3. HDFSS-9137.02.patch
          2 kB
          Uma Maheswara Rao G

          People

            Assignee: Uma Maheswara Rao G (umamaheswararao)
            Reporter: Uma Maheswara Rao G (umamaheswararao)
            Votes: 0
            Watchers: 13
