Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-11352

Potential deadlock in NN when failing over

VotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      HDFS-11180 fixed a general class of deadlock that can occur when failing over between the MetricsSystemImpl and FSEditLog (see comments on that JIRA for more details). In trunk and branch-2/branch-2.8 this fix was successful by making the metrics calls not synchronize on FSEditLog.

      In branch-2.6 and branch-2.7 there is one more method, FSNamesystem#getTransactionsSinceLastCheckpoint, which still requires the lock on FSEditLog and thus can result in the same deadlock scenario. This can be seen by running TestFSNamesystemMBean#testWithFSEditLogLock with the patch in HDFS-11290 on either of these branches (it fails currently).

      Attachments

        1. HDFS-11352-branch-2.7.000.patch
          0.8 kB
          Erik Krogen

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            xkrogen Erik Krogen
            xkrogen Erik Krogen
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment