Hadoop HDFS / HDFS-11352

Potential deadlock in NN when failing over

    Details

      Description

      HDFS-11180 fixed a general class of deadlock between MetricsSystemImpl and FSEditLog that can occur when failing over (see the comments on that JIRA for more details). In trunk and branch-2/branch-2.8, the fix works by making the metrics calls not synchronize on FSEditLog.

      In branch-2.6 and branch-2.7 there is one more method, FSNamesystem#getTransactionsSinceLastCheckpoint, which still requires the lock on FSEditLog and can therefore hit the same deadlock scenario. This can be reproduced by running TestFSNamesystemMBean#testWithFSEditLogLock with the patch from HDFS-11290 applied on either of these branches (the test currently fails).
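      The blocking behaviour described above can be illustrated with a minimal sketch. This is not the HDFS-11352 patch and does not use any Hadoop classes: EditLogLockSketch, its fields, and its method names are all hypothetical stand-ins. It only shows why a metrics getter that needs the edit log's monitor stalls while another thread holds that monitor, whereas a getter that reads atomic/volatile state does not. In the real scenario the stalled metrics thread would also be holding a MetricsSystemImpl lock, which is what turns the stall into a deadlock.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for an edit log; NOT the real FSEditLog. */
public class EditLogLockSketch {
  // State that metrics need, kept readable without the object's monitor.
  private final AtomicLong lastWrittenTxId = new AtomicLong();
  private volatile long lastCheckpointTxId = 0;

  /** Writers take the monitor, as a synchronized edit log would. */
  public synchronized void logEdit() {
    lastWrittenTxId.incrementAndGet();
  }

  /** Getter that needs the monitor: blocks while another thread holds it. */
  public synchronized long getTxnsSinceCheckpointLocked() {
    return lastWrittenTxId.get() - lastCheckpointTxId;
  }

  /** Getter that reads only atomic/volatile state: never waits on the monitor. */
  public long getTxnsSinceCheckpointLockFree() {
    return lastWrittenTxId.get() - lastCheckpointTxId;
  }

  /**
   * Holds the monitor on the calling thread (simulating a failover thread)
   * and reports whether a "metrics" thread can finish its read meanwhile.
   */
  static boolean completesWhileMonitorHeld(final EditLogLockSketch log,
                                           final boolean lockFree) throws Exception {
    ExecutorService metricsThread = Executors.newSingleThreadExecutor();
    try {
      synchronized (log) {
        Future<Long> result = metricsThread.submit(() ->
            lockFree ? log.getTxnsSinceCheckpointLockFree()
                     : log.getTxnsSinceCheckpointLocked());
        try {
          result.get(500, TimeUnit.MILLISECONDS);
          return true;   // the read finished although the monitor was held
        } catch (TimeoutException e) {
          return false;  // the read is stuck waiting for the monitor
        }
      }
    } finally {
      // The monitor is released by now, so any stuck read can finish.
      metricsThread.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    EditLogLockSketch log = new EditLogLockSketch();
    log.logEdit();
    System.out.println("lock-free read completes: " + completesWhileMonitorHeld(log, true));
    System.out.println("locked read completes:    " + completesWhileMonitorHeld(log, false));
  }
}
```

      The 500 ms timeout is an arbitrary choice for the sketch; it only needs to be long enough for an unblocked read to finish.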

          Activity

          yzhangal Yongjun Zhang added a comment -

          Thanks Akira Ajisaka.

          ajisakaa Akira Ajisaka added a comment -

          Hi Yongjun Zhang, yes, it's correct.

          yzhangal Yongjun Zhang added a comment -

          Hi Erik Krogen and Akira Ajisaka,

          Thanks for your work here.

          It looks to me that the reason trunk doesn't need this patch is that it has HDFS-7501. Because HDFS-7501 was not backported to 2.7.x and 2.6.x, HDFS-11352 is needed here. Does that sound correct to you?

          Thanks.

          ajisakaa Akira Ajisaka added a comment -

          Committed this to branch-2.7 and branch-2.6. Thanks Erik Krogen for the fix!

          ajisakaa Akira Ajisaka added a comment -

          Nice catch! LGTM, +1.

          xkrogen Erik Krogen added a comment -

          Whitespace issues and ASF license are clearly not related. All of the unit tests seem unrelated and pass fine locally.

          Akira Ajisaka, pinging based on involvement in HDFS-11180/HDFS-11290, can you review?

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote | Subsystem | Runtime | Comment
          0 | reexec | 12m 13s | Docker mode activated.
          +1 | @author | 0m 0s | The patch does not contain any @author tags.
          -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 | mvninstall | 8m 8s | branch-2.7 passed
          +1 | compile | 1m 5s | branch-2.7 passed with JDK v1.8.0_121
          +1 | compile | 1m 5s | branch-2.7 passed with JDK v1.7.0_121
          +1 | checkstyle | 0m 29s | branch-2.7 passed
          +1 | mvnsite | 1m 2s | branch-2.7 passed
          +1 | mvneclipse | 0m 17s | branch-2.7 passed
          +1 | findbugs | 3m 2s | branch-2.7 passed
          +1 | javadoc | 0m 58s | branch-2.7 passed with JDK v1.8.0_121
          +1 | javadoc | 1m 45s | branch-2.7 passed with JDK v1.7.0_121
          +1 | mvninstall | 0m 51s | the patch passed
          +1 | compile | 0m 54s | the patch passed with JDK v1.8.0_121
          +1 | javac | 0m 54s | the patch passed
          +1 | compile | 0m 58s | the patch passed with JDK v1.7.0_121
          +1 | javac | 0m 58s | the patch passed
          +1 | checkstyle | 0m 26s | the patch passed
          +1 | mvnsite | 0m 58s | the patch passed
          +1 | mvneclipse | 0m 12s | the patch passed
          -1 | whitespace | 0m 0s | The patch has 1266 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          -1 | whitespace | 0m 33s | The patch 70 line(s) with tabs.
          +1 | findbugs | 3m 9s | the patch passed
          +1 | javadoc | 0m 55s | the patch passed with JDK v1.8.0_121
          +1 | javadoc | 1m 42s | the patch passed with JDK v1.7.0_121
          -1 | unit | 46m 4s | hadoop-hdfs in the patch failed with JDK v1.7.0_121.
          -1 | asflicense | 0m 22s | The patch generated 3 ASF License warnings.
           |  | 140m 9s |



          Reason | Tests
          JDK v1.8.0_121 Failed junit tests | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
           | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
           | hadoop.hdfs.server.datanode.TestBlockReplacement
          JDK v1.7.0_121 Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots



          Subsystem | Report/Notes
          Docker | Image:yetus/hadoop:c420dfe
          JIRA Issue | HDFS-11352
          JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12848350/HDFS-11352-branch-2.7.000.patch
          Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname | Linux 70ac25b542b2 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool | maven
          Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision | branch-2.7 / 1cf20b3
          Default Java | 1.7.0_121
          Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_121 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121
          findbugs | v3.0.0
          whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/18213/artifact/patchprocess/whitespace-eol.txt
          whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/18213/artifact/patchprocess/whitespace-tabs.txt
          unit | https://builds.apache.org/job/PreCommit-HDFS-Build/18213/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_121.txt
          JDK v1.7.0_121 Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/18213/testReport/
          asflicense | https://builds.apache.org/job/PreCommit-HDFS-Build/18213/artifact/patchprocess/patch-asflicense-problems.txt
          modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/18213/console
          Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          xkrogen Erik Krogen added a comment -

          Attaching patch with the (one-line) fix.


            People

            • Assignee: Erik Krogen (xkrogen)
            • Reporter: Erik Krogen (xkrogen)
            • Votes: 0
            • Watchers: 9
