
Deadlock in DN.FsDatasetImpl between moveBlockAcrossStorage and createRbw

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.0, 2.8.0, 2.7.1, 2.7.2
    • Fix Version/s: 2.8.0, 2.7.3, 3.0.0-alpha1
    • Component/s: datanode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      We found a deadlock in the DataNode's FsDatasetImpl between moveBlockAcrossStorage and createRbw, reached through the replaceBlock and writeBlock RPC calls. The DataNode's jstack result is in the attached screenshot (hdfs-9661-jstack.jpg.png).

      1. HDFS-9661.0.patch
        2 kB
        ade
      2. hdfs-9661-jstack.jpg.png
        779 kB
        ade
      3. HDFS-9661.001.patch
        2 kB
        ade
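
The deadlock reported here has the shape of a lock-order inversion. Below is a minimal sketch of that shape, not the DataNode code itself: `DeadlockRepro`, its two plain `Object` monitors, and the thread names are hypothetical stand-ins, and the assumption (taken from the discussion on this issue) is that one path locks the dataset and then the volume choosing policy while the other path locks them in the opposite order. It uses the standard `ThreadMXBean` API to report the cycle programmatically, which is the same kind of information a jstack dump shows.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

/** Hypothetical reproduction of a two-monitor lock-order inversion. */
class DeadlockRepro {

  /** Starts a daemon thread that locks {@code first}, then tries to lock {@code second}. */
  private static Thread daemon(String name, Object first, Object second,
      CountDownLatch bothHoldFirst) {
    Thread t = new Thread(() -> {
      synchronized (first) {
        bothHoldFirst.countDown();
        try {
          // Wait until the other thread also holds its first lock.
          bothHoldFirst.await();
        } catch (InterruptedException e) {
          return;
        }
        synchronized (second) {
          // Never reached: the other thread owns this monitor.
        }
      }
    }, name);
    // Daemon threads let the JVM exit even though they stay deadlocked.
    t.setDaemon(true);
    return t;
  }

  /** Returns how many threads the JVM reports as deadlocked on monitors. */
  static int countDeadlockedThreads() {
    Object dataset = new Object(); // stand-in for the FsDatasetImpl monitor
    Object policy = new Object();  // stand-in for the volume choosing policy monitor
    CountDownLatch latch = new CountDownLatch(2);
    daemon("createRbw-like", dataset, policy, latch).start();
    daemon("moveBlock-like", policy, dataset, latch).start();

    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    for (int i = 0; i < 100; i++) {
      long[] ids = mx.findMonitorDeadlockedThreads();
      if (ids != null) {
        return ids.length;
      }
      try {
        Thread.sleep(50); // give both threads time to block
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return 0;
      }
    }
    return 0;
  }

  public static void main(String[] args) {
    System.out.println("deadlocked threads: " + countDeadlockedThreads());
  }
}
```

The latch only makes the interleaving deterministic for the sketch; in a running DataNode the same cycle would depend on request timing.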

        Issue Links

          Activity

          vinayrpet Vinayakumar B added a comment -

          +1 for the patch.

          Pending Jenkins

          drankye Kai Zheng added a comment -

          Good catch and nice report!

          The patch can solve the deadlock issue. I'm not sure whether there are other similar cases, or how to prevent such deadlocks cleanly.
          I wonder if it's possible to consider a unified locking model here. Operations like FsDatasetImpl#moveBlockAcrossStorage and FsDatasetImpl#createRbw need to choose a volume, which takes the lock on RoundRobinVolumeChoosingPolicy, and then need to lock FsDatasetImpl in volume.getAvailable. So to avoid this kind of deadlock, maybe each thread should avoid holding any lock on the FsDatasetImpl object before the operation and, during the operation, take the lock on VolumeChoosingPolicy first.
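
The idea of a unified locking model can be sketched as a single global lock order that every path follows. The example below is an illustration under stated assumptions, not the committed patch: `LockOrderSketch`, `moveBlock`, and the two `Object` monitors are hypothetical stand-ins for FsDatasetImpl and the volume choosing policy. It picks the order dataset first, then policy; the opposite order proposed in the comment above would work equally well as long as every thread agrees on it.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: both code paths acquire dataset -> policy. */
class LockOrderSketch {
  private final Object datasetLock = new Object(); // stand-in for FsDatasetImpl
  private final Object policyLock = new Object();  // stand-in for the choosing policy
  private final long available = 1024;

  /** Mirrors volume.getAvailable(): needs the dataset lock. */
  private long getAvailable() {
    synchronized (datasetLock) {
      return available;
    }
  }

  /** Mirrors choosing a volume: holds the policy lock, then needs the dataset lock. */
  private long chooseVolume() {
    synchronized (policyLock) {
      return getAvailable();
    }
  }

  /** createRbw-like path: dataset lock first, then the policy lock. */
  long createRbw() {
    synchronized (datasetLock) {
      return chooseVolume();
    }
  }

  /** moveBlockAcrossStorage-like path, also taking the dataset lock first. */
  long moveBlock() {
    synchronized (datasetLock) { // monitors are reentrant, so getAvailable() is safe
      return chooseVolume();
    }
  }

  /** Runs both paths concurrently; returns true if both finish within the timeout. */
  static boolean runConcurrently(int iterations) {
    LockOrderSketch ds = new LockOrderSketch();
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      Future<?> writer = pool.submit(() -> {
        for (int i = 0; i < iterations; i++) {
          ds.createRbw();
        }
      });
      Future<?> mover = pool.submit(() -> {
        for (int i = 0; i < iterations; i++) {
          ds.moveBlock();
        }
      });
      writer.get(10, TimeUnit.SECONDS);
      mover.get(10, TimeUnit.SECONDS);
      return true;
    } catch (TimeoutException | ExecutionException | InterruptedException e) {
      return false;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("completed without deadlock: " + runConcurrently(1000));
  }
}
```

Because no thread ever holds the policy lock while waiting for a dataset lock it does not already own, no cycle can form between the two monitors. The cost is that volume selection is serialized behind the dataset lock.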

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote | Subsystem | Runtime | Comment
          0 | reexec | 0m 0s | Docker mode activated.
          +1 | @author | 0m 0s | The patch does not contain any @author tags.
          -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 | mvninstall | 7m 30s | trunk passed
          +1 | compile | 0m 40s | trunk passed with JDK v1.8.0_66
          +1 | compile | 0m 41s | trunk passed with JDK v1.7.0_91
          +1 | checkstyle | 0m 16s | trunk passed
          +1 | mvnsite | 0m 51s | trunk passed
          +1 | mvneclipse | 0m 13s | trunk passed
          +1 | findbugs | 1m 51s | trunk passed
          +1 | javadoc | 1m 5s | trunk passed with JDK v1.8.0_66
          +1 | javadoc | 1m 48s | trunk passed with JDK v1.7.0_91
          +1 | mvninstall | 0m 44s | the patch passed
          +1 | compile | 0m 37s | the patch passed with JDK v1.8.0_66
          +1 | javac | 0m 37s | the patch passed
          +1 | compile | 0m 39s | the patch passed with JDK v1.7.0_91
          +1 | javac | 0m 39s | the patch passed
          -1 | checkstyle | 0m 16s | hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 119 unchanged - 0 fixed = 121 total (was 119)
          +1 | mvnsite | 0m 49s | the patch passed
          +1 | mvneclipse | 0m 11s | the patch passed
          +1 | whitespace | 0m 0s | Patch has no whitespace issues.
          +1 | findbugs | 2m 2s | the patch passed
          +1 | javadoc | 1m 3s | the patch passed with JDK v1.8.0_66
          +1 | javadoc | 1m 41s | the patch passed with JDK v1.7.0_91
          -1 | unit | 63m 33s | hadoop-hdfs in the patch failed with JDK v1.8.0_66.
          -1 | unit | 63m 22s | hadoop-hdfs in the patch failed with JDK v1.7.0_91.
          +1 | asflicense | 0m 22s | Patch does not generate ASF License warnings.
          | | 152m 32s |

          Reason | Tests
          JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.server.datanode.TestBlockScanner
          | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA
          | hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot
          JDK v1.7.0_91 Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12783034/HDFS-9661.0.patch
          JIRA Issue HDFS-9661
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 4cfb664cbde7 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / edc43a9
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14157/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14157/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14157/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14157/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14157/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14157/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Max memory used 77MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14157/console

          This message was automatically generated.

          kihwal Kihwal Lee added a comment -

          It missed 2.7.2rc2, but it might be voted down again. Please mark it as a blocker if you think it needs to be in 2.7.2. Otherwise, please target 2.7.3.

          vinayrpet Vinayakumar B added a comment -

          ade, can you update the patch with checkstyle fixes?

          aderen ade added a comment -

          Updated the patch to fix the code style.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote | Subsystem | Runtime | Comment
          0 | reexec | 0m 0s | Docker mode activated.
          +1 | @author | 0m 0s | The patch does not contain any @author tags.
          -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 | mvninstall | 8m 13s | trunk passed
          +1 | compile | 0m 44s | trunk passed with JDK v1.8.0_66
          +1 | compile | 0m 45s | trunk passed with JDK v1.7.0_91
          +1 | checkstyle | 0m 17s | trunk passed
          +1 | mvnsite | 0m 57s | trunk passed
          +1 | mvneclipse | 0m 14s | trunk passed
          +1 | findbugs | 2m 6s | trunk passed
          +1 | javadoc | 1m 9s | trunk passed with JDK v1.8.0_66
          +1 | javadoc | 1m 52s | trunk passed with JDK v1.7.0_91
          +1 | mvninstall | 0m 49s | the patch passed
          +1 | compile | 0m 41s | the patch passed with JDK v1.8.0_66
          +1 | javac | 0m 41s | the patch passed
          +1 | compile | 0m 44s | the patch passed with JDK v1.7.0_91
          +1 | javac | 0m 44s | the patch passed
          +1 | checkstyle | 0m 18s | the patch passed
          +1 | mvnsite | 0m 53s | the patch passed
          +1 | mvneclipse | 0m 12s | the patch passed
          +1 | whitespace | 0m 0s | Patch has no whitespace issues.
          +1 | findbugs | 2m 15s | the patch passed
          +1 | javadoc | 1m 5s | the patch passed with JDK v1.8.0_66
          +1 | javadoc | 1m 50s | the patch passed with JDK v1.7.0_91
          -1 | unit | 57m 35s | hadoop-hdfs in the patch failed with JDK v1.8.0_66.
          -1 | unit | 54m 53s | hadoop-hdfs in the patch failed with JDK v1.7.0_91.
          +1 | asflicense | 0m 21s | Patch does not generate ASF License warnings.
          | | 140m 24s |

          Reason | Tests
          JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.server.datanode.TestBlockScanner
          | hadoop.hdfs.TestLeaseRecovery2
          | hadoop.hdfs.server.namenode.TestNNThroughputBenchmark
          JDK v1.7.0_91 Failed junit tests | hadoop.hdfs.server.namenode.TestNNThroughputBenchmark



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12783307/HDFS-9661.001.patch
          JIRA Issue HDFS-9661
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 9945b4bd4b4e 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 57d0a94
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14170/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14170/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14170/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14170/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14170/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Max memory used 77MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14170/console

          This message was automatically generated.

          vinayrpet Vinayakumar B added a comment -

          +1 for the latest patch.
          Will commit shortly

          vinayrpet Vinayakumar B added a comment -

          Committed to trunk, branch-2, branch-2.8 and branch-2.7.
          Thanks ade for the catch and the patch.

          Thanks Kai Zheng and Kihwal Lee.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9140 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9140/)
          HDFS-9661. Deadlock in DN.FsDatasetImpl between moveBlockAcrossStorage (vinayakumarb: rev 14255786908f991fd2022480fe5575533a3dc7ce)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Closing the JIRA as part of 2.7.3 release.


            People

            • Assignee:
              aderen ade
            • Reporter:
              aderen ade
            • Votes:
              0
            • Watchers:
              9
