Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.9.0, 3.0.0-beta1, 2.8.2
    • Component/s: datanode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      The dataset lock is very highly contended, and its unfair nature can be especially harmful to heartbeat handling. Under high load (partially exposed by HDFS-12136 introducing disk I/O within the lock), the heartbeat handling thread may process commands so slowly due to the contention that the node becomes stale or is falsely declared dead. The unfair lock is not helping and appears to be causing frequent starvation under load.
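For readers unfamiliar with the distinction, the following is a minimal sketch of fair versus unfair locks using the standard `java.util.concurrent.locks.ReentrantLock`. It is not taken from the Hadoop source; the class name `FairLockSketch` and its helper methods are hypothetical and exist only for illustration.

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only; FairLockSketch is a hypothetical name, not a Hadoop class.
class FairLockSketch {

  // The no-argument constructor creates a non-fair lock: a newly arriving
  // thread may "barge" ahead of threads that are already queued.
  static ReentrantLock unfairLock() {
    return new ReentrantLock();
  }

  // Passing true requests the fair ordering policy: under contention the
  // lock favors the longest-waiting thread, which bounds starvation at the
  // cost of some throughput.
  static ReentrantLock fairLock() {
    return new ReentrantLock(true);
  }

  public static void main(String[] args) {
    System.out.println("default lock fair?  " + unfairLock().isFair());
    System.out.println("explicit lock fair? " + fairLock().isFair());
  }
}
```

With a non-fair lock, a thread such as a heartbeat handler can in principle be passed over repeatedly while other threads reacquire the lock, which matches the starvation described above.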

      1. HDFS-12137.branch-2.patch
        1 kB
        Daryn Sharp
      2. HDFS-12137.trunk.patch
        1 kB
        Daryn Sharp
      3. HDFS-12137.trunk.patch
        1 kB
        Daryn Sharp

        Activity

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12010 (See https://builds.apache.org/job/Hadoop-trunk-Commit/12010/)
        HDFS-12137. DN dataset lock should be fair. Contributed by Daryn Sharp. (kihwal: rev 8d86a93915ee00318289535d9c78e48b75c8359d)

        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
        kihwal Kihwal Lee added a comment - - edited

        Thanks for the review, Xiao Chen, and for the patch, Daryn.
        Just committed to trunk, branch-2 and branch-2.8.

        kihwal Kihwal Lee added a comment -

        +1 the patch looks good. The test failure is not related.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote  Subsystem   Runtime  Comment
        0     reexec      2m 19s   Docker mode activated.
              Prechecks
        +1    @author     0m 0s    The patch does not contain any @author tags.
        -1    test4tests  0m 0s    The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
              trunk Compile Tests
        +1    mvninstall  13m 35s  trunk passed
        +1    compile     0m 49s   trunk passed
        +1    checkstyle  0m 36s   trunk passed
        +1    mvnsite     0m 54s   trunk passed
        -1    findbugs    1m 42s   hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings.
        +1    javadoc     0m 39s   trunk passed
              Patch Compile Tests
        +1    mvninstall  0m 50s   the patch passed
        +1    compile     0m 47s   the patch passed
        +1    javac       0m 47s   the patch passed
        +1    checkstyle  0m 34s   the patch passed
        +1    mvnsite     0m 50s   the patch passed
        +1    whitespace  0m 0s    The patch has no whitespace issues.
        +1    findbugs    1m 51s   the patch passed
        +1    javadoc     0m 37s   the patch passed
              Other Tests
        -1    unit        64m 50s  hadoop-hdfs in the patch failed.
        +1    asflicense  0m 19s   The patch does not generate ASF License warnings.
                          92m 29s



        Reason Tests
        Failed junit tests hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HDFS-12137
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12877315/HDFS-12137.trunk.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux bf376120b301 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 75c0220
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/20278/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/20278/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/20278/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/20278/console
        Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        kihwal Kihwal Lee added a comment -

        The trunk precommit build failed because it could not download the patch. I just kicked the build again, and it looks like it is running this time.

        Modes:  Sentinel  MultiJDK  Jenkins  Robot  Docker  ResetRepo  UnitTests 
        Processing: HDFS-12137
        ERROR: Unsure how to process HDFS-12137.
        
        daryn Daryn Sharp added a comment -

        Reposting trunk to kick precommit just to be thorough.

        xiaochen Xiao Chen added a comment -

        I think there was HDFS-10923 (committed, then reverted), which would fix this as a side effect.
        This fix looks more surgical though, +1.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote  Subsystem   Runtime   Comment
        0     reexec      0m 21s    Docker mode activated.
              Prechecks
        +1    @author     0m 0s     The patch does not contain any @author tags.
        -1    test4tests  0m 0s     The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
              branch-2 Compile Tests
        +1    mvninstall  7m 44s    branch-2 passed
        +1    compile     0m 48s    branch-2 passed with JDK v1.8.0_131
        +1    compile     0m 49s    branch-2 passed with JDK v1.7.0_131
        +1    checkstyle  0m 31s    branch-2 passed
        +1    mvnsite     0m 59s    branch-2 passed
        +1    findbugs    2m 14s    branch-2 passed
        +1    javadoc     0m 40s    branch-2 passed with JDK v1.8.0_131
        +1    javadoc     1m 5s     branch-2 passed with JDK v1.7.0_131
              Patch Compile Tests
        +1    mvninstall  0m 48s    the patch passed
        +1    compile     0m 43s    the patch passed with JDK v1.8.0_131
        +1    javac       0m 43s    the patch passed
        +1    compile     0m 47s    the patch passed with JDK v1.7.0_131
        +1    javac       0m 47s    the patch passed
        +1    checkstyle  0m 27s    the patch passed
        +1    mvnsite     0m 52s    the patch passed
        +1    whitespace  0m 0s     The patch has no whitespace issues.
        +1    findbugs    2m 19s    the patch passed
        +1    javadoc     0m 39s    the patch passed with JDK v1.8.0_131
        +1    javadoc     1m 3s     the patch passed with JDK v1.7.0_131
              Other Tests
        -1    unit        50m 45s   hadoop-hdfs in the patch failed with JDK v1.7.0_131.
        +1    asflicense  0m 20s    The patch does not generate ASF License warnings.
                          133m 58s



        Reason Tests
        JDK v1.8.0_131 Failed junit tests hadoop.hdfs.TestEncryptionZones
          hadoop.hdfs.server.namenode.TestDecommissioningStatus
        JDK v1.7.0_131 Failed junit tests hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithUpgradeDomain
          hadoop.hdfs.server.balancer.TestBalancerRPCDelay
          hadoop.hdfs.TestEncryptionZonesWithKMS



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:5e40efe
        JIRA Issue HDFS-12137
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12877150/HDFS-12137.branch-2.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux fec2734037db 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision branch-2 / d83e871
        Default Java 1.7.0_131
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_131 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_131
        findbugs v3.0.0
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/20264/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_131.txt
        JDK v1.7.0_131 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/20264/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/20264/console
        Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        daryn Daryn Sharp added a comment -

        Passes a fair lock to the instrumented lock instead of allowing it to implicitly create an unfair lock.

        The only difference between the trunk and branch-2 patches is the context of the import.
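The pattern described in this comment can be sketched roughly as follows. This is a hedged illustration, not the patch itself: `WrappedLockSketch` is a hypothetical stand-in for an instrumented lock wrapper, and its constructors and factory methods are assumptions made for the example. Only the `java.util.concurrent.locks` types are real.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical wrapper used for illustration; not Hadoop's instrumented lock class.
class WrappedLockSketch implements Lock {
  private final Lock delegate;

  // Convenience constructor: implicitly creates a non-fair ReentrantLock.
  WrappedLockSketch() {
    this(new ReentrantLock());
  }

  // The caller supplies the underlying lock and therefore controls fairness.
  WrappedLockSketch(Lock delegate) {
    this.delegate = delegate;
  }

  // Before: rely on the wrapper's implicit (unfair) default.
  static WrappedLockSketch withDefaultLock() {
    return new WrappedLockSketch();
  }

  // After: explicitly hand the wrapper a fair lock.
  static WrappedLockSketch withFairLock() {
    return new WrappedLockSketch(new ReentrantLock(true));
  }

  // Reports fairness when the delegate is a ReentrantLock; false otherwise.
  boolean isFair() {
    return delegate instanceof ReentrantLock
        && ((ReentrantLock) delegate).isFair();
  }

  // A real instrumented lock would record timing around these calls;
  // this sketch only delegates.
  @Override public void lock() { delegate.lock(); }
  @Override public void lockInterruptibly() throws InterruptedException {
    delegate.lockInterruptibly();
  }
  @Override public boolean tryLock() { return delegate.tryLock(); }
  @Override public boolean tryLock(long time, TimeUnit unit)
      throws InterruptedException {
    return delegate.tryLock(time, unit);
  }
  @Override public void unlock() { delegate.unlock(); }
  @Override public Condition newCondition() { return delegate.newCondition(); }

  public static void main(String[] args) {
    System.out.println("implicit default fair? " + withDefaultLock().isFair());
    System.out.println("explicit fair lock?    " + withFairLock().isFair());
  }
}
```

The design point is that the change stays confined to the call site that constructs the lock, which is consistent with the patch touching a single file.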


          People

          • Assignee:
            daryn Daryn Sharp
            Reporter:
            daryn Daryn Sharp
          • Votes:
            0
            Watchers:
            11

            Dates

            • Created:
              Updated:
              Resolved: