  1. Hadoop HDFS
  2. HDFS-9145

Tracking methods that hold FSNamesytemLock for too long

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.4, 3.0.0-alpha1
    • Component/s: namenode
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      It would be helpful to have a way to track (or at least log a message) when an operation holds the FSNamesystem lock for a long time.

      1. HDFS-9145.000.patch (5 kB, Mingliang Liu)
      2. HDFS-9145.001.patch (6 kB, Mingliang Liu)
      3. HDFS-9145.002.patch (6 kB, Mingliang Liu)
      4. HDFS-9145.003.patch (6 kB, Mingliang Liu)
      5. testlog.txt (59 kB, Kihwal Lee)

          Activity

          jnp Jitendra Nath Pandey added a comment -

          Thanks for filing this Jing Zhao. It will be super useful.

          liuml07 Mingliang Liu added a comment -

          The v0 patch simply logs the time interval if the read/write FSLock is held longer than a threshold.

          liuml07 Mingliang Liu added a comment -

          The v1 patch implements tracking for the write lock with reentrancy support. Read lock tracking was removed from this patch because the holding time would have to be tracked per thread.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote | Subsystem | Runtime | Comment
          0 | pre-patch | 20m 14s | Pre-patch trunk compilation is healthy.
          +1 | @author | 0m 0s | The patch does not contain any @author tags.
          +1 | tests included | 0m 0s | The patch appears to include 2 new or modified test files.
          +1 | javac | 8m 2s | There were no new javac warning messages.
          +1 | javadoc | 10m 28s | There were no new javadoc warning messages.
          -1 | release audit | 0m 20s | The applied patch generated 1 release audit warnings.
          +1 | checkstyle | 2m 17s | There were no new checkstyle issues.
          +1 | whitespace | 0m 0s | The patch has no lines that end in whitespace.
          +1 | install | 1m 34s | mvn install still works.
          +1 | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse.
          +1 | findbugs | 4m 21s | The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          -1 | common tests | 13m 8s | Tests failed in hadoop-common.
          -1 | hdfs tests | 190m 28s | Tests failed in hadoop-hdfs.
             | Total | 251m 28s |



          Reason Tests
          Failed unit tests hadoop.hdfs.server.namenode.TestFSNamesystem
            hadoop.hdfs.shortcircuit.TestShortCircuitCache
            hadoop.hdfs.server.namenode.TestINodeAttributeProvider
          Timed out tests org.apache.hadoop.http.TestHttpServerLifecycle
            org.apache.hadoop.hdfs.server.namenode.ha.TestFailoverWithBlockTokensEnabled
            org.apache.hadoop.fs.contract.hdfs.TestHDFSContractOpen



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12765467/HDFS-9145.001.patch
          Optional Tests javac unit findbugs checkstyle javadoc
          git revision trunk / fde729f
          Release Audit https://builds.apache.org/job/PreCommit-HDFS-Build/12844/artifact/patchprocess/patchReleaseAuditProblems.txt
          hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/12844/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/12844/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12844/testReport/
          Java 1.7.0_55
          uname Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12844/console

          This message was automatically generated.

          jingzhao Jing Zhao added a comment -

          I think we do not need a stack to track the multi-entrance scenario here, considering that 1) the current code usually does not grab the write lock multiple times, and 2) the total holding time of a thread may be good enough for us. So we can use just the hold count information to track the time between first_lock and last_unlock.
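
The hold-count idea above can be sketched as follows. This is a minimal illustration, not the code from any HDFS-9145 patch: the class `TimedWriteLock` and the field `firstLockNanos` are hypothetical names, and it assumes the lock is a `java.util.concurrent.locks.ReentrantReadWriteLock`, whose `getWriteHoldCount()` reports how many times the current thread has re-entered the write lock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative only: a hypothetical wrapper, not the FSNamesystem code. */
class TimedWriteLock {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
  // Written and read only while the write lock is held.
  private long firstLockNanos;

  /** Acquires the write lock; starts the clock only on the outermost acquisition. */
  void writeLock() {
    lock.writeLock().lock();
    if (lock.getWriteHoldCount() == 1) {
      firstLockNanos = System.nanoTime();
    }
  }

  /**
   * Releases the write lock.
   * @return total hold time in milliseconds for the outermost release, otherwise -1
   */
  long writeUnlock() {
    long heldMs = -1;
    if (lock.getWriteHoldCount() == 1) {
      // Last unlock: measure from the first lock, covering all nested entries.
      heldMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - firstLockNanos);
    }
    lock.writeLock().unlock();
    return heldMs;
  }
}
```

Nested releases return -1; only the outermost release yields a duration, which a caller could then compare against a reporting threshold. No stack of per-entry timestamps is needed.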

          liuml07 Mingliang Liu added a comment -

          Thank you Jing Zhao for your review. The idea of tracking the longest lock holding interval for the multiple-reentrance scenario makes sense to me. The v2 patch addresses this comment.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote | Subsystem | Runtime | Comment
          0 | pre-patch | 20m 13s | Pre-patch trunk compilation is healthy.
          +1 | @author | 0m 0s | The patch does not contain any @author tags.
          +1 | tests included | 0m 0s | The patch appears to include 2 new or modified test files.
          +1 | javac | 8m 2s | There were no new javac warning messages.
          +1 | javadoc | 10m 32s | There were no new javadoc warning messages.
          -1 | release audit | 0m 19s | The applied patch generated 1 release audit warnings.
          +1 | checkstyle | 2m 19s | There were no new checkstyle issues.
          +1 | whitespace | 0m 1s | The patch has no lines that end in whitespace.
          +1 | install | 1m 33s | mvn install still works.
          +1 | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse.
          +1 | findbugs | 4m 25s | The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          -1 | common tests | 6m 40s | Tests failed in hadoop-common.
          -1 | hdfs tests | 138m 42s | Tests failed in hadoop-hdfs.
             | Total | 193m 24s |



          Reason Tests
          Failed unit tests hadoop.ipc.TestDecayRpcScheduler
            hadoop.net.TestClusterTopology
            hadoop.hdfs.TestLeaseRecovery2
          Timed out tests org.apache.hadoop.hdfs.server.namenode.TestNameNodeXAttr
            org.apache.hadoop.hdfs.TestRenameWhileOpen



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12765927/HDFS-9145.002.patch
          Optional Tests javac unit findbugs checkstyle javadoc
          git revision trunk / def374e
          Release Audit https://builds.apache.org/job/PreCommit-HDFS-Build/12904/artifact/patchprocess/patchReleaseAuditProblems.txt
          hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/12904/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/12904/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12904/testReport/
          Java 1.7.0_55
          uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12904/console

          This message was automatically generated.

          liuml07 Mingliang Liu added a comment -

          The release audit is unrelated.

          All failing tests pass locally (Mac OS X and Gentoo Linux) and seem unrelated.

          jingzhao Jing Zhao added a comment -

          The patch looks good to me. One minor, optional change is to first capture in a boolean whether to log, and then do the actual logging outside the lock. Another nit: in the warning message, we can add a newline before the stack trace. Other than this, +1.
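
The two suggestions above could look roughly like this. It is a sketch under stated assumptions, not the v3 patch: `DeferredLockWarning`, `thresholdMs`, `lockedAtNanos` and the message wording are hypothetical, and the warning goes to `System.err` rather than a real logger so the example stays self-contained.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative only: hypothetical names, not the code from the patch. */
class DeferredLockWarning {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
  private final long thresholdMs;
  private long lockedAtNanos; // accessed only while the write lock is held

  DeferredLockWarning(long thresholdMs) {
    this.thresholdMs = thresholdMs;
  }

  void writeLock() {
    lock.writeLock().lock();
    if (lock.getWriteHoldCount() == 1) {
      lockedAtNanos = System.nanoTime();
    }
  }

  /** Releases the lock and returns the warning text, or null if nothing was logged. */
  String writeUnlock() {
    // Decide while still holding the lock; only the outermost release reports.
    final boolean needReport = lock.getWriteHoldCount() == 1;
    final long heldMs = (System.nanoTime() - lockedAtNanos) / 1_000_000L;
    lock.writeLock().unlock();

    if (!needReport || heldMs < thresholdMs) {
      return null;
    }
    // Formatting and I/O happen after the release, so they do not extend the hold.
    StringBuilder msg = new StringBuilder("Write lock held for " + heldMs + " ms via");
    msg.append('\n'); // newline so the stack trace starts on its own line
    for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
      msg.append("\tat ").append(frame).append('\n');
    }
    System.err.print(msg);
    return msg.toString();
  }
}
```

The boolean keeps the work done under the lock to a comparison and a subtraction; building the message and writing it out, which can be slow, is left until other threads are free to take the lock.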

          liuml07 Mingliang Liu added a comment -

          Thank you Jing Zhao for your review. The v3 patch addresses the two comments.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote | Subsystem | Runtime | Comment
          0 | pre-patch | 20m 30s | Pre-patch trunk compilation is healthy.
          +1 | @author | 0m 0s | The patch does not contain any @author tags.
          +1 | tests included | 0m 0s | The patch appears to include 2 new or modified test files.
          +1 | javac | 8m 7s | There were no new javac warning messages.
          +1 | javadoc | 10m 39s | There were no new javadoc warning messages.
          +1 | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings.
          +1 | checkstyle | 2m 16s | There were no new checkstyle issues.
          +1 | whitespace | 0m 0s | The patch has no lines that end in whitespace.
          +1 | install | 1m 34s | mvn install still works.
          +1 | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse.
          +1 | findbugs | 4m 24s | The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 | common tests | 6m 59s | Tests passed in hadoop-common.
          -1 | hdfs tests | 190m 33s | Tests failed in hadoop-hdfs.
             | Total | 246m 5s |



          Reason Tests
          Failed unit tests hadoop.hdfs.TestEncryptionZonesWithKMS
            hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12766189/HDFS-9145.003.patch
          Optional Tests javac unit findbugs checkstyle javadoc
          git revision trunk / c60a16f
          hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/12942/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/12942/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12942/testReport/
          Java 1.7.0_55
          uname Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12942/console

          This message was automatically generated.

          liuml07 Mingliang Liu added a comment -

          The failing tests pass locally and seem unrelated.

          liuml07 Mingliang Liu added a comment -

          The failing tests seem unrelated. Specifically, we consider the failures in hadoop.hdfs.TestEncryptionZonesWithKMS to be a data race bug and filed HADOOP-12474 for it.

          jingzhao Jing Zhao added a comment -

          +1

          wheat9 Haohui Mai added a comment -

          I've committed the patch to trunk and branch-2. Thanks Mingliang Liu for the contribution.

          liuml07 Mingliang Liu added a comment -

          Thank you Jing Zhao and Haohui Mai for your reviews and commit.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8624 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8624/)
          HDFS-9145. Tracking methods that hold FSNamesytemLock for too long. (wheat9: rev d1e1925bf6c3cf7fd23ed8df5a5e18677fc299d8)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/GenericTestUtils.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2470 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2470/)
          HDFS-9145. Tracking methods that hold FSNamesytemLock for too long. (wheat9: rev d1e1925bf6c3cf7fd23ed8df5a5e18677fc299d8)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/GenericTestUtils.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #1259 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1259/)
          HDFS-9145. Tracking methods that hold FSNamesytemLock for too long. (wheat9: rev d1e1925bf6c3cf7fd23ed8df5a5e18677fc299d8)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/GenericTestUtils.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #524 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/524/)
          HDFS-9145. Tracking methods that hold FSNamesytemLock for too long. (wheat9: rev d1e1925bf6c3cf7fd23ed8df5a5e18677fc299d8)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/GenericTestUtils.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #535 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/535/)
          HDFS-9145. Tracking methods that hold FSNamesytemLock for too long. (wheat9: rev d1e1925bf6c3cf7fd23ed8df5a5e18677fc299d8)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/GenericTestUtils.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #491 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/491/)
          HDFS-9145. Tracking methods that hold FSNamesytemLock for too long. (wheat9: rev d1e1925bf6c3cf7fd23ed8df5a5e18677fc299d8)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/GenericTestUtils.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2429 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2429/)
          HDFS-9145. Tracking methods that hold FSNamesytemLock for too long. (wheat9: rev d1e1925bf6c3cf7fd23ed8df5a5e18677fc299d8)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/GenericTestUtils.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          jingzhao Jing Zhao added a comment -

          Actually the value of writeLockInterval should be captured within the lock. The current code has a race on writeLockHeldTimeStamp. Mingliang Liu, do you want to fix this in a separate jira?
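
To make the race concrete, here is one possible interleaving replayed with explicit clock values in milliseconds. It is a hypothetical simulation with no real threads or locks: only the field name `writeLockHeldTimeStamp` comes from the comment above, while the class, the method names and the timings are made up for illustration.

```java
/** Illustrative only: replays one interleaving with explicit clock values. */
class TimestampRaceDemo {
  // Shared field, named after the one mentioned in the comment above.
  static long writeLockHeldTimeStamp;

  /** Thread A computes the interval before releasing: the timestamp is still its own. */
  static long intervalCapturedWithinLock() {
    writeLockHeldTimeStamp = 0;                     // t=0: A acquires the write lock
    long interval = 5000 - writeLockHeldTimeStamp;  // t=5000: A computes, then unlocks
    writeLockHeldTimeStamp = 5001;                  // t=5001: B acquires and overwrites
    return interval;
  }

  /** Thread A releases first and reads the timestamp only afterwards. */
  static long intervalCapturedAfterUnlock() {
    writeLockHeldTimeStamp = 0;                     // t=0: A acquires the write lock
    // t=5000: A unlocks without having read the field
    writeLockHeldTimeStamp = 5001;                  // t=5001: B acquires and overwrites
    return 5002 - writeLockHeldTimeStamp;           // t=5002: A reads B's timestamp
  }
}
```

In this replay the first method yields 5000 ms, the true hold time, while the second yields 1 ms, so a long hold by thread A would slip under any reasonable threshold and go unreported.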

          liuml07 Mingliang Liu added a comment -

          Sure. Nice catch!

          I filed HDFS-9467 to track this. Thanks for reporting this.

          zhz Zhe Zhang added a comment -

          Thanks Mingliang Liu, Jing Zhao. This is a pretty good improvement. I just backported this change and HDFS-9467 to branch-2.7.

          liuml07 Mingliang Liu added a comment - - edited

          Thanks Zhe Zhang for taking care of this. I also noticed that you backported HDFS-9467.

          kihwal Kihwal Lee added a comment -

          There is a test failure in branch-2.7 after this.

          -------------------------------------------------------
           T E S T S
          -------------------------------------------------------
          OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
          Running org.apache.hadoop.hdfs.server.namenode.TestFSNamesystem
          Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.741 sec <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestFSNamesystem
          testFSLockGetWaiterCount(org.apache.hadoop.hdfs.server.namenode.TestFSNamesystem)  Time elapsed: 0.004 sec  <<< FAILURE!
          java.lang.AssertionError: Expected number of blocked thread not found expected:<3> but was:<2>
          	at org.junit.Assert.fail(Assert.java:88)
          	at org.junit.Assert.failNotEquals(Assert.java:743)
          	at org.junit.Assert.assertEquals(Assert.java:118)
          	at org.junit.Assert.assertEquals(Assert.java:555)
          	at org.apache.hadoop.hdfs.server.namenode.TestFSNamesystem.testFSLockGetWaiterCount(TestFSNamesystem.java:244)
          

          It seems to pass when this test case is run separately. It might be due to interactions with other tests in the suite.

          zhz Zhe Zhang added a comment -

          Thanks Kihwal. I'm trying to fix this.

          zhz Zhe Zhang added a comment -

          I can't reproduce the testFSLockGetWaiterCount failure locally. Kihwal Lee, Mingliang Liu: can you reproduce it in your local environments?

          liuml07 Mingliang Liu added a comment - - edited

          Hm... Could this be the unrelated bug HDFS-8915?

          zhz Zhe Zhang added a comment -

          Thanks Mingliang Liu, it looks very likely.

          kihwal Kihwal Lee added a comment -

          For me it fails when I run TestFSNamesystem, but passes when I specify TestFSNamesystem#testFSLockGetWaiterCount.
          I will get the test log and upload here.

          kihwal Kihwal Lee added a comment -

          I am not sure how helpful this log will be. I am using openjdk 1.8.0_101 on my box.

          liuml07 Mingliang Liu added a comment - - edited

          Kihwal Lee, are you able to reproduce this failure consistently? Does the test also fail without this patch?

          On my local machine, I cannot reproduce the failure on Java 8 against branch-2.7 when running the whole TestFSNamesystem class (9 test cases).

          kihwal Kihwal Lee added a comment -

          I can reproduce it consistently. But when I tried the approach in the latest patch in HDFS-8915, it passes. If I undo the patch, the test failure is reproduced 100% of the time. Let's get HDFS-8915 moving.
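
          For readers unfamiliar with this kind of failure: the assertion above (expected 3 blocked threads, found 2) is the usual symptom of counting lock waiters before every thread has actually been queued on the lock. That is only a plausible explanation and is not confirmed in this thread. The following is a minimal sketch of a test helper that polls the waiter count until it reaches the expected value or a timeout elapses. It is not the HDFS-8915 patch; the class and method names (WaiterCountSketch, awaitWaiters, blockThreadsAndCount) are hypothetical, and it uses a plain java.util.concurrent ReentrantReadWriteLock rather than the FSNamesystem lock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; not code from HDFS-9145 or HDFS-8915.
public class WaiterCountSketch {

  /**
   * Polls lock.getQueueLength() until at least 'expected' threads are queued
   * or the timeout elapses. Returns true if the expected count was reached.
   */
  static boolean awaitWaiters(ReentrantReadWriteLock lock, int expected, long timeoutMs) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (lock.getQueueLength() < expected) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out before all threads were queued
      }
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  /**
   * Holds the write lock, starts 'threads' threads that block on it, waits
   * until they are all queued, and returns the observed waiter count.
   */
  static int blockThreadsAndCount(int threads) {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    lock.writeLock().lock();
    try {
      for (int i = 0; i < threads; i++) {
        Thread t = new Thread(() -> {
          lock.writeLock().lock(); // blocks while the caller holds the lock
          lock.writeLock().unlock();
        });
        t.setDaemon(true);
        t.start();
      }
      // A started thread is not necessarily queued yet, so wait before counting.
      awaitWaiters(lock, threads, 5000);
      return lock.getQueueLength();
    } finally {
      lock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    System.out.println("waiters observed: " + blockThreadsAndCount(3));
  }
}
```

          The point of the sketch is that Thread.start() returning (or a latch counted down at the top of the thread body) does not guarantee that the thread has reached the lock queue, so an immediate assertion on the waiter count can race with thread scheduling.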

          kihwal Kihwal Lee added a comment -

          I noticed that the 2.7 CHANGES.txt has no entry for this and a few other recent cherry-picks.

          zhz Zhe Zhang added a comment -

          Thanks Kihwal Lee for noticing this. But are we still using CHANGES.txt to keep track of branch-2.7 changes? This JIRA is a special case because when it went in branch-2 it had a CHANGES.txt entry. But for all new JIRAs targeting 2.7.4 we are no longer putting in CHANGES.txt entries anyway.

          andrew.wang Andrew Wang added a comment -

          We still need manual CHANGES.txt entries for 2.6.x and 2.7.x changes. 2.8 and up use Yetus releasedocmaker to generate them automatically.


            People

            • Assignee:
              liuml07 Mingliang Liu
              Reporter:
              jingzhao Jing Zhao
            • Votes:
              0
              Watchers:
              14
