Hadoop HDFS / HDFS-2434

TestNameNodeMetrics.testCorruptBlock fails intermittently

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0, 0.23.7
    • Component/s: test
    • Labels:
    • Hadoop Flags:
      Reviewed

      Description

      java.lang.AssertionError: Bad value for metric CorruptBlocks expected:<1> but was:<0>
      at org.junit.Assert.fail(Assert.java:91)
      at org.junit.Assert.failNotEquals(Assert.java:645)
      at org.junit.Assert.assertEquals(Assert.java:126)
      at org.junit.Assert.assertEquals(Assert.java:470)
      at org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:185)
      at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.__CLR3_0_2t8sh531i1k(TestNameNodeMetrics.java:175)
      at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:164)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at junit.framework.TestCase.runTest(TestCase.java:168)
      at junit.framework.TestCase.runBare(TestCase.java:134)

      1. HDFS-2434.001.patch
        2 kB
        Jing Zhao
      2. HDFS-2434.002.patch
        2 kB
        Jing Zhao
      3. HDFS-2434.trunk.003.patch
        3 kB
        Jing Zhao
      4. HDFS-2434.trunk.004.patch
        3 kB
        Jing Zhao
      5. HDFS-2434.trunk.005.patch
        2 kB
        Jing Zhao

          Activity

          Harsh J added a comment -

          Hi, was this against trunk or branch-1?

          Uma Maheswara Rao G added a comment -

          Hi Harsh, thanks for looking at this.
          I remember seeing this in trunk, and I also corrected some tests along with HDFS-1765. Since then I have not seen any failures in TestNameNodeMetrics. I am closing this as "cannot reproduce" (I ran the test 4 more times to reconfirm).

          Harsh J added a comment -

          Thanks for getting back, Uma. We ran into a situation where the corrupt replica map (whose size this metric is derived from) ended up with bogus entries that nothing was removing. I just happened to land on this issue, but it is unrelated.

          Thanks again

          Todd Lipcon added a comment -

          I'm still seeing this sporadically on trunk. For example https://builds.apache.org/job/PreCommit-HDFS-Build/2841//testReport/org.apache.hadoop.hdfs.server.namenode.metrics/TestNameNodeMetrics/testCorruptBlock/
          (and I don't think it's due to any changes in that patch)

          Kihwal Lee added a comment -

          The test case fails this way when the corrupt replica is fixed right away, before the namenode metrics are gathered. In one example, computeReplicationWorkForBlocks() was done within 10ms of the block corruption and the datanode sent its heartbeat at 380ms. The block corruption was resolved completely 13ms after that.

          Since the replication monitor and DN heartbeats are asynchronous, the current approach of sleeping for 1 sec is not a reliable way to hit a moment between the two.
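
          To make the timing concrete, the figures from this one example can be laid out on a virtual timeline. This is only a toy sketch, not Hadoop code: the class and method names are hypothetical, and it assumes the 380 ms heartbeat is measured from the moment of corruption, which the comment does not state outright.

```java
/** Toy timeline for the single example quoted above; not Hadoop code. */
public class FixedSleepRaceSketch {
    // Offsets in milliseconds after the block is marked corrupt (assumed origin).
    public static final long HEARTBEAT_AT_MS = 380;
    public static final long REPAIR_AFTER_HEARTBEAT_MS = 13;
    public static final long REPAIRED_AT_MS = HEARTBEAT_AT_MS + REPAIR_AFTER_HEARTBEAT_MS;

    /** Value a test would read from the CorruptBlocks gauge at the given offset. */
    public static int corruptBlocksAt(long msAfterCorruption) {
        // The replica counts as corrupt only until the repair completes.
        return msAfterCorruption < REPAIRED_AT_MS ? 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println("repaired at ms:   " + REPAIRED_AT_MS);
        System.out.println("read at 200 ms:   " + corruptBlocksAt(200));
        System.out.println("read at 1000 ms:  " + corruptBlocksAt(1000));
    }
}
```

          Under those assumptions the replica is repaired about 393 ms after the corruption, so a read taken after the 1 s sleep sees 0, while a read at 200 ms would have seen the expected 1.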

          Jing Zhao added a comment -

          Based on Kihwal's analysis, can we solve the problem on the CorruptBlocks metric by disabling the heartbeats of datanodes before marking the block as corrupt?
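
          The idea can be sketched with a toy model in which a heartbeat round is the only thing that lets a corrupt replica get repaired. Everything here (ToyNameNode, heartbeatRound, the block id 42) is a hypothetical stand-in for illustration and is not the code in the attached patches.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of "disable heartbeats, then corrupt the block"; not Hadoop code. */
public class HeartbeatRaceSketch {

    /** Hypothetical stand-in for the namenode state the metric is read from. */
    public static class ToyNameNode {
        private final Set<Long> corruptBlocks = new HashSet<>();
        private boolean heartbeatsEnabled = true;

        public void setHeartbeatsEnabled(boolean enabled) { heartbeatsEnabled = enabled; }

        public void markCorrupt(long blockId) { corruptBlocks.add(blockId); }

        /** One heartbeat round: when enabled, repair work is delivered and completes. */
        public void heartbeatRound() {
            if (heartbeatsEnabled) {
                corruptBlocks.clear();
            }
        }

        public int corruptBlockCount() { return corruptBlocks.size(); }
    }

    /** Marks one block corrupt, lets some heartbeat rounds pass, then reads the count. */
    public static int observeAfterRounds(boolean disableHeartbeatsFirst, int rounds) {
        ToyNameNode nn = new ToyNameNode();
        if (disableHeartbeatsFirst) {
            nn.setHeartbeatsEnabled(false);
        }
        nn.markCorrupt(42L);
        for (int i = 0; i < rounds; i++) {
            nn.heartbeatRound();
        }
        return nn.corruptBlockCount();
    }

    public static void main(String[] args) {
        System.out.println("heartbeats on:  " + observeAfterRounds(false, 1));
        System.out.println("heartbeats off: " + observeAfterRounds(true, 1));
    }
}
```

          With heartbeats left on, the count the test reads depends on whether a round slipped in first; with them disabled beforehand, the count stays at 1 however long the test takes to read it.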

          Jing Zhao added a comment -

          Made some further changes to the patch. In the testCorrupt testcase, because the delete operation currently does not remove the pending record in the NN, the block may be deleted by the deletion request before the DN sends its "block has been received" msg to the NN. In that case, it seems that the pending record cannot be removed until it times out.

          Thus the new patch first waits for the recovery to finish, and then does the deletion.
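
          The ordering described here, wait for a condition and only then delete, might look roughly like the following. It is a self-contained sketch under stated assumptions: waitFor and pendingAtDelete are hypothetical helpers written for this illustration, and the pending counter merely simulates "block has been received" reports arriving one per poll.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Toy sketch of "wait for recovery, then delete"; not Hadoop code. */
public class WaitThenDeleteSketch {

    /** Polls the condition until it holds, or throws once the timeout elapses. */
    public static void waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                throw new IllegalStateException("condition not met within " + timeoutMs + " ms");
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }

    /**
     * Simulated test flow: each unsuccessful poll lets one pending replication
     * finish. Returns the pending count at the point the delete would be issued.
     */
    public static int pendingAtDelete(int initialPending) {
        AtomicInteger pending = new AtomicInteger(initialPending);
        waitFor(() -> {
            if (pending.get() == 0) {
                return true;
            }
            pending.decrementAndGet(); // one simulated "block received" report arrives
            return false;
        }, 1, 5_000);
        return pending.get(); // the delete would be issued here
    }

    public static void main(String[] args) {
        System.out.println("pending records at delete time: " + pendingAtDelete(3));
    }
}
```

          Because the delete is only reached once the pending count has drained to zero, no pending record is left behind to wait out its timeout.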

          Jing Zhao added a comment -

          The 003 patch no longer applies to trunk after the changes in HDFS-4059. Updated the patch to be consistent.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12549436/HDFS-2434.trunk.004.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3351//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3351//console

          This message is automatically generated.

          Jing Zhao added a comment -

          Updated the patch based on the change in HDFS-4072.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12550020/HDFS-2434.trunk.005.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3370//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3370//console

          This message is automatically generated.

          Suresh Srinivas added a comment -

          Did you run the test in a loop to ensure it does not fail?

          Jing Zhao added a comment -

          Have run the testcase 551 times locally and all of them passed.

          Suresh Srinivas added a comment -

          I committed the patch to trunk. Thank you Jing.

          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #2914 (See https://builds.apache.org/job/Hadoop-trunk-Commit/2914/)
          HDFS-2434. TestNameNodeMetrics.testCorruptBlock fails intermittently. Contributed by Jing Zhao. (Revision 1401423)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401423
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java
          Suresh Srinivas added a comment -

          I am going to skip merging this change to branch-2, since the port is not straightforward.

          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #2915 (See https://builds.apache.org/job/Hadoop-trunk-Commit/2915/)
          Moving HDFS-2434 from Release 2.0.3 section to trunk section. (Revision 1401446)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401446
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #13 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/13/)
          Moving HDFS-2434 from Release 2.0.3 section to trunk section. (Revision 1401446)
          HDFS-2434. TestNameNodeMetrics.testCorruptBlock fails intermittently. Contributed by Jing Zhao. (Revision 1401423)

          Result = FAILURE
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401446
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401423
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1205 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1205/)
          Moving HDFS-2434 from Release 2.0.3 section to trunk section. (Revision 1401446)
          HDFS-2434. TestNameNodeMetrics.testCorruptBlock fails intermittently. Contributed by Jing Zhao. (Revision 1401423)

          Result = FAILURE
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401446
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401423
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1235 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1235/)
          Moving HDFS-2434 from Release 2.0.3 section to trunk section. (Revision 1401446)
          HDFS-2434. TestNameNodeMetrics.testCorruptBlock fails intermittently. Contributed by Jing Zhao. (Revision 1401423)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401446
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401423
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java
          Kihwal Lee added a comment -

          Merged to branch-0.23.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #546 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/546/)
          svn merge -c 1401423 Merging from trunk to branch-0.23 to fix HDFS-2434. (Revision 1453627)

          Result = UNSTABLE
          kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1453627
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java

            People

            • Assignee:
              Jing Zhao
              Reporter:
              Uma Maheswara Rao G
            • Votes:
              0
              Watchers:
              10
