Hadoop HDFS / HDFS-4338

TestNameNodeMetrics#testCorruptBlock is flaky

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha1
    • Fix Version/s: 3.0.0-alpha1
    • Component/s: namenode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      With background cpuburn threads running to load the CPU, the test failed with this stack trace:

      Running org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
      Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 15.287 sec <<< FAILURE!
      testCorruptBlock(org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics)  Time elapsed: 14.922 sec  <<< FAILURE!
      java.lang.AssertionError: Bad value for metric ScheduledReplicationBlocks expected:<1> but was:<0>
      	at org.junit.Assert.fail(Assert.java:91)
      	at org.junit.Assert.failNotEquals(Assert.java:645)
      	at org.junit.Assert.assertEquals(Assert.java:126)
      	at org.junit.Assert.assertEquals(Assert.java:470)
      	at org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:190)
      	at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:229)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
      	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
      	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
      	at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
      	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
      	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
      	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
      	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
      	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
      	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
      	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:242)
      	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:137)
      	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
      	at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
      	at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
      	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
      	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
      
      
      Results :
      
      Failed tests:   testCorruptBlock(org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics): Bad value for metric ScheduledReplicationBlocks expected:<1> but was:<0>
      
      Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
      
      1. corruptblock
        10 kB
        Andrew Wang
      2. corruptblock.out
        61 kB
        Andrew Wang
      3. hdfs-4338.patch
        2 kB
        Andrew Wang

          Activity

          hudson Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1304 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1304/)
          HDFS-4338. TestNameNodeMetrics#testCorruptBlock is flaky. Contributed by Andrew Wang. (Revision 1428144)

          Result = FAILURE
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1428144
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java
          hudson Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1274 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1274/)
          HDFS-4338. TestNameNodeMetrics#testCorruptBlock is flaky. Contributed by Andrew Wang. (Revision 1428144)

          Result = FAILURE
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1428144
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java
          hudson Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #85 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/85/)
          HDFS-4338. TestNameNodeMetrics#testCorruptBlock is flaky. Contributed by Andrew Wang. (Revision 1428144)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1428144
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java
          hudson Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3164 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3164/)
          HDFS-4338. TestNameNodeMetrics#testCorruptBlock is flaky. Contributed by Andrew Wang. (Revision 1428144)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1428144
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java
          atm Aaron T. Myers added a comment -

          I've just committed this to trunk. Thanks a lot for the contribution, Andrew.

          atm Aaron T. Myers added a comment -

          +1, the patch looks good to me. Good investigation, Andrew. I'm going to commit this momentarily.

          andrew.wang Andrew Wang added a comment -

          The test failure looks unrelated; it's flaking on other test runs too.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12562511/hdfs-4338.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3699//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3699//console

          This message is automatically generated.

          andrew.wang Andrew Wang added a comment -

          Turns out the race is between the call to BlockManagerTestUtil#getComputedDatanodeWork() in the test and the BlockManager#ReplicationMonitor (which also calls #getComputedDatanodeWork()). The ScheduledReplicationBlocks metric reports the number of blocks scheduled for replication the last time BlockManager#getComputedDatanodeWork() was called. If the ReplicationMonitor runs after the call to BlockManagerTestUtil#getComputedDatanodeWork(), ScheduledReplicationBlocks is correctly reported as 0, since the corrupted block was already scheduled for replication by the previous call.

          The fix is simply to remove this assert. I also removed an unnecessary call to #updateState() (which is called in #getComputedDatanodeWork()), and fixed a typo in a nearby comment.
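
The interleaving described in this comment can be sketched with a small, self-contained Java model. This is only an illustration under stated assumptions, not the actual HDFS code: ToyBlockManager, markUnderReplicated, computeDatanodeWork, and gaugeAfter are hypothetical names, and the ReplicationMonitor thread is replaced by a deterministic second call so the outcome is reproducible.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class ScheduledGaugeRace {

    /**
     * Hypothetical stand-in for the BlockManager. The gauge only remembers
     * how many blocks were scheduled by the MOST RECENT work computation.
     */
    static final class ToyBlockManager {
        private final Queue<String> neededReplications = new ArrayDeque<>();
        private int scheduledReplicationBlocks;

        synchronized void markUnderReplicated(String blockId) {
            neededReplications.add(blockId);
        }

        /** Schedules everything currently queued and overwrites the gauge. */
        synchronized int computeDatanodeWork() {
            int scheduled = neededReplications.size();
            neededReplications.clear();
            scheduledReplicationBlocks = scheduled;
            return scheduled;
        }

        synchronized int getScheduledReplicationBlocks() {
            return scheduledReplicationBlocks;
        }
    }

    /**
     * Returns the gauge value the test would read. When
     * monitorRunsBeforeAssert is true, a second computation (standing in
     * for the background monitor) happens between the test's explicit
     * call and the read of the gauge.
     */
    static int gaugeAfter(boolean monitorRunsBeforeAssert) {
        ToyBlockManager bm = new ToyBlockManager();
        bm.markUnderReplicated("blk_1");  // the corrupted block
        bm.computeDatanodeWork();         // the test's explicit call
        if (monitorRunsBeforeAssert) {
            bm.computeDatanodeWork();     // the monitor's pass finds nothing new
        }
        return bm.getScheduledReplicationBlocks();
    }

    public static void main(String[] args) {
        System.out.println("no monitor pass: " + gaugeAfter(false)); // 1
        System.out.println("monitor pass:    " + gaugeAfter(true));  // 0
    }
}
```

In this model both values are correct for the moment they are read, which is why asserting a fixed value of 1 cannot be made reliable while a background pass may run in between.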

          andrew.wang Andrew Wang added a comment -

          Attached the Maven output and stdout from a failed test run. This looks like some kind of race between the BlockManager and the FSN metrics, since the BM thinks the value is 1 and the FSN metrics think it's 0.

          andrew.wang Andrew Wang added a comment -

          Same test as in HDFS-2434, but a different metric this time.


            People

            • Assignee:
              andrew.wang Andrew Wang
              Reporter:
              andrew.wang Andrew Wang
            • Votes:
              0
              Watchers:
              4
