  1. Hadoop HDFS
  2. HDFS-5805

TestCheckpoint.testCheckpoint fails intermittently on branch2

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: None
    • Labels: None

      Description

      java.lang.AssertionError: Bad value for metric GetEditAvgTime
      Expected: gt(0.0)
           got: <0.0>
      
      	at org.junit.Assert.assertThat(Assert.java:780)
      	at org.apache.hadoop.test.MetricsAsserts.assertGaugeGt(MetricsAsserts.java:341)
      	at org.apache.hadoop.hdfs.server.namenode.TestCheckpoint.testCheckpoint(TestCheckpoint.java:1070)
      
      

        Activity

        mitdesai Mit Desai added a comment -

        I was not able to reproduce the test failure even once in many attempts. Closing this for now.

        ebadger Eric Badger added a comment -

        I've been able to get this test to fail about 1/3 of the time on my local cluster. I checked branch-2.7 and trunk, and it fails the same way on both. Because the test asserts on a timing metric, it fails when the measured operation finishes too quickly, which rarely happens on Jenkins. That would explain why we don't see it fail there. With no load on my machine I get frequent failures; if I increase the load, the test does not fail.

        The interesting thing is that GetEditAvgTime == 0.0 in the test runs that fail and == 1.0 in the runs that succeed. It is typed as a double, but I only ever see integer values. My guess is that the value is truncated somewhere in the metrics code, so a time between 0 and 1 loses its fractional part and becomes 0.
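
        As a rough illustration of the truncation guess above, here is a minimal sketch. It is not the Hadoop metrics code; the class name TruncatedAverageDemo, the method avgMillis, and the sample durations are all made up for the example. It assumes each elapsed time is recorded in whole milliseconds before being averaged, which would produce the 0.0 and 1.0 values seen in the test.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Hadoop.
class TruncatedAverageDemo {
    // Averages samples after converting each one to whole milliseconds.
    // TimeUnit.toMillis truncates, so anything under 1 ms is recorded as 0.
    static double avgMillis(long[] elapsedNanos) {
        if (elapsedNanos.length == 0) {
            return 0.0;
        }
        long totalMillis = 0;
        for (long nanos : elapsedNanos) {
            totalMillis += TimeUnit.NANOSECONDS.toMillis(nanos);
        }
        return (double) totalMillis / elapsedNanos.length;
    }

    public static void main(String[] args) {
        // Assumed sample: one operation that took 0.4 ms (fast, idle machine).
        System.out.println(avgMillis(new long[] {400_000L}));
        // Assumed sample: one operation that took 1.6 ms (slower, loaded machine).
        System.out.println(avgMillis(new long[] {1_600_000L}));
    }
}
```

        Under that assumption the average is a double, but with a single sample it can only take whole-number values, so an assertion that it is greater than 0 fails whenever the operation completes in under a millisecond.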

        ebadger Eric Badger added a comment -

        Attaching a patch that adds a loop of mkdir and delete calls, so that the edit log time will never be so low that it is truncated to 0. Without this patch, the test fails about 1/3 of the time for me locally (with an SSD). With the patch, it failed 0 times out of 50 attempts.
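
        The reasoning behind adding more edits can be sketched with a toy cost model. This is only an assumption for illustration: the class EditLoadSketch, the method recordedMillis, and the per-edit cost of 0.05 ms are hypothetical and are not taken from the patch or measured on a real cluster. The point is that if the timed operation scales with the number of logged edits, enough edits push the recorded time past the 1 ms boundary.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Hadoop or of the attached patch.
class EditLoadSketch {
    // Assumed cost model: the timed operation costs a fixed number of
    // nanoseconds per logged edit. The result is in whole milliseconds,
    // with the fractional part dropped.
    static long recordedMillis(int loggedEdits, long nanosPerEdit) {
        long elapsedNanos = Math.multiplyExact((long) loggedEdits, nanosPerEdit);
        return TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
    }

    public static void main(String[] args) {
        long nanosPerEdit = 50_000L; // assumed 0.05 ms per edit, illustration only
        // Few edits: total stays under 1 ms and is recorded as 0.
        System.out.println(recordedMillis(4, nanosPerEdit));
        // Many edits (e.g. after a mkdir/delete loop): recorded time is nonzero.
        System.out.println(recordedMillis(200, nanosPerEdit));
    }
}
```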

        hadoopqa Hadoop QA added a comment -
        +1 overall

        Vote | Subsystem  | Runtime | Comment
        0    | reexec     | 0m 16s  | Docker mode activated.
        +1   | @author    | 0m 0s   | The patch does not contain any @author tags.
        +1   | test4tests | 0m 0s   | The patch appears to include 1 new or modified test files.
        +1   | mvninstall | 6m 38s  | trunk passed
        +1   | compile    | 0m 45s  | trunk passed
        +1   | checkstyle | 0m 29s  | trunk passed
        +1   | mvnsite    | 1m 4s   | trunk passed
        +1   | mvneclipse | 0m 14s  | trunk passed
        +1   | findbugs   | 1m 48s  | trunk passed
        +1   | javadoc    | 0m 55s  | trunk passed
        +1   | mvninstall | 0m 49s  | the patch passed
        +1   | compile    | 0m 43s  | the patch passed
        +1   | javac      | 0m 43s  | the patch passed
        +1   | checkstyle | 0m 24s  | the patch passed
        +1   | mvnsite    | 0m 49s  | the patch passed
        +1   | mvneclipse | 0m 10s  | the patch passed
        +1   | whitespace | 0m 0s   | The patch has no whitespace issues.
        +1   | findbugs   | 1m 50s  | the patch passed
        +1   | javadoc    | 0m 56s  | the patch passed
        +1   | unit       | 62m 53s | hadoop-hdfs in the patch passed.
        +1   | asflicense | 0m 20s  | The patch does not generate ASF License warnings.
             |            | 82m 23s |



        Subsystem      | Report/Notes
        Docker         | Image:yetus/hadoop:9560f25
        JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12816718/HDFS-5805.001.patch
        JIRA Issue     | HDFS-5805
        Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname          | Linux 1e580629db01 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool     | maven
        Personality    | /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision   | trunk / 9d46a49
        Default Java   | 1.8.0_91
        findbugs       | v3.0.0
        Test Results   | https://builds.apache.org/job/PreCommit-HDFS-Build/16003/testReport/
        modules        | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16003/console
        Powered by     | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        kihwal Kihwal Lee added a comment -

        +1 lgtm

        kihwal Kihwal Lee added a comment -

        Committed to trunk, branch-2 and branch-2.8. Thanks for fixing this, Eric Badger.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-trunk-Commit #10196 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10196/)
        HDFS-5805. TestCheckpoint.testCheckpoint fails intermittently on (kihwal: rev 5e5b8793fba8e25aeba7a74878da4cf8e806f061)

        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java

          People

          • Assignee: ebadger Eric Badger
          • Reporter: mitdesai Mit Desai
          • Votes: 0
          • Watchers: 4
