Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: test
    • Labels:
      None
    • Target Version/s:

      Description

      I have seen this test failure occur a few times in trunk:

      Error Message

      Timeout: excess replica count not equal to 2 for block blk_1073741825_1001 after 20000 msec. Last counts: live = 2, excess = 0, corrupt = 0

      Stacktrace

      java.util.concurrent.TimeoutException: Timeout: excess replica count not equal to 2 for block blk_1073741825_1001 after 20000 msec. Last counts: live = 2, excess = 0, corrupt = 0
      at org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.checkTimeout(TestNodeCount.java:152)
      at org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.checkTimeout(TestNodeCount.java:146)
      at org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.__CLR4_0_39bdgm666uf(TestNodeCount.java:130)
      at org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.testNodeCount(TestNodeCount.java:54)

      1. HDFS-9358.001.patch
        2 kB
        Masatake Iwasaki
      2. HDFS-9358.002.patch
        3 kB
        Masatake Iwasaki

        Activity

        iwasakims Masatake Iwasaki added a comment -

        Thanks, Walter Su!

        hudson Hudson added a comment -

        ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #610 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/610/)
        HDFS-9358. TestNodeCount#testNodeCount timed out. Contributed by (waltersu4549: rev 621cbb4f69072bde259f213629f84494416ae12f)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNodeCount.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #2548 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2548/)
        HDFS-9358. TestNodeCount#testNodeCount timed out. Contributed by (waltersu4549: rev 621cbb4f69072bde259f213629f84494416ae12f)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNodeCount.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk #2613 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2613/)
        HDFS-9358. TestNodeCount#testNodeCount timed out. Contributed by (waltersu4549: rev 621cbb4f69072bde259f213629f84494416ae12f)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNodeCount.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #1408 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1408/)
        HDFS-9358. TestNodeCount#testNodeCount timed out. Contributed by (waltersu4549: rev 621cbb4f69072bde259f213629f84494416ae12f)

        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNodeCount.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #684 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/684/)
        HDFS-9358. TestNodeCount#testNodeCount timed out. Contributed by (waltersu4549: rev 621cbb4f69072bde259f213629f84494416ae12f)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNodeCount.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #671 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/671/)
        HDFS-9358. TestNodeCount#testNodeCount timed out. Contributed by (waltersu4549: rev 621cbb4f69072bde259f213629f84494416ae12f)

        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNodeCount.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #8810 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8810/)
        HDFS-9358. TestNodeCount#testNodeCount timed out. Contributed by (waltersu4549: rev 621cbb4f69072bde259f213629f84494416ae12f)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNodeCount.java
        walter.k.su Walter Su added a comment -

        Committed to trunk and branch-2.
        Thanks to Masatake Iwasaki for the contribution, and to Wei-Chiu Chuang for the good analysis.

        walter.k.su Walter Su added a comment -

        It did not fail in 100 runs.

        Great. Thanks Masatake Iwasaki. +1 for last patch.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 12s docker + precommit patch detected.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 4m 7s trunk passed
        +1 compile 1m 5s trunk passed with JDK v1.8.0_66
        +1 compile 0m 50s trunk passed with JDK v1.7.0_79
        +1 checkstyle 0m 19s trunk passed
        +1 mvnsite 0m 52s trunk passed
        +1 mvneclipse 0m 18s trunk passed
        +1 findbugs 2m 28s trunk passed
        +1 javadoc 2m 10s trunk passed with JDK v1.8.0_66
        +1 javadoc 2m 54s trunk passed with JDK v1.7.0_79
        +1 mvninstall 1m 2s the patch passed
        +1 compile 0m 48s the patch passed with JDK v1.8.0_66
        +1 javac 0m 48s the patch passed
        +1 compile 0m 38s the patch passed with JDK v1.7.0_79
        +1 javac 0m 38s the patch passed
        +1 checkstyle 0m 18s the patch passed
        +1 mvnsite 0m 49s the patch passed
        +1 mvneclipse 0m 15s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 2m 23s the patch passed
        +1 javadoc 1m 24s the patch passed with JDK v1.8.0_66
        +1 javadoc 2m 23s the patch passed with JDK v1.7.0_79
        -1 unit 93m 47s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
        -1 unit 86m 24s hadoop-hdfs in the patch failed with JDK v1.7.0_79.
        -1 asflicense 0m 30s Patch generated 56 ASF License warnings.
        209m 36s



        Reason Tests
        JDK v1.8.0_66 Failed junit tests hadoop.hdfs.TestDFSUpgradeFromImage
          hadoop.hdfs.server.namenode.ha.TestEditLogTailer
          hadoop.hdfs.security.TestDelegationTokenForProxyUser
          hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength
          hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes
          hadoop.hdfs.server.namenode.ha.TestHAAppend
          hadoop.hdfs.TestAclsEndToEnd
          hadoop.hdfs.server.datanode.TestDirectoryScanner
        JDK v1.7.0_79 Failed junit tests hadoop.hdfs.server.namenode.ha.TestDNFencing
          hadoop.hdfs.TestPersistBlocks
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure180
          hadoop.hdfs.TestDataTransferKeepalive
          hadoop.hdfs.security.TestDelegationTokenForProxyUser
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
          hadoop.hdfs.tools.TestDFSAdminWithHA
          hadoop.hdfs.server.datanode.TestBlockReplacement
          hadoop.hdfs.qjournal.client.TestQuorumJournalManager
          hadoop.hdfs.TestEncryptionZones
          hadoop.hdfs.server.namenode.ha.TestRequestHedgingProxyProvider
          hadoop.hdfs.server.datanode.TestDirectoryScanner



        Subsystem Report/Notes
        Docker Client=1.7.0 Server=1.7.0 Image:test-patch-base-hadoop-date2015-11-16
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12772511/HDFS-9358.002.patch
        JIRA Issue HDFS-9358
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 3d11f23d847f 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/patchprocess/apache-yetus-fa12328/precommit/personality/hadoop.sh
        git revision trunk / 855d529
        findbugs v3.0.0
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/13515/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/13515/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_79.txt
        unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/13515/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/13515/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_79.txt
        JDK v1.7.0_79 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/13515/testReport/
        asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/13515/artifact/patchprocess/patch-asflicense-problems.txt
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Max memory used 227MB
        Powered by Apache Yetus http://yetus.apache.org
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/13515/console

        This message was automatically generated.

        iwasakims Masatake Iwasaki added a comment -

        Thanks for the comment, Walter Su.

        1. We can set heartBeat interval to 1s to shorten running time.

        Shortening the heartbeat interval did not make a significant difference, but shortening the replication interval did. I set shorter intervals for both anyway.

        So I think we can disable block invalidation by setting large delay to make it non-transient, then the test is more stable.

        Sure. I think that is better because we can get rid of the busy loop that checks the test condition, which makes the test easier to debug.

        I attached 002 based on your suggestions. It did not fail in 100 runs.

        walter.k.su Walter Su added a comment -

        1. We can set heartBeat interval to 1s to shorten running time.

        2. I think the 001 patch can solve the posted issue. First of all, thanks for that. However, I think the race condition still exists?

        125       cluster.restartDataNode(dnprop);
        126       cluster.waitActive();
        127 
        128       // check if excessive replica is detected (transient)
        129       initializeTimeout(TIMEOUT);
        130       while (countNodes(block.getLocalBlock(), namesystem).excessReplicas() != 2) {
        131         checkTimeout("excess replica count not equal to 2");
        132       }
        

        The old code expects 2 excessReplicas. The 001 patch expects 1. No matter how many excessReplicas we want, as you can see from the comment, the state is "transient". What if the state has vanished before line 130? I know it's unlikely, but the Jenkins machine is under heavy load, so who knows?

        So I think we can disable block invalidation by setting a large delay to make the state non-transient; then the test is more stable. Check InvalidateBlocks.getInvalidationDelay(). That would solve the issue, and the test logic changes in the 001 patch would not be required. What do you think?
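
        The concern about the transient state can be illustrated with a small toy model. This is only a sketch under stated assumptions, not HDFS code: time is simulated, a single excess replica is assumed to appear at t = 0 and disappear once a hypothetical invalidation delay has elapsed, and the class and method names (TransientStateSketch, excessAt, pollerSeesExcess) are invented for illustration.

```java
public class TransientStateSketch {
    /**
     * Toy model (assumption): one excess replica exists from t = 0 until
     * the invalidation delay has elapsed, after which it is gone.
     */
    public static int excessAt(long nowMs, long invalidationDelayMs) {
        return nowMs < invalidationDelayMs ? 1 : 0;
    }

    /**
     * Simulates a polling loop that first checks at firstPollMs and then
     * every pollIntervalMs until timeoutMs. Returns true if any check
     * observes the excess replica.
     */
    public static boolean pollerSeesExcess(long firstPollMs, long pollIntervalMs,
                                           long timeoutMs, long invalidationDelayMs) {
        for (long t = firstPollMs; t <= timeoutMs; t += pollIntervalMs) {
            if (excessAt(t, invalidationDelayMs) == 1) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Loaded machine: the first check happens after the state has vanished.
        System.out.println(pollerSeesExcess(500, 100, 20000, 300));
        // Large invalidation delay: the state is still there at the first check.
        System.out.println(pollerSeesExcess(500, 100, 20000, 60000));
    }
}
```

        In this model a late first check misses the state entirely when the delay is short, while a delay longer than the test timeout makes the same check succeed regardless of scheduling, which is the motivation for the suggestion above.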

        iwasakims Masatake Iwasaki added a comment -

        Thanks for the confirmation, Wei-Chiu Chuang!

        jojochuang Wei-Chiu Chuang added a comment -

        Masatake Iwasaki Thanks for the patch.
        I looked at the patch, and what it does is as follows:
        After NN detects DN is down, wait until the excess replica is invalidated, before restarting the stopped DN again.

        After the DN is restarted, make sure the excessive replica is detected.

        So the process is deterministic and will always go as follows (assuming no timeout):

        (live, excess): (3, 1) -> (3, 0) -> (2, 1)
        

        I don't have committership, but it looks good to me. I ran the patched test and it did not fail in 100 runs.
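
        The "wait until X, then continue" steps described above amount to polling a condition with a timeout. The following is a minimal, self-contained sketch of such a helper, not the code in TestNodeCount: the class name WaitForSketch and the waitFor signature are assumptions made for illustration, and only the shape of the timeout message follows the error message in the description.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitForSketch {
    /**
     * Polls the condition every checkEveryMs milliseconds and returns as soon
     * as it holds. Throws TimeoutException if it still does not hold once
     * timeoutMs milliseconds have passed.
     */
    public static void waitFor(BooleanSupplier condition, long checkEveryMs,
                               long timeoutMs, String what)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException(
                    "Timeout: " + what + " after " + timeoutMs + " msec.");
            }
            // Sleep between checks instead of spinning in a busy loop.
            Thread.sleep(checkEveryMs);
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        // Hypothetical condition: becomes true roughly 50 ms after start.
        waitFor(() -> System.currentTimeMillis() - start >= 50, 10, 1000,
                "condition not met");
        System.out.println("condition met");
    }
}
```

        Sleeping between checks keeps the loop from spinning, and the description string passed to waitFor ends up in the exception, so a timeout still says which step failed.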

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 11s docker + precommit patch detected.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 3m 34s trunk passed
        +1 compile 0m 49s trunk passed with JDK v1.8.0_66
        +1 compile 0m 40s trunk passed with JDK v1.7.0_79
        +1 checkstyle 0m 19s trunk passed
        +1 mvnsite 0m 57s trunk passed
        +1 mvneclipse 0m 19s trunk passed
        +1 findbugs 2m 20s trunk passed
        +1 javadoc 1m 22s trunk passed with JDK v1.8.0_66
        +1 javadoc 2m 13s trunk passed with JDK v1.7.0_79
        +1 mvninstall 0m 46s the patch passed
        +1 compile 0m 40s the patch passed with JDK v1.8.0_66
        +1 javac 0m 40s the patch passed
        +1 compile 0m 37s the patch passed with JDK v1.7.0_79
        +1 javac 0m 37s the patch passed
        +1 checkstyle 0m 18s the patch passed
        +1 mvnsite 0m 48s the patch passed
        +1 mvneclipse 0m 16s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 2m 22s the patch passed
        +1 javadoc 1m 23s the patch passed with JDK v1.8.0_66
        +1 javadoc 2m 9s the patch passed with JDK v1.7.0_79
        -1 unit 88m 24s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
        -1 unit 75m 36s hadoop-hdfs in the patch failed with JDK v1.7.0_79.
        -1 asflicense 0m 26s Patch generated 56 ASF License warnings.
        189m 55s



        Reason Tests
        JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.datanode.TestBlockScanner
          hadoop.hdfs.server.namenode.ha.TestEditLogTailer
          hadoop.hdfs.shortcircuit.TestShortCircuitCache
          hadoop.hdfs.server.datanode.TestDataNodeMetrics
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
          hadoop.hdfs.server.datanode.TestBlockReplacement
          hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes
          hadoop.fs.viewfs.TestViewFileSystemAtHdfsRoot
          hadoop.fs.TestSymlinkHdfsFileContext
          hadoop.hdfs.TestAclsEndToEnd
          hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery
          hadoop.hdfs.server.datanode.TestDirectoryScanner
        JDK v1.7.0_79 Failed junit tests hadoop.hdfs.server.namenode.ha.TestEditLogTailer
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure040
          hadoop.hdfs.security.TestDelegationTokenForProxyUser
          hadoop.hdfs.TestCrcCorruption
          hadoop.hdfs.server.namenode.TestBackupNode
          hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure130



        Subsystem Report/Notes
        Docker Client=1.7.0 Server=1.7.0 Image:test-patch-base-hadoop-date2015-11-12
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12771966/HDFS-9358.001.patch
        JIRA Issue HDFS-9358
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 4503ef9408cc 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build@2/patchprocess/apache-yetus-fa12328/precommit/personality/hadoop.sh
        git revision trunk / 9ad708a
        findbugs v3.0.0
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/13484/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/13484/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_79.txt
        unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/13484/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/13484/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_79.txt
        JDK v1.7.0_79 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/13484/testReport/
        asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/13484/artifact/patchprocess/patch-asflicense-problems.txt
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Max memory used 229MB
        Powered by Apache Yetus http://yetus.apache.org
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/13484/console

        This message was automatically generated.

        iwasakims Masatake Iwasaki added a comment -

        Thanks for reporting this, Wei-Chiu Chuang.

        testNodeCount expects the number of excess replicas to increase to 2 via excessReplicateMap. In that case, (live, excess) changes as follows:

          (live, excess): (3, 1) -> (2, 2)
        

        If invalidation of the existing excess replica is executed before excessReplicateMap is updated, the number of excess replicas never reaches 2.

          (live, excess): (3, 1) -> (3, 0) -> (2, 1)
        

        The attached 001 fixes the test to wait for invalidation of the first excess replica and then check that the second excess replica is detected.
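
        The two orderings above can be replayed with a small toy model. This is a sketch that only encodes the (live, excess) transitions listed in this comment, not the actual BlockManager logic; the class name ExcessOrderingSketch and the event names MARK_EXCESS and INVALIDATE are hypothetical labels for "excessReplicateMap is updated" and "the existing excess replica is invalidated".

```java
import java.util.ArrayList;
import java.util.List;

public class ExcessOrderingSketch {
    /** Hypothetical labels for the two steps described in the comment. */
    public enum Event { MARK_EXCESS, INVALIDATE }

    /**
     * Applies the events in order, starting from (live, excess) = (3, 1),
     * and returns every state seen, including the initial one.
     */
    public static List<int[]> trace(Event... order) {
        int live = 3;
        int excess = 1;
        List<int[]> states = new ArrayList<>();
        states.add(new int[] {live, excess});
        for (Event e : order) {
            if (e == Event.MARK_EXCESS) {
                // excessReplicateMap is updated: one more replica counts as excess.
                live--;
                excess++;
            } else {
                // The already-excess replica is invalidated and no longer counted.
                excess--;
            }
            states.add(new int[] {live, excess});
        }
        return states;
    }

    /** Returns true if any state in the trace has the given excess count. */
    public static boolean everReachesExcess(List<int[]> states, int target) {
        for (int[] s : states) {
            if (s[1] == target) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Update first: (3, 1) -> (2, 2), so excess == 2 is observable.
        System.out.println(everReachesExcess(trace(Event.MARK_EXCESS), 2));
        // Invalidation first: (3, 1) -> (3, 0) -> (2, 1), excess never reaches 2.
        System.out.println(everReachesExcess(
            trace(Event.INVALIDATE, Event.MARK_EXCESS), 2));
    }
}
```

        In this model, waiting for excess == 2 can only succeed under the first ordering, which matches the timeout in the description where the last counts were live = 2, excess = 0.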


          People

          • Assignee:
            iwasakims Masatake Iwasaki
            Reporter:
            jojochuang Wei-Chiu Chuang
          • Votes:
            0
            Watchers:
            5
