Hadoop HDFS / HDFS-9626

TestBlockReplacement#testBlockReplacement fails occasionally

    Details

    • Type: Test
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: test
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed

      Description

      testBlockReplacement sometimes fails in test case 4, in checkBlocks. I'll post the detailed error in a comment.

      Thanks Wei-Chiu Chuang for helping identify the issue.

        Activity

        xiaochen Xiao Chen added a comment -

        Error Message

        Did not achieve expected replication to expected nodes after more than 20000 msec. See logs for details.
        

        Stacktrace

        java.util.concurrent.TimeoutException: Did not achieve expected replication to expected nodes after more than 20000 msec. See logs for details.
            at org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement.checkBlocks(TestBlockReplacement.java:296)
            at org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement.testBlockReplacement(TestBlockReplacement.java:202)
            at org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement.testBlockReplacement(TestBlockReplacement.java:99)
        

        And the most interesting logs are:

        2016-01-07 11:01:08,434 [main] INFO  hdfs.TestBlockReplacement (TestBlockReplacement.java:checkBlocks(301)) - Block is not located at 127.0.0.1:61050
        2016-01-07 11:01:08,435 [main] INFO  hdfs.TestBlockReplacement (TestBlockReplacement.java:checkBlocks(313)) - Expected replica nodes are: 127.0.0.1:61045, 127.0.0.1:61050, 
        2016-01-07 11:01:08,435 [main] INFO  hdfs.TestBlockReplacement (TestBlockReplacement.java:checkBlocks(314)) - Current actual replica nodes are: DatanodeInfoWithStorage[127.0.0.1:61045,DS-11926750-8c6a-4a6a-bd72-6e6adb5ef153,DISK], DatanodeInfoWithStorage[127.0.0.1:61062,DS-b67e96ac-b53d-4540-825a-4e0d0a412c35,DISK], DatanodeInfoWithStorage[127.0.0.1:61054,DS-18ab56c7-16f3-46a5-8369-5596f385c73b,DISK], 
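        Judging by the error message, the check appears to poll the replica locations until a deadline and give up with a TimeoutException. A minimal sketch of that pattern follows, using the node addresses from the logs above. The class and method names (ReplicaWaitSketch, waitForReplicas) are hypothetical and are not taken from TestBlockReplacement; the timeout is shortened for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical illustration; not the code in TestBlockReplacement.
final class ReplicaWaitSketch {

  /** Polls until every expected node is among the current replicas, or times out. */
  static void waitForReplicas(Set<String> expected, Supplier<Set<String>> current,
      long timeoutMs, long pollMs) throws TimeoutException, InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (true) {
      if (current.get().containsAll(expected)) {
        return; // all expected nodes hold a replica
      }
      if (System.currentTimeMillis() >= deadline) {
        throw new TimeoutException("Did not achieve expected replication to "
            + "expected nodes after more than " + timeoutMs + " msec.");
      }
      Thread.sleep(pollMs);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Addresses copied from the log lines above.
    Set<String> expected = new HashSet<>(
        Arrays.asList("127.0.0.1:61045", "127.0.0.1:61050"));
    Set<String> actual = new HashSet<>(
        Arrays.asList("127.0.0.1:61045", "127.0.0.1:61062", "127.0.0.1:61054"));
    try {
      // 127.0.0.1:61050 never shows up, so this times out.
      waitForReplicas(expected, () -> actual, 200L, 20L);
    } catch (TimeoutException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

        With these inputs the wait can never succeed, because 127.0.0.1:61050 is not among the actual replica nodes. That matches the "Block is not located at 127.0.0.1:61050" log line.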
        
        xiaochen Xiao Chen added a comment -

        The failed "Testcase 4" is as follows:
        Initially the block has replicas on rack0, rack1, and rack2, and a new replica is placed on newNode on rack2. An invalid delhint is then given, and the test waits for the extra replica to be deleted.
        The test expects that, when a wrong delhint is given, the deletion always happens on rack2 (where 2 replicas are located). However, with the change in HDFS-9314, this is no longer guaranteed. Instead, the guarantee is that after deletion the number of racks is still >= 2. (See comments in HDFS-9314 for details.)

        Thus, patch 1 proposes to loosen the check in this test and only make sure the extra replica is deleted. The number-of-racks guarantee is already covered in TestReplicationPolicy as part of HDFS-9314.
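
        To illustrate why the stricter expectation can fail while the rack guarantee still holds, here is a small sketch of the Testcase 4 layout. It is not the patch itself: the class RackInvariantSketch, its methods, and the datanode names dn0, dn1, dn2 are hypothetical; only rack0, rack1, rack2 and newNode come from the description above.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the Testcase 4 layout; not the HDFS-9626 patch.
final class RackInvariantSketch {

  /** Counts the distinct racks that still hold a replica after one is deleted. */
  static int racksAfterDeletion(Map<String, String> nodeToRack, String deleted) {
    Set<String> racks = new HashSet<>();
    for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
      if (!e.getKey().equals(deleted)) {
        racks.add(e.getValue());
      }
    }
    return racks.size();
  }

  /** Old expectation: the deleted replica must come from the given rack. */
  static boolean strictCheckPasses(Map<String, String> nodeToRack,
      String deleted, String expectedRack) {
    return expectedRack.equals(nodeToRack.get(deleted));
  }

  /** Relaxed expectation: one replica is gone and at least two racks remain. */
  static boolean relaxedCheckPasses(Map<String, String> nodeToRack,
      String deleted, int expectedReplicas) {
    return nodeToRack.containsKey(deleted)
        && nodeToRack.size() - 1 == expectedReplicas
        && racksAfterDeletion(nodeToRack, deleted) >= 2;
  }

  public static void main(String[] args) {
    Map<String, String> replicas = new LinkedHashMap<>();
    replicas.put("dn0", "rack0");     // dn0..dn2 are made-up node names
    replicas.put("dn1", "rack1");
    replicas.put("dn2", "rack2");
    replicas.put("newNode", "rack2"); // the extra replica
    for (String deleted : replicas.keySet()) {
      System.out.println("delete " + deleted
          + ": racks left=" + racksAfterDeletion(replicas, deleted)
          + ", strict=" + strictCheckPasses(replicas, deleted, "rack2")
          + ", relaxed=" + relaxedCheckPasses(replicas, deleted, 3));
    }
  }
}
```

        In this layout, deleting any one of the four replicas leaves at least two racks, so a deletion on rack0 or rack1 satisfies the rack guarantee but breaks a check that insists on rack2.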

        zhz Zhe Zhang added a comment -

        Thanks for the fix Xiao. Patch LGTM, +1 pending Jenkins.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote  Subsystem   Runtime   Comment
        0     reexec      0m 0s     Docker mode activated.
        +1    @author     0m 0s     The patch does not contain any @author tags.
        +1    test4tests  0m 0s     The patch appears to include 1 new or modified test files.
        +1    mvninstall  8m 7s     trunk passed
        +1    compile     0m 44s    trunk passed with JDK v1.8.0_66
        +1    compile     0m 44s    trunk passed with JDK v1.7.0_91
        +1    checkstyle  0m 17s    trunk passed
        +1    mvnsite     0m 56s    trunk passed
        +1    mvneclipse  0m 14s    trunk passed
        +1    findbugs    2m 1s     trunk passed
        +1    javadoc     1m 12s    trunk passed with JDK v1.8.0_66
        +1    javadoc     1m 54s    trunk passed with JDK v1.7.0_91
        +1    mvninstall  0m 49s    the patch passed
        +1    compile     0m 41s    the patch passed with JDK v1.8.0_66
        +1    javac       0m 41s    the patch passed
        +1    compile     0m 41s    the patch passed with JDK v1.7.0_91
        +1    javac       0m 41s    the patch passed
        +1    checkstyle  0m 17s    the patch passed
        +1    mvnsite     0m 52s    the patch passed
        +1    mvneclipse  0m 12s    the patch passed
        +1    whitespace  0m 0s     Patch has no whitespace issues.
        +1    findbugs    2m 18s    the patch passed
        +1    javadoc     1m 14s    the patch passed with JDK v1.8.0_66
        +1    javadoc     2m 0s     the patch passed with JDK v1.7.0_91
        -1    unit        66m 23s   hadoop-hdfs in the patch failed with JDK v1.8.0_66.
        -1    unit        58m 59s   hadoop-hdfs in the patch failed with JDK v1.7.0_91.
        +1    asflicense  0m 21s    Patch does not generate ASF License warnings.
                          153m 54s



        Reason Tests
        JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.datanode.TestBlockScanner
          hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation
          hadoop.hdfs.server.namenode.TestNamenodeCapacityReport
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure140
          hadoop.hdfs.server.namenode.ha.TestHAAppend
          hadoop.hdfs.server.namenode.TestNNThroughputBenchmark
        JDK v1.7.0_91 Failed junit tests hadoop.hdfs.TestParallelShortCircuitReadUnCached
          hadoop.hdfs.server.namenode.TestNNThroughputBenchmark
          hadoop.hdfs.server.namenode.TestCacheDirectives



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:0ca8df7
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12781048/HDFS-9626.01.patch
        JIRA Issue HDFS-9626
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 3205c13f25f5 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 52b7757
        Default Java 1.7.0_91
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
        findbugs v3.0.0
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/14056/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/14056/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
        unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14056/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14056/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
        JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14056/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Max memory used 76MB
        Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14056/console

        This message was automatically generated.

        xiaochen Xiao Chen added a comment -

        Thanks Zhe for the review.
        Failed tests are not touched by this patch.

        xiaochen Xiao Chen added a comment -

        Also, the test failed locally about 1 in 200 runs before the patch. After the patch, it passed 1k runs.
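
        As a rough sanity check on these numbers: if the failure rate were still about 1 in 200 and runs were independent (both are assumptions, and the 1-in-200 figure is itself an estimate), the chance of 1000 consecutive passes would be (1 - 1/200)^1000. The class name FlakyRunEstimate below is hypothetical.

```java
// Hypothetical helper; assumes independent runs with a fixed failure rate.
final class FlakyRunEstimate {

  /** Probability that all runs pass when each run fails with the given rate. */
  static double probAllPass(double failureRate, int runs) {
    return Math.pow(1.0 - failureRate, runs);
  }

  public static void main(String[] args) {
    double p = probAllPass(1.0 / 200, 1000);
    System.out.printf("P(1000 clean runs at a 1-in-200 failure rate) = %.4f%n", p);
  }
}
```

        The result is below 1%, so 1k clean runs would be very unlikely if the old failure rate still applied.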

        zhz Zhe Zhang added a comment -

        Thanks Xiao for confirming this. I just committed the patch to trunk, branch-2, and branch-2.8.

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #9074 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9074/)
        HDFS-9626. TestBlockReplacement#testBlockReplacement fails occasionally. (zhz: rev 0af2022e6d431e746301086980134730d4287cc7)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReplacement.java
        xiaochen Xiao Chen added a comment -

        Thank you Zhe for the review and commit!

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #9083 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9083/)
        Update CHANGES.txt: move HDFS-9626 and HDFS-9630 to the section of (zhz: rev 71e5982e3970b6cd130ccbe29ca2a1196268aa46)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

          People

          • Assignee: xiaochen Xiao Chen
          • Reporter: xiaochen Xiao Chen
          • Votes: 0
          • Watchers: 4
