Hadoop HDFS / HDFS-10748

TestFileTruncate#testTruncateWithDataNodesRestart runs sometimes timeout

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      This was fixed by HDFS-7886, but some recent Jenkins results have started showing it again:

      Tests run: 18, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 172.025 sec <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestFileTruncate
      testTruncateWithDataNodesRestart(org.apache.hadoop.hdfs.server.namenode.TestFileTruncate)  Time elapsed: 43.861 sec  <<< ERROR!
      java.util.concurrent.TimeoutException: Timed out waiting for /test/testTruncateWithDataNodesRestart to reach 3 replicas
      	at org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:751)
      	at org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.testTruncateWithDataNodesRestart(TestFileTruncate.java:704)
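The failing call polls until the file reaches the target replica count and gives up after a bounded wait, which is what produces the TimeoutException above. The following is a minimal, hypothetical sketch of that poll-then-timeout pattern. It is not the actual DFSTestUtil.waitReplication code: the replica count comes from a caller-supplied IntSupplier (an assumption standing in for the NameNode's block locations), and maxAttempts and sleepMillis are illustrative parameters.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.IntSupplier;

// Hypothetical sketch of a poll-until-replicated helper. This is NOT the real
// DFSTestUtil.waitReplication: the replica count comes from an IntSupplier
// instead of the NameNode's block locations.
class WaitReplicationSketch {

  static void waitReplication(String path, int expected, IntSupplier replicaCount,
                              int maxAttempts, long sleepMillis)
      throws InterruptedException, TimeoutException {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      if (replicaCount.getAsInt() == expected) {
        return; // target replication reached
      }
      Thread.sleep(sleepMillis); // wait before polling again
    }
    throw new TimeoutException("Timed out waiting for " + path
        + " to reach " + expected + " replicas");
  }

  public static void main(String[] args) throws InterruptedException {
    try {
      // A count stuck at 2 never reaches the target, so the wait times out.
      waitReplication("/test/testTruncateWithDataNodesRestart", 3, () -> 2, 5, 10);
      System.out.println("replicated");
    } catch (TimeoutException e) {
      // prints "Timed out waiting for /test/testTruncateWithDataNodesRestart to reach 3 replicas"
      System.out.println(e.getMessage());
    }
  }
}
```

The point of the sketch is that polling alone cannot help if nothing in the cluster moves the count toward the target; the wait simply runs out.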
      
      Attachments

      1. HDFS-10748.002.patch (1.0 kB, Yiqun Lin)
      2. HDFS-10748.001.patch (0.9 kB, Yiqun Lin)


          Activity

          linyiqun Yiqun Lin added a comment -

          Thanks Xiaoyu Yao for the review and commit!

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10346 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10346/)
          HDFS-10748. TestFileTruncate#testTruncateWithDataNodesRestart runs (xyao: rev 4da5000dd33cf013e7212848ed2c44f1e60e860e)

          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java
          xyao Xiaoyu Yao added a comment -

          Thanks Yiqun Lin for the contribution. I've committed the fix to trunk, branch-2, and branch-2.8.

          xyao Xiaoyu Yao added a comment -

          Thanks Yiqun Lin for working on this. The patch v02 LGTM, +1. I will commit it shortly.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote  Subsystem    Runtime   Comment
          0     reexec       0m 9s     Docker mode activated.
          +1    @author      0m 0s     The patch does not contain any @author tags.
          +1    test4tests   0m 0s     The patch appears to include 1 new or modified test files.
          +1    mvninstall   7m 15s    trunk passed
          +1    compile      0m 47s    trunk passed
          +1    checkstyle   0m 26s    trunk passed
          +1    mvnsite      0m 54s    trunk passed
          +1    mvneclipse   0m 12s    trunk passed
          +1    findbugs     1m 48s    trunk passed
          +1    javadoc      0m 56s    trunk passed
          +1    mvninstall   0m 51s    the patch passed
          +1    compile      0m 48s    the patch passed
          +1    javac        0m 48s    the patch passed
          +1    checkstyle   0m 24s    the patch passed
          +1    mvnsite      0m 54s    the patch passed
          +1    mvneclipse   0m 10s    the patch passed
          +1    whitespace   0m 0s     The patch has no whitespace issues.
          +1    findbugs     1m 53s    the patch passed
          +1    javadoc      0m 52s    the patch passed
          -1    unit         59m 30s   hadoop-hdfs in the patch failed.
          +1    asflicense   0m 18s    The patch does not generate ASF License warnings.
                             79m 20s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
            hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12825399/HDFS-10748.002.patch
          JIRA Issue HDFS-10748
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux dfc6c52fc785 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 5a6fc5f
          Default Java 1.8.0_101
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16534/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16534/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16534/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          linyiqun Yiqun Lin added a comment -

          I found that this issue is very similar to HDFS-8729, which already has a complete analysis and a corresponding patch (see https://issues.apache.org/jira/browse/HDFS-8729?focusedCommentId=14619999&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14619999).

          Since Xuebin Zhang asked me offline about the current status of this JIRA, a gentle ping to Akira Ajisaka: could you take a look at this when you have time? Thanks in advance.

          Finally, I've attached a new patch that adds the sleep time suggested in the HDFS-8729 comment.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote  Subsystem    Runtime   Comment
          0     reexec       0m 20s    Docker mode activated.
          +1    @author      0m 0s     The patch does not contain any @author tags.
          +1    test4tests   0m 0s     The patch appears to include 1 new or modified test files.
          +1    mvninstall   7m 22s    trunk passed
          +1    compile      0m 50s    trunk passed
          +1    checkstyle   0m 28s    trunk passed
          +1    mvnsite      0m 57s    trunk passed
          +1    mvneclipse   0m 15s    trunk passed
          +1    findbugs     1m 45s    trunk passed
          +1    javadoc      0m 58s    trunk passed
          +1    mvninstall   0m 50s    the patch passed
          +1    compile      0m 41s    the patch passed
          +1    javac        0m 41s    the patch passed
          +1    checkstyle   0m 22s    the patch passed
          +1    mvnsite      0m 53s    the patch passed
          +1    mvneclipse   0m 11s    the patch passed
          +1    whitespace   0m 0s     The patch has no whitespace issues.
          +1    findbugs     1m 48s    the patch passed
          +1    javadoc      0m 59s    the patch passed
          -1    unit         89m 35s   hadoop-hdfs in the patch failed.
          +1    asflicense   0m 23s    The patch does not generate ASF License warnings.
                             110m 2s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeUUID
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
            hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs
          Timed out junit tests org.apache.hadoop.hdfs.TestLeaseRecovery2
            org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12824060/HDFS-10748.001.patch
          JIRA Issue HDFS-10748
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux b3fd1e5801d1 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 2353271
          Default Java 1.8.0_101
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16446/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16446/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16446/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          linyiqun Yiqun Lin added a comment - - edited

          Thanks Xiaoyu Yao for reporting this issue.
          It seems HDFS-7886 did not completely fix this issue. See the comment in HDFS-7930 (https://issues.apache.org/jira/browse/HDFS-7930?focusedCommentId=14368053&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14368053):

          Although this will not fix the testTruncateWithDataNodesRestart() completely. The location is correctly invalidated on the NN, but then NN postpones invalidation on the DN and waits for the next report.
          ...
          If I add triggerBlockReports() before waitReplication() then the test passes, as it finally triggers deletion of the replica on the DN.

          I think the main problem is that the block report has not been fully sent to the NameNode after block recovery, which leaves the cluster waiting for replication.

          I tested testTruncateWithDataNodesRestart in my local environment, and it failed about once every 3 to 5 runs. With the approach described in that comment, all runs passed. I think calling triggerBlockReports() makes sense for this JIRA.

          Attaching a simple patch for this.
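To make the analysis above concrete, here is a toy, single-threaded model. All class and method names are hypothetical and no Hadoop APIs are used. It assumes the mechanism described in the quoted HDFS-7930 comment: the NameNode no longer counts the restarted DataNode's stale replica, but deletion of that replica on the DataNode is postponed until its next block report. It further assumes (not confirmed by the source) that a new copy cannot be placed on a DataNode that still holds the old replica. Under those assumptions, polling alone never reaches 3 replicas, while triggering block reports first lets replication converge.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model (hypothetical, no Hadoop APIs) of postponed replica deletion.
// Assumption: a DataNode that still holds a stale replica cannot receive a
// new copy until that replica is deleted, and deletion only happens when the
// DataNode sends a block report.
class DeferredInvalidationModel {
  enum Replica { NONE, STALE, CURRENT }

  private final Map<String, Replica> replicas = new LinkedHashMap<>();

  DeferredInvalidationModel() {
    replicas.put("dn0", Replica.CURRENT);
    replicas.put("dn1", Replica.CURRENT);
    replicas.put("dn2", Replica.STALE); // restarted DN still holds the old replica
  }

  // Replicas the toy NameNode counts: the stale location is already invalidated.
  int liveReplicas() {
    return (int) replicas.values().stream().filter(r -> r == Replica.CURRENT).count();
  }

  // One replication pass: copy the block only to DataNodes holding no replica.
  void replicationTick() {
    replicas.replaceAll((dn, r) -> r == Replica.NONE ? Replica.CURRENT : r);
  }

  // A block report from one DataNode carries out the postponed deletion.
  void blockReport(String dn) {
    if (replicas.get(dn) == Replica.STALE) {
      replicas.put(dn, Replica.NONE);
    }
  }

  // Models the suggestion of triggering block reports from every DataNode.
  void triggerBlockReports() {
    for (String dn : new ArrayList<>(replicas.keySet())) {
      blockReport(dn);
    }
  }

  // Bounded polling loop; returns whether the target count was reached.
  boolean waitReplication(int target, int maxTicks) {
    for (int i = 0; i < maxTicks; i++) {
      if (liveReplicas() >= target) {
        return true;
      }
      replicationTick();
    }
    return liveReplicas() >= target;
  }

  public static void main(String[] args) {
    DeferredInvalidationModel pollOnly = new DeferredInvalidationModel();
    System.out.println("poll only: " + pollOnly.waitReplication(3, 10)); // false

    DeferredInvalidationModel reportFirst = new DeferredInvalidationModel();
    reportFirst.triggerBlockReports(); // stale replica on dn2 is deleted here
    System.out.println("reports first: " + reportFirst.waitReplication(3, 10)); // true
  }
}
```

In this model, triggering the reports removes the dependency on when the next periodic report happens to arrive, which is why the explicit trigger makes the outcome deterministic rather than timing-dependent.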


            People

            • Assignee:
              linyiqun Yiqun Lin
              Reporter:
              xyao Xiaoyu Yao
            • Votes:
              0
              Watchers:
              6
