  1. Hadoop HDFS
  2. HDFS-4067

TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: 3.0.0-alpha1
    • Component/s: test
    • Labels:
    • Hadoop Flags: Reviewed

      Description

      After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061 we can see the root cause of the failure is ReplicaAlreadyExistsException:

      org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 already exists in state FINALIZED and thus cannot be created.
      	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799)
      	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90)
      	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:155)
      	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
      	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
      

          Activity

          jingzhao Jing Zhao added a comment -

          Moving the discussion from HDFS-4061 here:

          When the NameNode invalidates a block for a datanode D1, it removes the datanode-block pair from the blockMap. Before the invalidation request is sent to D1, BlockManager#computeDataNodeWork can also run and schedule a replication of the same block to D1, so the invalidation and replication requests are sent to D1 at the same time. D1 then ignores the replication request (throwing a ReplicaAlreadyExistsException) and deletes the replica. As a result, the NN never receives the blockReceived message from D1. The testcase times out after 5 min, which is shorter than the timeout of the PendingReplication request (usually 5~10 min).
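
The interleaving described in this comment can be replayed with a small, single-threaded toy model. This is only a sketch and not Hadoop code: ToyDataNode, ReplicaExistsException, receiveReplica, invalidate and replayRace are hypothetical names invented for illustration, and the real datanode logic is far more involved. It is meant to show why D1 ends up with no replica and sends no blockReceived report.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the race; all names are hypothetical, none of this is Hadoop code. */
public class ReplicationRaceSketch {

    enum ReplicaState { FINALIZED }

    /** Stand-in for ReplicaAlreadyExistsException. */
    static class ReplicaExistsException extends RuntimeException {
        ReplicaExistsException(String msg) { super(msg); }
    }

    /** Minimal datanode: a replica map plus the blockReceived reports it has sent. */
    static class ToyDataNode {
        final Map<String, ReplicaState> replicas = new HashMap<>();
        final List<String> blockReceivedReports = new ArrayList<>();

        /** Incoming replication: refuses to create a replica that already exists. */
        void receiveReplica(String blockId) {
            ReplicaState existing = replicas.get(blockId);
            if (existing != null) {
                throw new ReplicaExistsException(
                    "Block " + blockId + " already exists in state " + existing);
            }
            replicas.put(blockId, ReplicaState.FINALIZED);
            blockReceivedReports.add(blockId);
        }

        /** Invalidation command from the NameNode: delete the local replica. */
        void invalidate(String blockId) {
            replicas.remove(blockId);
        }
    }

    /** Replays the order from the comment: replication arrives before the delete is applied. */
    static ToyDataNode replayRace(String blockId) {
        ToyDataNode d1 = new ToyDataNode();
        d1.replicas.put(blockId, ReplicaState.FINALIZED); // D1 still holds the old replica
        try {
            d1.receiveReplica(blockId); // replication request is rejected
        } catch (ReplicaExistsException e) {
            System.out.println("replication ignored: " + e.getMessage());
        }
        d1.invalidate(blockId); // the old replica is then deleted
        return d1;
    }

    public static void main(String[] args) {
        ToyDataNode d1 = replayRace("blk_1");
        System.out.println("replica present: " + d1.replicas.containsKey("blk_1"));
        System.out.println("blockReceived reports: " + d1.blockReceivedReports.size());
    }
}
```

In this model D1 finishes with no replica and an empty report list, which corresponds to the NameNode waiting on a pending replication that will never be acknowledged.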

          jingzhao Jing Zhao added a comment -

          And I guess that's also the reason for HDFS-342? Since the initial replication request is ignored, the replication on D1 can only be done after the pending replication timeout.
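
The timing argument can be made concrete with a short sketch. It assumes only the figures quoted in the discussion (a 5 min test timeout and a pending replication timeout of roughly 5~10 min); PendingEntry, timedOut and retriedBeforeTestGivesUp are hypothetical names, and the real NameNode bookkeeping may differ.

```java
import java.time.Duration;

/** Toy timing check; names are hypothetical and not taken from Hadoop. */
public class PendingTimeoutSketch {

    /** One outstanding replication request, stamped with the time it was scheduled. */
    static final class PendingEntry {
        final Duration scheduledAt;

        PendingEntry(Duration scheduledAt) {
            this.scheduledAt = scheduledAt;
        }

        /** The entry expires only once strictly more than the timeout has elapsed. */
        boolean timedOut(Duration now, Duration timeout) {
            return now.minus(scheduledAt).compareTo(timeout) > 0;
        }
    }

    /** True if a lost request would already be eligible for retry when the test gives up. */
    static boolean retriedBeforeTestGivesUp(Duration pendingTimeout, Duration testTimeout) {
        PendingEntry entry = new PendingEntry(Duration.ZERO);
        return entry.timedOut(testTimeout, pendingTimeout);
    }

    public static void main(String[] args) {
        Duration testTimeout = Duration.ofMinutes(5);
        for (long minutes : new long[] {5, 10}) {
            boolean retried =
                retriedBeforeTestGivesUp(Duration.ofMinutes(minutes), testTimeout);
            System.out.println("pending timeout " + minutes + " min, retried in time: " + retried);
        }
    }
}
```

With either pending timeout the request has not yet expired when the test gives up, so the ignored replication is not retried in time.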

          jingzhao Jing Zhao added a comment -

          Initial patch to fix.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12550126/HDFS-4067.trunk.001.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.web.TestWebHDFS

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3383//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3383//console

          This message is automatically generated.

          jingzhao Jing Zhao added a comment -

          The testcase failure was reported in HDFS-3948 before. Will run TestUnderReplicatedBlocks in a loop later.

          jingzhao Jing Zhao added a comment -

          Ran the testcase ~800 times and all runs passed.

          sureshms Suresh Srinivas added a comment -

          +1 for the patch.

          sureshms Suresh Srinivas added a comment -

          I committed the patch. Thank you Jing.

          hudson Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #2927 (See https://builds.apache.org/job/Hadoop-trunk-Commit/2927/)
          HDFS-4067. TestUnderReplicatedBlocks intermittently fails due to ReplicaAlreadyExistsException. Contributed by Jing Zhao. (Revision 1402261)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1402261
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestUnderReplicatedBlocks.java
          hudson Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #17 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/17/)
          HDFS-4067. TestUnderReplicatedBlocks intermittently fails due to ReplicaAlreadyExistsException. Contributed by Jing Zhao. (Revision 1402261)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1402261
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestUnderReplicatedBlocks.java
          hudson Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1207 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1207/)
          HDFS-4067. TestUnderReplicatedBlocks intermittently fails due to ReplicaAlreadyExistsException. Contributed by Jing Zhao. (Revision 1402261)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1402261
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestUnderReplicatedBlocks.java
          hudson Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1237 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1237/)
          HDFS-4067. TestUnderReplicatedBlocks intermittently fails due to ReplicaAlreadyExistsException. Contributed by Jing Zhao. (Revision 1402261)

          Result = FAILURE
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1402261
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestUnderReplicatedBlocks.java

            People

            • Assignee: Jing Zhao (jingzhao)
            • Reporter: Eli Collins (eli)
            • Votes: 0
            • Watchers: 5
