Hadoop HDFS
  HDFS-3179

Improve the error message: DataStreamer throws an exception, "nodes.length != original.length + 1", on a single-datanode cluster

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.2
    • Fix Version/s: 2.0.0-alpha
    • Component/s: hdfs-client
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Create a single datanode cluster

      disable permissions
      enable webhdfs
      start hdfs
      run the test script

      Expected result:
      a file named "test" is created and its content is "testtest"

      The result I got:
      HDFS throws an exception on the second append operation.

      ./test.sh 
      {"RemoteException":{"exception":"IOException","javaClassName":"java.io.IOException","message":"Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]"}}
      

      Log (from the DFSClient):

      2012-04-02 14:34:21,058 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
      java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)
      2012-04-02 14:34:21,059 ERROR org.apache.hadoop.hdfs.DFSClient: Failed to close file /test
      java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)
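
      The check that produces this message is the one at the top of the stack trace, in DFSOutputStream$DataStreamer.findNewDatanode(..). As a rough sketch (paraphrased from the trace and message, not the verbatim Hadoop source): the client asks the namenode to extend the pipeline by one datanode and fails if nothing new came back.

      // Sketch of the failing check in DataStreamer (a paraphrase, not the
      // verbatim Hadoop source). "nodes" is the pipeline returned after the
      // replacement request; "original" is the pipeline before it.
      private int findNewDatanode(final DatanodeInfo[] original) throws IOException {
        if (nodes.length != original.length + 1) {
          // On a single-datanode cluster there is no extra node to add, so
          // this condition fails on every append recovery.
          throw new IOException("Failed to add a datanode: nodes.length != original.length + 1, "
              + "nodes=" + Arrays.asList(nodes) + ", original=" + Arrays.asList(original));
        }
        // ... otherwise locate and return the index of the newly added node
      }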
      

      test.sh

      #!/bin/sh
      
      echo "test" > test.txt
      
      curl -L -X PUT "http://localhost:50070/webhdfs/v1/test?op=CREATE"
      
      curl -L -X POST -T test.txt "http://localhost:50070/webhdfs/v1/test?op=APPEND"
      curl -L -X POST -T test.txt "http://localhost:50070/webhdfs/v1/test?op=APPEND"
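
      For reference, a rough Java equivalent of test.sh against the HDFS FileSystem API (a sketch: the class name AppendTwice and the namenode URI are assumptions for a default single-node setup). The exception raised by the DataStreamer thread surfaces on the write/close of the second append.

      import java.net.URI;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class AppendTwice {
        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // NameNode URI is an assumption for a default single-node setup.
          FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);
          Path p = new Path("/test");
          fs.create(p, true).close();               // op=CREATE (empty file)
          byte[] data = "test\n".getBytes();
          FSDataOutputStream out = fs.append(p);    // first op=APPEND succeeds
          out.write(data);
          out.close();
          out = fs.append(p);                       // second op=APPEND: pipeline setup
          out.write(data);                          // runs the policy check; the
          out.close();                              // IOException surfaces on write/close
        }
      }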
      
      
      Attachments

      1. h3179_20120403.patch
        4 kB
        Tsz Wo Nicholas Sze

        Activity

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1047 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1047/)
        HDFS-3179. Improve the exception message thrown by DataStreamer when it failed to add a datanode. (Revision 1324892)

        Result = SUCCESS
        szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324892
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplaceDatanodeOnFailure.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1012 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1012/)
        HDFS-3179. Improve the exception message thrown by DataStreamer when it failed to add a datanode. (Revision 1324892)

        Result = FAILURE
        szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324892
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplaceDatanodeOnFailure.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #2067 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2067/)
        HDFS-3179. Improve the exception message thrown by DataStreamer when it failed to add a datanode. (Revision 1324892)

        Result = ABORTED
        szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324892
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplaceDatanodeOnFailure.java
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #2054 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2054/)
        HDFS-3179. Improve the exception message thrown by DataStreamer when it failed to add a datanode. (Revision 1324892)

        Result = SUCCESS
        szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324892
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplaceDatanodeOnFailure.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #2128 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2128/)
        HDFS-3179. Improve the exception message thrown by DataStreamer when it failed to add a datanode. (Revision 1324892)

        Result = SUCCESS
        szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324892
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplaceDatanodeOnFailure.java
        Tsz Wo Nicholas Sze added a comment -

        Uma, thanks for the review.

        I have committed this.

        Uma Maheswara Rao G added a comment -

        +1 looks good to me.
        Will commit this patch today.

        Tsz Wo Nicholas Sze added a comment -

        Hi Zhanwei, does the updated error message look good to you?

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12521272/h3179_20120403.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2177//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2177//console

        This message is automatically generated.

        Tsz Wo Nicholas Sze added a comment -

        h3179_20120403.patch:

        • updates the error message as below;
        • adds Zhanwei's test.

        2012-04-03 17:59:07,624 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(586)) - Failed to close file /TestReplaceDatanodeOnFailure/testAppend
        java.io.IOException: Failed to add a datanode. User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT. (Nodes: current=[127.0.0.1:51791], original=[127.0.0.1:51791])
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:838)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:934)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)
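
        As the new message says, the feature can be tuned from the client configuration. A minimal sketch (the property name is taken from the message above; accepting NEVER as a policy value is an assumption based on the HDFS-3091 discussion):

        // Sketch: relax the add-datanode-on-failure check on a small cluster.
        // The property name comes from the error message above; NEVER as a
        // valid policy value is an assumption based on HDFS-3091.
        Configuration conf = new Configuration();
        conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");
        FileSystem fs = FileSystem.get(conf);  // clients created from this conf skip replacement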

        Tsz Wo Nicholas Sze added a comment -

        > ..., the appended data after the first successful append is in danger ...

        You are right but it is the same as creating a new file. We should not make any change unless we also want to change the behavior of create(..).

        > ... And make the error message more friendly instead of "nodes.length != original.length + 1".

        Agree. I will change the error message.

        Zhanwei Wang added a comment -

        I totally agree with you about "the problem of one datanode with replication 3"; I think this kind of operation should fail, or at least produce a warning.

        My opinion is that the purpose of "the policy check" is to make sure there is no potential data loss. In this "one datanode, 3 replicas" case, although a failure of the first append would not cause data loss, the appended data after the first successful append is in danger because there is only one replica, not the 3 the user expected. And there is no warning to tell the user the truth.

        My suggestion is to make the first write to the empty file fail if there are not enough datanodes; in other words, make the policy check stricter. And make the error message more friendly instead of "nodes.length != original.length + 1".

        Tsz Wo Nicholas Sze added a comment -

        I think the problem is "one datanode with replication 3". What should the user expect? It seems that users won't be happy if we do not allow append. However, if we allow appending to a single replica and the replica becomes corrupted, then it is possible to have data loss - I can imagine some extreme cases where a user is appending to a single replica slowly, an admin adds more datanodes later on, but the block won't be replicated since the file is not closed, and then the datanode with the single replica fails. Is this case acceptable to you?

        > So from the user's point of view, the first append succeeds while the second fails; is that a good idea?

        The distinction is whether there is pre-append data. There is pre-append data in the replica in the second append. The pre-append data was in a closed file, and if the datanode fails during the append, it could mean data loss. However, in the first append, there is no pre-append data. If the append fails and the new replica is lost, it is sort of okay since only the new data is lost.

        The add-datanode feature is there to prevent data loss on pre-append data. Users (or admins) can turn it off as mentioned in HDFS-3091. I think we may improve the error message. Is that good enough? Or any suggestion?

        Zhanwei Wang added a comment -

        @Uma and amith
        Another question: in this test script, I first create a new EMPTY file and append to the file twice.
        The first append succeeds because the file is empty: to create a pipeline, the "stage" is PIPELINE_SETUP_CREATE, and the policy is not checked.
        The second append fails because the "stage" is PIPELINE_SETUP_APPEND, and the policy is checked.

        So from the user's point of view, the first append succeeds while the second fails; is that a good idea?

                  // get new block from namenode
                  if (stage == BlockConstructionStage.PIPELINE_SETUP_CREATE) {
                    if(DFSClient.LOG.isDebugEnabled()) {
                      DFSClient.LOG.debug("Allocating new block");
                    }
                    nodes = nextBlockOutputStream(src);
                    initDataStreaming();
                  } else if (stage == BlockConstructionStage.PIPELINE_SETUP_APPEND) {
                    if(DFSClient.LOG.isDebugEnabled()) {
                      DFSClient.LOG.debug("Append to block " + block);
                    }
                    setupPipelineForAppendOrRecovery();  //check the policy here
                    initDataStreaming();
                  }
        
        Zhanwei Wang added a comment -

        @Uma and amith
        It seems the same question with HDFS-3091.

        I configured only one datanode and created a file using the default number of replicas (3),
        so existings(1) <= replication/2 (3/2 == 1) is satisfied, and the client cannot replace the failed node with a new one because no extra node exists in the cluster.

        HDFS-3091 should be patched into the 0.23.2 branch.
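
        For context, the DEFAULT policy condition under discussion can be sketched as below (a paraphrase of the behavior described here and in HDFS-3091, not the verbatim Hadoop source):

        // Rough paraphrase of the DEFAULT replace-datanode-on-failure condition.
        // r = configured replication, n = datanodes still in the pipeline.
        static boolean requiresReplacement(int r, int n, boolean isAppendOrHflushed) {
          if (r < 3) {
            return false;              // small replication: never ask for a replacement
          }
          // With r = 3 and a single datanode, n (1) <= r/2 (1) holds, so a
          // replacement is demanded even though the cluster has no node to offer.
          return n <= r / 2 || (isAppendOrHflushed && n < r);
        }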

        Uma Maheswara Rao G added a comment -

        @Zhanwei, How many DNs are running in your test cluster?

        amith added a comment -

        Hi Zhanwei Wang

        I don't know exactly what your test script does, but this looks similar to HDFS-3091.

        Can you check this once:
        https://issues.apache.org/jira/browse/HDFS-3091

        Please correct me if I am wrong.


          People

          • Assignee:
            Tsz Wo Nicholas Sze
            Reporter:
            Zhanwei Wang
          • Votes:
            0
            Watchers:
            4
