Hadoop HDFS / HDFS-675

File not being replicated, even when # of DNs > 0

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.22.0
    • Fix Version/s: None
    • Component/s: namenode
    • Labels:
      None

      Description

      One of my tests is now failing, possibly a race condition:
      java.io.IOException: File /test-filename could only be replicated to 0 nodes, instead of 1. ( there are currently 1 live data nodes in the cluster)

        Activity

        Steve Loughran added a comment -

        I have a test that tries to bring up a pseudo-distributed cluster in different JVMs; it is failing on my desk because it is unable to replicate the data. Normally I'd blame that on there being no worker nodes around, but there is a live datanode. What could cause the NN to decide not to allocate the file to the live DN?

        [sf-system-test-junit] Termination Record: HOST morzine.hpl.hp.com:rootProcess:testDistributedCluster.sf:tests:touch,  type: abnormal,  description: Worker thread failed
        [sf-system-test-junit] RemoteException:: java.io.IOException: File /test-filename could only be replicated to 0 nodes, instead of 1. ( there are currently 1 live data nodes in the cluster)
        [sf-system-test-junit] 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1178)
        [sf-system-test-junit] 	at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:702)
        [sf-system-test-junit] 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:516)
        [sf-system-test-junit] 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
        [sf-system-test-junit] 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:960)
        [sf-system-test-junit] 	at java.security.AccessController.doPrivileged(Native Method)
        [sf-system-test-junit] 	at javax.security.auth.Subject.doAs(Subject.java:396)
        [sf-system-test-junit] 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:958)
        [sf-system-test-junit] , SmartFrog 3.17.015dev (2009-10-01 16:08:54 BST)
        [sf-system-test-junit] 	at org.apache.hadoop.ipc.Client.call(Client.java:766)
        [sf-system-test-junit] 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:223)
        [sf-system-test-junit] 	at $Proxy1.addBlock(Unknown Source)
        [sf-system-test-junit] 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        [sf-system-test-junit] 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        [sf-system-test-junit] 	at $Proxy1.addBlock(Unknown Source)
        [sf-system-test-junit] 	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.locateFollowingBlock(DFSClient.java:2909)
        [sf-system-test-junit] 	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSClient.java:2789)
        [sf-system-test-junit] 	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2407)
        [sf-system-test-junit] 	at org.smartfrog.test.DeployingTestBase.completeTestDeployment(DeployingTestBase.java:317)
        [sf-system-test-junit] 	at org.smartfrog.test.DeployingTestBase.runTestsToCompletion(DeployingTestBase.java:340)
        [sf-system-test-junit] 	at org.smartfrog.test.DeployingTestBase.expectSuccessfulTestRunOrSkip(DeployingTestBase.java:441)
        
        Eli Collins added a comment -

        Maybe you ran out of space. Do you see any messages of the form "Node xxx is not chosen because yyy" in the namenode log?
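        A minimal sketch of the check suggested above: scan a NameNode log for lines containing the "is not chosen because" fragment from the message format Eli quotes. The class name `NotChosenScan`, the default file name `namenode.log`, and the exact substring are assumptions for illustration; the real message wording may differ between Hadoop versions, so adjust the marker to match your log.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hadoop: greps a NameNode log for
// "Node xxx is not chosen because yyy" style messages.
public class NotChosenScan {
    // Assumed fragment of the message format quoted in the comment;
    // the exact wording may vary by Hadoop version.
    static final String MARKER = "is not chosen because";

    /** Returns every line that reports a datanode being rejected as a target. */
    static List<String> findRejections(Stream<String> lines) {
        return lines.filter(line -> line.contains(MARKER))
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real NameNode log as the first argument.
        Path log = Paths.get(args.length > 0 ? args[0] : "namenode.log");
        // ISO-8859-1 decodes any byte sequence, so stray bytes in the log cannot abort the scan.
        try (Stream<String> lines = Files.lines(log, StandardCharsets.ISO_8859_1)) {
            List<String> hits = findRejections(lines);
            if (hits.isEmpty()) {
                System.out.println("No \"not chosen\" messages found in " + log);
            } else {
                hits.forEach(System.out::println);
            }
        }
    }
}
```

        If the scan finds nothing, the rejection reason is not being logged under that wording (or at the configured log level), and the NameNode log needs to be inspected by hand.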

        Tsz Wo Nicholas Sze added a comment -

        I believe this is no longer a problem. Resolving...


          People

          • Assignee:
            Unassigned
            Reporter:
            Steve Loughran
          • Votes:
            0
            Watchers:
            3
