Hadoop HDFS / HDFS-675

File not being replicated, even when #of DNs >0

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.22.0
    • Fix Version/s: None
    • Component/s: namenode
    • Labels: None

      Description

      One of my tests is now failing, possibly due to a race condition:
      java.io.IOException: File /test-filename could only be replicated to 0 nodes, instead of 1. ( there are currently 1 live data nodes in the cluster)
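
      For reference, the write that fails here is just a single-replica file creation. Below is a minimal sketch of the kind of client call that exercises this path; the path /test-filename and the replication factor of 1 come from the exception above, while the class name and everything else are illustrative, not the actual failing test.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class TouchTest {
          public static void main(String[] args) throws Exception {
            // fs.default.name is expected to point at the pseudo-distributed NameNode under test
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // ask for a single replica; the NameNode still has to choose one
            // target DataNode, and addBlock() fails if it cannot
            FSDataOutputStream out = fs.create(new Path("/test-filename"), (short) 1);
            out.writeBytes("touch");
            out.close();
          }
        }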

        Activity

        Tsz Wo Nicholas Sze added a comment -

        I believe this is no longer a problem. Resolving...

        Eli Collins added a comment -

        Maybe you ran out of space. Do you see any messages of the form "Node xxx is not chosen because yyy" in the namenode log?
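
        Besides grepping the namenode log, the free space each datanode has reported to the NameNode can be checked programmatically. A minimal sketch, assuming a DistributedFileSystem handle to the same cluster; getDataNodeStats() is the stock client call, the class name and output format are illustrative. If "remaining" is at or near zero, block placement typically refuses the node, which is what produces messages like the ones above.

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.hdfs.DistributedFileSystem;
          import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

          public class DatanodeSpaceCheck {
            public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
              // print each datanode's capacity and free space as seen by the NameNode
              for (DatanodeInfo dn : dfs.getDataNodeStats()) {
                System.out.println(dn.getName()
                    + " capacity=" + dn.getCapacity()
                    + " remaining=" + dn.getRemaining());
              }
            }
          }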

        Steve Loughran added a comment -

        I have a test that tries to bring up a pseudo-distributed cluster in different JVMs; it is failing on my desktop because it is unable to replicate the data. Normally I'd blame that on there being no worker nodes around, but there is a datanode live. What could cause the NN to decide not to allocate the file to the live DN?

        [sf-system-test-junit] Termination Record: HOST morzine.hpl.hp.com:rootProcess:testDistributedCluster.sf:tests:touch,  type: abnormal,  description: Worker thread failed
        [sf-system-test-junit] RemoteException:: java.io.IOException: File /test-filename could only be replicated to 0 nodes, instead of 1. ( there are currently 1 live data nodes in the cluster)
        [sf-system-test-junit] 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1178)
        [sf-system-test-junit] 	at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:702)
        [sf-system-test-junit] 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:516)
        [sf-system-test-junit] 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
        [sf-system-test-junit] 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:960)
        [sf-system-test-junit] 	at java.security.AccessController.doPrivileged(Native Method)
        [sf-system-test-junit] 	at javax.security.auth.Subject.doAs(Subject.java:396)
        [sf-system-test-junit] 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:958)
        [sf-system-test-junit] , SmartFrog 3.17.015dev (2009-10-01 16:08:54 BST)
        [sf-system-test-junit] 	at org.apache.hadoop.ipc.Client.call(Client.java:766)
        [sf-system-test-junit] 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:223)
        [sf-system-test-junit] 	at $Proxy1.addBlock(Unknown Source)
        [sf-system-test-junit] 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        [sf-system-test-junit] 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        [sf-system-test-junit] 	at $Proxy1.addBlock(Unknown Source)
        [sf-system-test-junit] 	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.locateFollowingBlock(DFSClient.java:2909)
        [sf-system-test-junit] 	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSClient.java:2789)
        [sf-system-test-junit] 	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2407)
        [sf-system-test-junit] 	at org.smartfrog.test.DeployingTestBase.completeTestDeployment(DeployingTestBase.java:317)
        [sf-system-test-junit] 	at org.smartfrog.test.DeployingTestBase.runTestsToCompletion(DeployingTestBase.java:340)
        [sf-system-test-junit] 	at org.smartfrog.test.DeployingTestBase.expectSuccessfulTestRunOrSkip(DeployingTestBase.java:441)
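
        Given the "possibly due to a race condition" note in the description, one failure mode for a cross-JVM bring-up like this is issuing the first write before the DataNode has registered and reported usable space to the NameNode. Below is a minimal sketch of a guard a test harness could run before the first write; the helper name, the timeout, and the use of getDataNodeStats() for polling are assumptions, not part of the original test.

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.hdfs.DistributedFileSystem;
          import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

          public class WaitForDatanode {
            /** Wait until at least one datanode reports non-zero free space, or time out. */
            public static void await(Configuration conf, long timeoutMillis) throws Exception {
              DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
              long deadline = System.currentTimeMillis() + timeoutMillis;
              while (System.currentTimeMillis() < deadline) {
                for (DatanodeInfo dn : dfs.getDataNodeStats()) {
                  if (dn.getRemaining() > 0) {
                    return;             // a usable datanode is visible to the NameNode
                  }
                }
                Thread.sleep(500);      // not registered yet, or no space reported yet
              }
              throw new Exception("no usable datanode after " + timeoutMillis + " ms");
            }
          }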
        

          People

          • Assignee: Unassigned
          • Reporter: Steve Loughran
          • Votes: 0
          • Watchers: 3
