Hadoop Common: HADOOP-2691

Some junit tests fail with the exception: All datanodes are bad. Aborting...

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.15.2
    • Fix Version/s: 0.16.0
    • Component/s: None
    • Labels: None

      Description

      Some junit tests fail with the following exception:
      java.io.IOException: All datanodes are bad. Aborting...
      at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:1831)
      at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1100(DFSClient.java:1479)
      at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1571)
      The log contains the following messages:
      2008-01-19 23:00:25,557 INFO dfs.StateChange (FSNamesystem.java:allocateBlock(1274)) - BLOCK* NameSystem.allocateBlock: /srcdat/three/3189919341591612220. blk_6989304691537873255
      2008-01-19 23:00:25,559 INFO fs.DFSClient (DFSClient.java:createBlockOutputStream(1982)) - pipeline = 127.0.0.1:40678
      2008-01-19 23:00:25,559 INFO fs.DFSClient (DFSClient.java:createBlockOutputStream(1982)) - pipeline = 127.0.0.1:40680
      2008-01-19 23:00:25,559 INFO fs.DFSClient (DFSClient.java:createBlockOutputStream(1985)) - Connecting to 127.0.0.1:40678
      2008-01-19 23:00:25,570 INFO dfs.DataNode (DataNode.java:writeBlock(1084)) - Receiving block blk_6989304691537873255 from /127.0.0.1
      2008-01-19 23:00:25,572 INFO dfs.DataNode (DataNode.java:writeBlock(1084)) - Receiving block blk_6989304691537873255 from /127.0.0.1
      2008-01-19 23:00:25,573 INFO dfs.DataNode (DataNode.java:writeBlock(1169)) - Datanode 0 forwarding connect ack to upstream firstbadlink is
      2008-01-19 23:00:25,573 INFO dfs.DataNode (DataNode.java:writeBlock(1150)) - Datanode 1 got response for connect ack from downstream datanode with firstbadlink as
      2008-01-19 23:00:25,573 INFO dfs.DataNode (DataNode.java:writeBlock(1169)) - Datanode 1 forwarding connect ack to upstream firstbadlink is
      2008-01-19 23:00:25,574 INFO dfs.DataNode (DataNode.java:lastDataNodeRun(1802)) - Received block blk_6989304691537873255 of size 34 from /127.0.0.1
      2008-01-19 23:00:25,575 INFO dfs.DataNode (DataNode.java:lastDataNodeRun(1819)) - PacketResponder 0 for block blk_6989304691537873255 terminating
      2008-01-19 23:00:25,575 INFO dfs.StateChange (FSNamesystem.java:addStoredBlock(2467)) - BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40680 is added to blk_6989304691537873255 size 34
      2008-01-19 23:00:25,575 INFO dfs.DataNode (DataNode.java:close(2013)) - BlockReceiver for block blk_6989304691537873255 waiting for last write to drain.
      2008-01-19 23:01:31,577 WARN fs.DFSClient (DFSClient.java:run(1764)) - DFSOutputStream ResponseProcessor exception for block blk_6989304691537873255java.net.SocketTimeoutException: Read timed out
      at java.net.SocketInputStream.socketRead0(Native Method)
      at java.net.SocketInputStream.read(SocketInputStream.java:129)
      at java.io.DataInputStream.readFully(DataInputStream.java:176)
      at java.io.DataInputStream.readLong(DataInputStream.java:380)
      at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:1726)

      2008-01-19 23:01:31,578 INFO fs.DFSClient (DFSClient.java:run(1653)) - Closing old block blk_6989304691537873255
      2008-01-19 23:01:31,579 WARN fs.DFSClient (DFSClient.java:processDatanodeError(1803)) - Error Recovery for block blk_6989304691537873255 bad datanode[0] 127.0.0.1:40678
      2008-01-19 23:01:31,580 WARN fs.DFSClient (DFSClient.java:processDatanodeError(1836)) - Error Recovery for block blk_6989304691537873255 bad datanode 127.0.0.1:40678
      2008-01-19 23:01:31,580 INFO fs.DFSClient (DFSClient.java:createBlockOutputStream(1982)) - pipeline = 127.0.0.1:40680
      2008-01-19 23:01:31,580 INFO fs.DFSClient (DFSClient.java:createBlockOutputStream(1985)) - Connecting to 127.0.0.1:40680
      2008-01-19 23:01:31,582 INFO dfs.DataNode (DataNode.java:writeBlock(1084)) - Receiving block blk_6989304691537873255 from /127.0.0.1
      2008-01-19 23:01:31,584 INFO dfs.DataNode (DataNode.java:writeBlock(1196)) - writeBlock blk_6989304691537873255 received exception java.io.IOException: Reopen Block blk_6989304691537873255 is valid, and cannot be written to.
      2008-01-19 23:01:31,584 ERROR dfs.DataNode (DataNode.java:run(997)) - 127.0.0.1:40680:DataXceiver: java.io.IOException: Reopen Block blk_6989304691537873255 is valid, and cannot be written to.
      at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:613)
      at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1996)
      at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1109)
      at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:982)
      at java.lang.Thread.run(Thread.java:595)

      2008-01-19 23:01:31,585 INFO fs.DFSClient (DFSClient.java:createBlockOutputStream(2024)) - Exception in createBlockOutputStream java.io.EOFException

      The log shows that blk_6989304691537873255 was successfully written to both datanodes in the pipeline, but the DFSClient timed out waiting for a response from the first datanode. It tried to recover from the failure by resending the data to the second datanode. The recovery failed because the second datanode threw an IOException when it detected that it already had the block. It would be better if the second datanode did not throw an exception for a finalized block during a recovery.
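      The behavior requested above could be sketched roughly as follows. This is a minimal illustration only, not the attached patch and not Hadoop's FSDataset: the class BlockStoreSketch, its methods, and the isRecovery flag are all hypothetical names introduced here. It assumes the datanode can tell a recovery attempt apart from an ordinary write, and reopens a finalized block in that case instead of rejecting it.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical stand-in for a datanode's block store.
 * All names here are assumptions made for illustration; this is not Hadoop code.
 */
class BlockStoreSketch {
    private final Set<Long> finalizedBlocks = new HashSet<>();
    private final Set<Long> blocksBeingWritten = new HashSet<>();

    /**
     * Opens a block for writing.
     * A finalized block is rejected for an ordinary write, but is reopened
     * when the caller marks the request as a recovery (assumed flag).
     */
    boolean openForWrite(long blockId, boolean isRecovery) throws IOException {
        if (finalizedBlocks.contains(blockId)) {
            if (!isRecovery) {
                throw new IOException("Block blk_" + blockId
                        + " is valid, and cannot be written to.");
            }
            // Recovery: move the block back to the in-progress set so the
            // client can resend its data instead of aborting.
            finalizedBlocks.remove(blockId);
        }
        blocksBeingWritten.add(blockId);
        return true;
    }

    /** Marks an in-progress block as finalized. */
    void finalizeBlock(long blockId) {
        if (blocksBeingWritten.remove(blockId)) {
            finalizedBlocks.add(blockId);
        }
    }

    boolean isFinalized(long blockId) {
        return finalizedBlocks.contains(blockId);
    }

    public static void main(String[] args) throws IOException {
        BlockStoreSketch store = new BlockStoreSketch();
        store.openForWrite(1L, false);   // initial write
        store.finalizeBlock(1L);         // block received in full
        store.openForWrite(1L, true);    // recovery attempt is accepted
        System.out.println("finalized after recovery reopen: " + store.isFinalized(1L));
    }
}
```

      With a check like this, the second datanode in the log above would accept the resent block during recovery rather than failing with "Reopen Block ... is valid, and cannot be written to." Whether reopening the block or simply acknowledging the existing copy is preferable is a design choice this sketch does not settle.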

        Attachments

        1. datanodesBad3.patch
          5 kB
          Dhruba Borthakur
        2. datanodesBad1.log
          656 kB
          Jim Kellerman
        3. datanotesBad2.log
          647 kB
          Jim Kellerman
        4. datanodesBad2.patch
          5 kB
          Dhruba Borthakur
        5. build.log
          647 kB
          Jim Kellerman
        6. TestTableMapReduce-patch.txt
          0.9 kB
          Jim Kellerman
        7. datanodesBad1.patch
          4 kB
          Dhruba Borthakur
        8. datanodesBad.patch
          3 kB
          Dhruba Borthakur

            People

            • Assignee: Dhruba Borthakur (dhruba)
            • Reporter: Hairong Kuang (hairong)