Hadoop HDFS
HDFS-2891

Sometimes the first DataNode is detected as bad when we power off the second DataNode.

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: datanode, hdfs-client
    • Labels:
      None

      Description

      In one of my clusters, I observed this situation.
      This issue looks to be due to a timeout in the ResponseProcessor on the client side; it is marking the first DataNode as bad.
      This happens in the 0.20.2 version. It can be present in branch-1 as well; I will check trunk.

        Activity

        Uma Maheswara Rao G added a comment -

        Hi Chackara,

        Thanks a lot for working on this issue.
        Seems to be a valid scenario. Do you want to upload a patch for this issue?

        chackaravarthy added a comment -

        Hi Uma,

        This problem exists in branch-1.0 also.

        This can happen in the following scenario (a reproduction sketch follows the list):

        1. consider the pipeline as [ DN1 -> DN2 -> DN3 ]
        2. create one file and get the stream.
        3. write some bytes using that stream and call sync.
        4. keep the stream open.
        5. Now unplug the network cable, power off the machine, or bring the ethernet interface down on DN2.
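
        For reference, here is a minimal reproduction sketch of steps 1-5 against a three-DataNode cluster, using the branch-1 FileSystem API (the class name and file path below are illustrative, not part of the report):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class Hdfs2891Repro {
          public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Step 2: create one file and get the stream (pipeline DN1 -> DN2 -> DN3).
            FSDataOutputStream out = fs.create(new Path("/tmp/hdfs-2891-repro"), true);

            // Step 3: write some bytes using that stream and call sync.
            out.write("some bytes".getBytes("UTF-8"));
            out.sync(); // hflush()/hsync() on newer branches

            // Step 4: keep the stream open without writing any further data.
            // Step 5: now power off / unplug the network on the DN2 machine and
            // watch which node the client marks as bad during pipeline recovery.
            Thread.sleep(5 * 60 * 1000L);

            out.close();
            fs.close();
          }
        }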

        The explanation is as follows:

        Consider the case where the caller is not writing any data, the DataStreamer and ResponseProcessor threads are running, and the DN2 machine's ethernet is down:

        1. At time t1, the ResponseProcessor starts reading the ack from DN1 [timeout is 69 secs].

        2. But in DN1, the PacketResponder has not yet started reading the ack; it waits on the ackQueue until one packet arrives.

        3. Only after time t1 + 34.5 secs does the DataStreamer stream the HEART_BEAT packet to DN1 [if there is no data packet, the DataStreamer sends a HEART_BEAT packet after waiting for half of the timeout value].

        4. Only then does the DataXceiver receive the packet and put it in the ackQueue on the DataNode side.

        5. At time t2, once a packet has been enqueued in the ackQueue, the PacketResponder starts reading the ack from DN2 [timeout is 66 secs].

        6. As the DN2 machine's ethernet is down, the PacketResponder in DN1 does not get a reply.

        7. But the PacketResponder will time out only after t2 + 66 secs.

        8. Hence the ResponseProcessor gets a SocketTimeoutException earlier than the PacketResponder.

        t2 - t1 >= 34.5 secs [even if it were only greater than 3 secs, the reported scenario could happen]

        So, the DFSClient gets a SocketTimeoutException before DN1 does.
        Hence the DFSClient detects DN1 as the bad datanode [which is up] and does not detect DN2 as the bad datanode [which is down].
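
        To make the race concrete (using only the values quoted above, and taking t1 as the moment the ResponseProcessor arms its read):

        client gives up at : t1 + 69 secs
        DN1 gives up at    : t2 + 66 secs >= (t1 + 34.5 secs) + 66 secs = t1 + 100.5 secs

        So the client always fires first whenever t2 - t1 > 69 - 66 = 3 secs, and the 34.5-sec heartbeat delay guarantees that.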

        Reason:

        In the DataNode, the PacketResponder starts reading the ack only after it receives one packet [either a data or a heartbeat packet].
        But in the DFSClient, the ResponseProcessor starts reading the ack before any packet is sent.
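
        The ordering difference can be seen with a small, self-contained simulation (hypothetical code, not the real DFSClient / DataNode classes; the timeouts are scaled down by a factor of 100):

        import java.util.concurrent.BlockingQueue;
        import java.util.concurrent.LinkedBlockingQueue;

        public class AckOrderingSketch {
          public static void main(String[] args) throws Exception {
            final BlockingQueue<String> ackQueue = new LinkedBlockingQueue<>();
            final long start = System.currentTimeMillis();

            // Client side: the ResponseProcessor arms its read timeout immediately,
            // before any packet has even been streamed to DN1.
            Thread responseProcessor = new Thread(() -> {
              sleep(690); // stands in for the 69-sec socket read towards DN1
              System.out.println("client gave up at +" + (System.currentTimeMillis() - start) + " ms");
            });

            // DN1 side: the PacketResponder first blocks on its local ackQueue; only
            // after the first (heartbeat) packet arrives does it start its 66-sec
            // read from DN2.
            Thread packetResponder = new Thread(() -> {
              try {
                ackQueue.take();
              } catch (InterruptedException ignored) { }
              sleep(660); // stands in for the 66-sec socket read towards DN2
              System.out.println("DN1 gave up at    +" + (System.currentTimeMillis() - start) + " ms");
            });

            responseProcessor.start();
            packetResponder.start();

            // DataStreamer analogue: the heartbeat packet only goes out after half of
            // the client timeout (34.5 secs), so DN1 starts its clock that late.
            sleep(345);
            ackQueue.put("HEART_BEAT");

            responseProcessor.join();
            packetResponder.join();
            // Prints the client giving up around +690 ms and DN1 only around +1005 ms,
            // which is why the client blames DN1 even though only DN2 is unreachable.
          }

          private static void sleep(long millis) {
            try {
              Thread.sleep(millis);
            } catch (InterruptedException ignored) { }
          }
        }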

        Uma Maheswara Rao G added a comment -

        Note: this does not happen very regularly.

        Uma Maheswara Rao G added a comment -

        More info:

        This happens when we test with HBase.
        Run the HDFS and HBase clusters normally, and suddenly power off the middle DataNode in the pipeline.
        Then the client was getting the following exception.

        [2012-01-31 11:15:42,596] [WARN ] [ResponseProcessor for block blk_1327946241860_1109] [org.apache.hadoop.hdfs.DFSClient 3287] DFSOutputStream ResponseProcessor exception for block blk_1327946241860_1109java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/XXX:59179 remote=/FIRST_DATANODE_IP:10010]
        	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:167)
        	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        	at java.io.DataInputStream.readFully(DataInputStream.java:178)
        	at java.io.DataInputStream.readLong(DataInputStream.java:399)
        	at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:131)
        	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:3240)
        

        After that, the client closed the connection, and the first DataNode got an EOFException and interrupted its threads.

        [2012-01-31 11:15:42,597] [INFO ] [org.apache.hadoop.hdfs.server.datanode.DataXceiver@3e909a58] [org.apache.hadoop.hdfs.server.datanode.DataNode 816] Exception in receiveBlock for block blk_1327946241860_1109
        java.io.EOFException: while trying to read 31744 bytes
        	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:352)
        	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:399)
        	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:611)
        	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:781)
        	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:514)
        	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:138)
        	at java.lang.Thread.run(Thread.java:662)
        [2012-01-31 11:15:42,598] [INFO ] [PacketResponder 2 for Block blk_1327946241860_1109] [org.apache.hadoop.hdfs.server.datanode.DataNode 1123] PacketResponder blk_1327946241860_1109 2 : Thread is interrupted.
        [2012-01-31 11:15:42,598] [INFO ] [PacketResponder 2 for Block blk_1327946241860_1109] [org.apache.hadoop.hdfs.server.datanode.DataNode 1194] PacketResponder 2 for block blk_1327946241860_1109 terminating
        

        Finally, the client marks this first DataNode as bad.

        [2012-01-31 11:15:42,597] [WARN ] [DataStreamer for file /hbase/.logs/DDB01,20020,1327946260020/DDB01%3A20020.1327978683479 block blk_1327946241860_1109] [org.apache.hadoop.hdfs.DFSClient 3326] Error Recovery for block blk_1327946241860_1109 bad datanode[0] FIRST_DATANODE_IP:10010
        [2012-01-31 11:15:42,597] [WARN ] [DataStreamer for file /hbase/.logs/DDB01,20020,1327946260020/DDB01%3A20020.1327978683479 block blk_1327946241860_1109] [org.apache.hadoop.hdfs.DFSClient 3380] Error Recovery for block blk_1327946241860_1109 in pipeline FIRST_DATANODE_IP:10010, SECOND_DATANODE_IP:10010, THIRD_DATANODE_IP:10010: bad datanode FIRST_DATANODE_IP:10010
        [2012-01-31 11:15:46,607] [INFO ] [DataStreamer for file /hbase/.logs/DDB01,20020,1327946260020/DDB01%3A20020.1327978683479 block blk_1327946241860_1109] [org.apache.hadoop.ipc.Client 514] Retrying connect to server: /SECOND_DATANODE_IP:10020. Already tried 0 time(s).
        

        In fact the first DataNode is healthy, but unfortunately, due to these timeouts, the client detected the first DataNode as bad. It then immediately tries the second DataNode (by choosing it as the primary node for recovery).

        Ideally the first DataNode should time out first, because it waits only 66000 ms for the ack response from the second DataNode (the powered-off DN), while the client waits 69000 ms. But for some reason the client gets the timeout exception first, on its connection to the first DataNode, so the first DN is marked as bad instead of the second one.

        The impact is that the client then retries against the second DataNode and obviously fails again; it retries 6 times unnecessarily.

        2012-01-31 11:19:58,565] [WARN ] [DataStreamer for file /hbase/.logs/DDB01,20020,1327946260020/DDB01%3A20020.1327978683479 block blk_1327946241860_1109] [org.apache.hadoop.hdfs.DFSClient 3426] Error Recovery for block blk_1327946241860_1109 failed  because recovery from primary datanode SECOND_DATANODE_IP:10010 failed 6 times.  Pipeline was FIRST_DATANODE_IP:10010, SECOND_DATANODE_IP:10010, THIRD_DATANODE_IP:10010. Marking primary datanode as bad.
        

        Finally, it ends up writing only one replica, on the third DataNode, even though two healthy nodes are available here.


          People

          • Assignee:
            Unassigned
          • Reporter:
            Uma Maheswara Rao G
          • Votes:
            0
          • Watchers:
            6
