Hadoop Common / HADOOP-3831

slow-reading dfs clients do not recover from datanode-write-timeouts


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.17.1
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Some of our applications read certain files from dfs (using libhdfs) much more slowly than others, slowly enough to trigger the datanode write timeout introduced in 0.17.x. Eventually they fail.

      Dfs clients should be able to recover from such a situation.

      In the meantime, would setting
      dfs.datanode.socket.write.timeout=0
      in hadoop-site.xml help?
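
      For reference, the workaround suggested above would be a sketch along these lines in hadoop-site.xml (property name taken from the description; a value of 0 is expected to disable the write timeout, with non-zero values interpreted as milliseconds):

      ```xml
      <!-- hadoop-site.xml: proposed workaround from the description above.
           Setting dfs.datanode.socket.write.timeout to 0 is expected to
           disable the datanode's socket write timeout entirely. -->
      <property>
        <name>dfs.datanode.socket.write.timeout</name>
        <value>0</value>
      </property>
      ```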

      Here are the exceptions I see:

      DataNode:

      2008-07-24 00:12:40,167 WARN org.apache.hadoop.dfs.DataNode: xxx:50010:Got exception while serving blk_3304550638094049753 to /yyy:
      java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/xxx:50010 remote=/yyy:42542]
      at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:170)
      at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144)
      at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105)
      at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
      at java.io.DataOutputStream.write(DataOutputStream.java:90)
      at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunks(DataNode.java:1774)
      at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1813)
      at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1039)
      at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:968)
      at java.lang.Thread.run(Thread.java:619)

      DFS Client:

      08/07/24 00:13:28 WARN dfs.DFSClient: Exception while reading from blk_3304550638094049753 of zzz from xxx:50010: java.io.IOException: Premeture EOF from inputStream
      at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
      at org.apache.hadoop.dfs.DFSClient$BlockReader.readChunk(DFSClient.java:967)
      at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:236)
      at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:191)
      at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
      at org.apache.hadoop.dfs.DFSClient$BlockReader.read(DFSClient.java:829)
      at org.apache.hadoop.dfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1352)
      at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1388)
      at java.io.DataInputStream.read(DataInputStream.java:83)

      08/07/24 00:13:28 INFO dfs.DFSClient: Could not obtain block blk_3304550638094049753 from any node: java.io.IOException: No live nodes contain current block

        Attachments

        1. HADOOP-3831-branch-18.patch
          8 kB
          Raghu Angadi
        2. HADOOP-3831.patch
          7 kB
          Raghu Angadi
        3. HADOOP-3831.patch
          8 kB
          Raghu Angadi
        4. HADOOP-3831.patch
          8 kB
          Raghu Angadi
        5. HADOOP-3831.patch
          8 kB
          Raghu Angadi

            People

            • Assignee: rangadi (Raghu Angadi)
            • Reporter: ckunz (Christian Kunz)
            • Votes: 0
            • Watchers: 3
