Hadoop Common / HADOOP-3831

slow-reading dfs clients do not recover from datanode-write-timeouts


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.17.1
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Some of our applications read certain files from dfs (using libhdfs) much more slowly than others, slowly enough to trigger the datanode write timeout introduced in 0.17.x. Eventually the reads fail.

      Dfs clients should be able to recover from such a situation.
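
      Pending such a fix in the DFS client itself, an application can approximate the recovery. Below is a minimal sketch, assuming the reader goes through the Java FileSystem API (libhdfs wraps the same calls); RetryingReader and readWithRetry are hypothetical names, not part of any patch here. On a failed read it reopens the stream, which may select a different datanode, and seeks back to the last fully-read offset:

      import java.io.IOException;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataInputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class RetryingReader {
        // Hypothetical workaround sketch; not the HADOOP-3831 patch.
        static long readWithRetry(FileSystem fs, Path path, byte[] buf, int maxRetries)
            throws IOException {
          long pos = 0;                 // offset up to which data was fully read
          int failures = 0;
          FSDataInputStream in = fs.open(path);
          try {
            while (true) {
              try {
                int n = in.read(buf, 0, buf.length);
                if (n < 0) {
                  break;                // end of file
                }
                pos += n;               // process buf[0..n) here
                failures = 0;           // made progress, reset the retry budget
              } catch (IOException e) {
                if (++failures > maxRetries) {
                  throw e;              // give up after repeated failures
                }
                in.close();
                in = fs.open(path);     // reconnect; may choose another datanode
                in.seek(pos);           // resume where the last good read ended
              }
            }
          } finally {
            in.close();
          }
          return pos;
        }

        public static void main(String[] args) throws IOException {
          FileSystem fs = FileSystem.get(new Configuration());
          long bytes = readWithRetry(fs, new Path(args[0]), new byte[64 * 1024], 3);
          System.out.println("read " + bytes + " bytes");
        }
      }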

      In the meantime, would setting
      dfs.datanode.socket.write.timeout=0
      in hadoop-site.xml help?
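
      For reference, that setting would go into hadoop-site.xml as a standard property entry (a value of 0 is expected to disable the datanode's write timeout entirely):

      <property>
        <name>dfs.datanode.socket.write.timeout</name>
        <value>0</value>
      </property>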

      Here are the exceptions I see:

      DataNode:

      2008-07-24 00:12:40,167 WARN org.apache.hadoop.dfs.DataNode: xxx:50010:Got exception while serving blk_3304550638094049753 to /yyy:
      java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/xxx:50010 remote=/yyy:42542]
      at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:170)
      at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144)
      at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105)
      at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
      at java.io.DataOutputStream.write(DataOutputStream.java:90)
      at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunks(DataNode.java:1774)
      at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1813)
      at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1039)
      at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:968)
      at java.lang.Thread.run(Thread.java:619)

      DFS Client:

      08/07/24 00:13:28 WARN dfs.DFSClient: Exception while reading from blk_3304550638094049753 of zzz from xxx:50010: java.io.IOException: Premeture EOF from inputStream
      at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
      at org.apache.hadoop.dfs.DFSClient$BlockReader.readChunk(DFSClient.java:967)
      at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:236)
      at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:191)
      at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
      at org.apache.hadoop.dfs.DFSClient$BlockReader.read(DFSClient.java:829)
      at org.apache.hadoop.dfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1352)
      at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1388)
      at java.io.DataInputStream.read(DataInputStream.java:83)

      08/07/24 00:13:28 INFO dfs.DFSClient: Could not obtain block blk_3304550638094049753 from any node: java.io.IOException: No live nodes contain current block

      Attachments

        1. HADOOP-3831.patch
          8 kB
          Raghu Angadi
        2. HADOOP-3831.patch
          8 kB
          Raghu Angadi
        3. HADOOP-3831.patch
          8 kB
          Raghu Angadi
        4. HADOOP-3831.patch
          7 kB
          Raghu Angadi
        5. HADOOP-3831-branch-18.patch
          8 kB
          Raghu Angadi


People

    • Assignee: Raghu Angadi (rangadi)
    • Reporter: Christian Kunz (ckunz)
    • Votes: 0
    • Watchers: 3

Dates

    Created:
    Updated:
    Resolved:
