Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
0.17.1
None
None
Reviewed
Description
Some of our applications read certain files from DFS (using libhdfs) much more slowly than others, slowly enough to trigger the DataNode write timeout introduced in 0.17.x. Eventually these reads fail.
DFS clients should be able to recover from this situation.
In the meantime, would setting
dfs.datanode.socket.write.timeout=0
in hadoop-site.xml help?
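One way a reader could recover is to reopen the data at the last offset it successfully returned when a read fails partway through, for example with the client-side "Premeture EOF" exception in the logs after the DataNode has dropped the connection. The following is a minimal sketch under assumptions, not how libhdfs or `DFSClient` actually behaves: `ResumingInputStream` and its `StreamOpener` callback are hypothetical names, and the sketch assumes failures surface as `IOException` rather than as a silent early end-of-stream.

```java
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch: an InputStream that, when a read fails (for example because the
 * server dropped the connection after a write timeout), reopens the source at the offset
 * already delivered to the caller and continues. This is not DFSClient's implementation.
 */
public class ResumingInputStream extends InputStream {

    /** Hypothetical callback that opens the underlying data starting at a byte offset. */
    public interface StreamOpener {
        InputStream openAt(long offset) throws IOException;
    }

    private final StreamOpener opener;
    private final int maxRetries; // reopen attempts allowed within a single read() call
    private InputStream current;
    private long position;        // bytes delivered to the caller so far

    public ResumingInputStream(StreamOpener opener, int maxRetries) throws IOException {
        this.opener = opener;
        this.maxRetries = maxRetries;
        this.current = opener.openAt(0);
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : one[0] & 0xff;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int failures = 0;
        while (true) {
            try {
                int n = current.read(buf, off, len);
                if (n > 0) {
                    position += n;
                }
                return n;
            } catch (IOException e) {
                if (++failures > maxRetries) {
                    throw e; // give up: too many failures without progress
                }
                // Discard the broken stream and resume where the caller left off.
                // An IOException thrown by openAt itself propagates to the caller.
                try {
                    current.close();
                } catch (IOException ignored) {
                    // the old connection is already unusable
                }
                current = opener.openAt(position);
            }
        }
    }

    @Override
    public void close() throws IOException {
        current.close();
    }
}
```

Retries are counted per `read` call, so a source that keeps failing after making some progress is reopened as often as needed, while one that fails `maxRetries + 1` times in a row without delivering data makes the read fail.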
Here are the exceptions I see:
DataNode:
2008-07-24 00:12:40,167 WARN org.apache.hadoop.dfs.DataNode: xxx:50010:Got exception while serving blk_3304550638094049753 to /yyy:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/xxx:50010 remote=/yyy:42542]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:170)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunks(DataNode.java:1774)
at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1813)
at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1039)
at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:968)
at java.lang.Thread.run(Thread.java:619)
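The DataNode exception above comes from a write that waits for the socket to become writable and gives up after the configured timeout (480000 ms, i.e. 8 minutes, here). Presumably a slow reader stops draining its socket, the DataNode's send buffer fills, and the wait expires. The sketch below reproduces that mechanism with a `java.nio` `Selector`. `TimedWriter.writeFully` is a hypothetical name, not Hadoop's `SocketIOWithTimeout`, and the sketch does not model how Hadoop interprets a timeout of 0; it simply rejects non-positive values.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.WritableByteChannel;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of a write that waits at most timeoutMillis for the channel to become writable.
 * Illustrative only; this is not Hadoop's SocketIOWithTimeout.
 */
public final class TimedWriter {

    private TimedWriter() { }

    public static <C extends SelectableChannel & WritableByteChannel> void writeFully(
            C channel, ByteBuffer buf, long timeoutMillis) throws IOException {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeoutMillis must be positive");
        }
        channel.configureBlocking(false);
        try (Selector selector = Selector.open()) {
            channel.register(selector, SelectionKey.OP_WRITE);
            while (buf.hasRemaining()) {
                if (channel.write(buf) > 0) {
                    continue; // progress: keep writing while the peer keeps up
                }
                // No room in the send buffer, so the peer is not reading. Wait for space.
                long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
                while (true) {
                    long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                    if (remainingMs <= 0) {
                        throw new SocketTimeoutException(timeoutMillis
                                + " millis timeout while waiting for channel to be ready for write.");
                    }
                    if (selector.select(remainingMs) > 0) {
                        selector.selectedKeys().clear();
                        break; // writable again; retry the write
                    }
                    // select() can return 0 early (e.g. on wakeup), so re-check the deadline.
                }
            }
        }
    }
}
```

For example, writing a large `ByteBuffer` into a `java.nio.channels.Pipe` whose source end is never read fails with `SocketTimeoutException` once the pipe's buffer is full and the timeout has elapsed, which is roughly the situation a stalled reader creates for the DataNode.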
DFS Client:
08/07/24 00:13:28 WARN dfs.DFSClient: Exception while reading from blk_3304550638094049753 of zzz from xxx:50010: java.io.IOException: Premeture EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
at org.apache.hadoop.dfs.DFSClient$BlockReader.readChunk(DFSClient.java:967)
at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:236)
at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:191)
at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
at org.apache.hadoop.dfs.DFSClient$BlockReader.read(DFSClient.java:829)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1352)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1388)
at java.io.DataInputStream.read(DataInputStream.java:83)
08/07/24 00:13:28 INFO dfs.DFSClient: Could not obtain block blk_3304550638094049753 from any node: java.io.IOException: No live nodes contain current block
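The final INFO line shows the client reporting that no live node holds the block once the read from xxx:50010 has failed. A common pattern for this situation, sketched here as an assumption rather than as `DFSClient`'s actual logic, is to remember which replicas failed and, once every replica has failed, back off, forget those failures, and try again a bounded number of times. `ReplicaFailover`, `BlockFetcher`, `maxAcquireFailures`, and `backoffMillis` are hypothetical names, and for brevity the sketch uses a fixed replica list instead of re-querying block locations.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of replica failover with a resettable dead-node set. */
public final class ReplicaFailover {

    /** Hypothetical: fetches a block's bytes from the named node. */
    public interface BlockFetcher {
        byte[] fetch(String node) throws IOException;
    }

    private ReplicaFailover() { }

    public static byte[] read(List<String> replicas, BlockFetcher fetcher,
                              int maxAcquireFailures, long backoffMillis) throws IOException {
        Set<String> deadNodes = new HashSet<>();
        int acquireFailures = 0;
        while (true) {
            // Pick the first replica that has not failed in the current round.
            String node = null;
            for (String candidate : replicas) {
                if (!deadNodes.contains(candidate)) {
                    node = candidate;
                    break;
                }
            }
            if (node == null) {
                // Every replica has failed in this round.
                if (acquireFailures >= maxAcquireFailures) {
                    throw new IOException("Could not obtain block from any node");
                }
                acquireFailures++;
                sleep(backoffMillis);
                deadNodes.clear(); // give every replica another chance
                continue;
            }
            try {
                return fetcher.fetch(node);
            } catch (IOException e) {
                deadNodes.add(node); // skip this node for the rest of the round
            }
        }
    }

    private static void sleep(long millis) throws IOException {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("interrupted during backoff");
        }
    }
}
```

With `maxAcquireFailures` set to 0, a single failure on a one-replica block is fatal; with a positive value, a node that failed once (for instance because it timed out a slow reader) gets retried after the backoff.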