Description
I used cgroups to limit the DataNode's I/O to 1024 bytes/s and then read a file with hedged reads enabled (dfs.client.hedged.read.threadpool.size set to 5, dfs.client.hedged.read.threshold.millis set to 500). The first 5 buffer reads each exceeded the 500 ms threshold, spawned hedged reads, and completed successfully against other DataNodes. After that the client hung for a long time (about 10 minutes, matching the 600000 ms socket timeout) until a SocketTimeoutException was thrown. The log is as follows:
2020-06-11 16:40:07,832 | INFO | main | Waited 500ms to read from DatanodeInfoWithStorage[xx.xx.xx.28:25009,DS-9c843ac6-4ea1-4791-a1af-54c1ae3d5daf,DISK]; spawning hedged read | DFSInputStream.java:1188
2020-06-11 16:40:08,562 | INFO | main | Waited 500ms to read from DatanodeInfoWithStorage[xx.xx.xx.28:25009,DS-9c843ac6-4ea1-4791-a1af-54c1ae3d5daf,DISK]; spawning hedged read | DFSInputStream.java:1188
2020-06-11 16:40:09,102 | INFO | main | Waited 500ms to read from DatanodeInfoWithStorage[xx.xx.xx.28:25009,DS-9c843ac6-4ea1-4791-a1af-54c1ae3d5daf,DISK]; spawning hedged read | DFSInputStream.java:1188
2020-06-11 16:40:09,642 | INFO | main | Waited 500ms to read from DatanodeInfoWithStorage[xx.xx.xx.28:25009,DS-9c843ac6-4ea1-4791-a1af-54c1ae3d5daf,DISK]; spawning hedged read | DFSInputStream.java:1188
2020-06-11 16:40:10,182 | INFO | main | Waited 500ms to read from DatanodeInfoWithStorage[xx.xx.xx.28:25009,DS-9c843ac6-4ea1-4791-a1af-54c1ae3d5daf,DISK]; spawning hedged read | DFSInputStream.java:1188
2020-06-11 16:40:10,182 | INFO | main | Execution rejected, Executing in current thread | DFSClient.java:3049
2020-06-11 16:40:10,219 | INFO | main | Execution rejected, Executing in current thread | DFSClient.java:3049
2020-06-11 16:50:07,638 | WARN | hedgedRead-0 | I/O error constructing remote block reader. | BlockReaderFactory.java:764
java.net.SocketTimeoutException: 600000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/xx.xx.xx.113:62750 remote=/xx.xx.xx.28:25009]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:551)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.newBlockReader(BlockReaderRemote.java:418)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:853)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:749)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:379)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:661)
at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1063)
at org.apache.hadoop.hdfs.DFSInputStream$2.call(DFSInputStream.java:1035)
at org.apache.hadoop.hdfs.DFSInputStream$2.call(DFSInputStream.java:1031)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-06-11 16:50:07,638 | WARN | hedgedRead-0 | Connection failure: Failed to connect to /xx.xx.xx.28:25009 for file /testhdfs/test2.jar for block BP-1820384660-xx.xx.xx.74-1585533043013:blk_1082582662_8861386:java.net.SocketTimeoutException: 600000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/xx.xx.xx.113:62750 remote=/xx.xx.xx.28:25009] | DFSInputStream.java:1118
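
The two "Execution rejected, Executing in current thread" lines suggest that once all 5 hedged-read threads were occupied by reads against the throttled DataNode, further read tasks ran on the calling thread and blocked it. The following is a minimal sketch of that behavior using only java.util.concurrent. It assumes the hedged-read pool behaves like a ThreadPoolExecutor backed by a SynchronousQueue with a caller-runs rejection policy; that is an assumption drawn from the log message, not something this report confirms. HedgedPoolSketch and threadThatRunsOverflowTask are hypothetical names used only for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class HedgedPoolSketch {

    /**
     * Fills a pool of the given size with blocked tasks, submits one more,
     * and returns the name of the thread that ended up running the extra task.
     */
    static String threadThatRunsOverflowTask(int poolSize) {
        // Assumption: no task queueing (SynchronousQueue) and caller-runs on rejection.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, poolSize, 60, TimeUnit.SECONDS,
                new SynchronousQueue<>(),
                new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch started = new CountDownLatch(poolSize);
        try {
            // Stand-ins for reads stuck on a slow DataNode: each occupies one pool thread.
            for (int i = 0; i < poolSize; i++) {
                pool.execute(() -> {
                    started.countDown();
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            started.await();

            // The pool is full and cannot queue, so this task is rejected
            // and the caller-runs policy executes it on the submitting thread.
            String[] ranOn = new String[1];
            pool.execute(() -> ranOn[0] = Thread.currentThread().getName());

            release.countDown();
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
            return ranOn[0];
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Overflow task ran on: " + threadThatRunsOverflowTask(5));
    }
}
```

In the sketch the overflow task returns immediately. In the reported scenario the equivalent task is a socket read with a 600000 ms timeout, so under the same assumption the calling thread would stay blocked until that timeout expires.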