Details
- Bug
- Status: Open
- Major
- Resolution: Unresolved
- 3.0.0
- None
- None
Description
DFS write-heavy workloads are failing with:
18/01/11 05:02:34 INFO mapreduce.Job: Task Id : attempt_1515660475578_0007_m_000387_0, Status : FAILED
Error: java.io.IOException: Could not get block locations. Source file "/tmp/tpcds-generate/10000/_temporary/1/_temporary/attempt_1515660475578_0007_m_000387_0/inventory/data-m-00387" - Aborting...block==null
    at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1477)
    at org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1256)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:667)
This was tracked down to:
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:253)
    at org.apache.hadoop.hdfs.DataStreamer$StreamerStreams.<init>(DataStreamer.java:162)
    at org.apache.hadoop.hdfs.DataStreamer.transfer(DataStreamer.java:1450)
    at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1407)
    at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1598)
    at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1499)
    at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1481)
    at org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1256)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:667)
The listen socket on port 50010 has a backlog of only 128:
# ss -tl | grep 50010
LISTEN 0 128 *:50010 *:*
However, the system is configured with a much higher somaxconn:
# sysctl -a | grep somaxconn
net.core.somaxconn = 16000
Yet the SNMP counters show connections being refused because the listen queue is overflowing:
127 times the listen queue of a socket overflowed
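The mismatch above can be sketched in a few lines of Python. This is only an illustration of the general Linux behaviour, not code from Hadoop: the kernel caps the backlog passed to listen() at net.core.somaxconn, so raising somaxconn has no effect when the application itself asks for 128. The helper names (parse_ss_listen_row, effective_backlog, listen_overflows) are hypothetical, and the sketch assumes the counter line is in the wording quoted above.

```python
import re


def parse_ss_listen_row(row: str) -> dict:
    """Parse one LISTEN row of `ss -tl` output.

    Columns are: State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port.
    For a listening socket, Send-Q is the configured backlog and Recv-Q is
    the number of connections currently waiting to be accepted.
    """
    state, recv_q, send_q, local, _peer = row.split()[:5]
    return {
        "state": state,
        "queued": int(recv_q),
        "backlog": int(send_q),
        "local": local,
    }


def effective_backlog(requested: int, somaxconn: int) -> int:
    """Linux silently truncates the listen() backlog to net.core.somaxconn."""
    return min(requested, somaxconn)


def listen_overflows(counter_text: str):
    """Extract the overflow count from the counter wording quoted above.

    Returns None if the line is not present (assumed format, see lead-in).
    """
    match = re.search(
        r"(\d+) times the listen queue of a socket overflowed", counter_text
    )
    return int(match.group(1)) if match else None


# Values taken from the report above.
row = parse_ss_listen_row("LISTEN 0 128 *:50010 *:*")
somaxconn = 16000

print(row["backlog"])                                # 128
print(effective_backlog(row["backlog"], somaxconn))  # 128
print(listen_overflows("127 times the listen queue of a socket overflowed"))  # 127
```

With the values from this report the effective backlog stays at 128 regardless of somaxconn, which is consistent with the overflow counter being non-zero under a write-heavy load.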
Issue Links
- is related to: HADOOP-16504 Increase ipc.server.listen.queue.size default from 128 to 256 (Resolved)