Hadoop HDFS / HDFS-13010

DataNode: Listen queue is always 128


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: datanode
    • Labels: None

    Description

      DFS write-heavy workloads are failing with:

      18/01/11 05:02:34 INFO mapreduce.Job: Task Id : attempt_1515660475578_0007_m_000387_0, Status : FAILED
      Error: java.io.IOException: Could not get block locations. Source file "/tmp/tpcds-generate/10000/_temporary/1/_temporary/attempt_1515660475578_0007_m_000387_0/inventory/data-m-00387" - Aborting...block==null
              at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1477)
              at org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1256)
              at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:667)
      

      This was tracked down to:

      Caused by: java.net.ConnectException: Connection refused
              at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
              at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
              at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
              at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
              at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:253)
              at org.apache.hadoop.hdfs.DataStreamer$StreamerStreams.<init>(DataStreamer.java:162)
              at org.apache.hadoop.hdfs.DataStreamer.transfer(DataStreamer.java:1450)
              at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1407)
              at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1598)
              at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1499)
              at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1481)
              at org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1256)
              at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:667)
      
      The DataNode's listen socket reports an accept-queue limit of only 128:

      # ss -tl | grep 50010
      
      LISTEN     0      128        *:50010                    *:*   
      

      However, the system is configured with a much higher net.core.somaxconn:

      # sysctl -a | grep somaxconn
      
      net.core.somaxconn = 16000
      

      Yet the kernel's SNMP counters confirm that connections are being dropped, reporting "127 times the listen queue of a socket overflowed".
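      The likely mechanism: the kernel grants each listening socket a backlog of min(requested backlog, net.core.somaxconn), so raising somaxconn has no effect unless the server also requests a larger backlog at bind time. A minimal Java sketch of that behavior follows; it is illustrative only and is not the DataNode's actual bind path (the class name and helper are hypothetical):

      ```java
      import java.io.IOException;
      import java.net.InetSocketAddress;
      import java.nio.channels.ServerSocketChannel;

      public class BacklogDemo {
          // Bind a listening socket on an ephemeral loopback port with the given
          // requested backlog; the kernel grants min(backlog, net.core.somaxconn).
          static ServerSocketChannel listen(int backlog) throws IOException {
              ServerSocketChannel ch = ServerSocketChannel.open();
              ch.bind(new InetSocketAddress("127.0.0.1", 0), backlog);
              return ch;
          }

          public static void main(String[] args) throws Exception {
              // backlog <= 0 asks for the implementation default: a small accept
              // queue, no matter how high somaxconn has been raised.
              ServerSocketChannel small = listen(0);

              // An explicit request of 16000 is granted only up to somaxconn.
              ServerSocketChannel large = listen(16000);

              // While this runs, compare the Send-Q column of `ss -tln` for the
              // two local ports: the first stays at the default, the second at
              // min(16000, net.core.somaxconn).
              System.out.println("default backlog: " + small.getLocalAddress());
              System.out.println("requested 16000: " + large.getLocalAddress());

              small.close();
              large.close();
          }
      }
      ```

      On the failing cluster above, the observed Send-Q of 128 matches the first case: the socket never asked for more, so the somaxconn increase to 16000 is moot.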

      People

        Assignee: Ajay Kumar (ajayydv)
        Reporter: Gopal Vijayaraghavan (gopalv)
        Votes: 0
        Watchers: 6
