HDFS-16565 (Hadoop HDFS)

DataNode holds a large number of CLOSE_WAIT connections that are not released

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: datanode, ec
    • Labels: None
    • Environment: CentOS Linux release 7.5.1804 (Core)

    Description

      The DataNode holds a large number of connections in the CLOSE_WAIT state and does not release them.
      netstat -na | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
      LISTEN 20
      CLOSE_WAIT 17707
      ESTABLISHED 1450
      TIME_WAIT 12

      The number of connections in the CLOSE_WAIT state has reached about 17k and is still growing. These connections can be listed with lsof:
      lsof -i tcp | grep -E 'CLOSE_WAIT|COMMAND'

      This indicates that Socket#close() is not being called correctly on connections where the DataNode acts as a client to other nodes.
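
      For illustration only, here is a minimal Java sketch of the pattern that avoids this kind of leak. It is not the DataNode code path; the class and method names (CloseWaitSketch, readUntilPeerCloses, runDemo) are hypothetical, and the loopback server exists only to make the example self-contained. A socket sits in CLOSE_WAIT when the remote peer has closed its side and the local side has not yet called close(), so the sketch uses try-with-resources to make sure close() runs on every exit path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class CloseWaitSketch {

    /**
     * Hypothetical client: connects to a loopback port, reads until the peer
     * closes its side (read() returns -1), and returns the closed socket.
     */
    static Socket readUntilPeerCloses(int port) throws IOException {
        Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
        // try-with-resources calls close() even if read() throws, so the
        // local end does not linger in CLOSE_WAIT after the peer is gone.
        try (Socket s = socket) {
            InputStream in = s.getInputStream();
            byte[] buf = new byte[1024];
            while (in.read(buf) != -1) {
                // Discard the payload; only the end of stream matters here.
            }
        }
        return socket;
    }

    /** Runs the client against a throwaway local server; true if the client socket was closed. */
    static boolean runDemo() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread peer = new Thread(() -> {
                // The peer writes a few bytes and then closes its side first.
                try (Socket accepted = server.accept()) {
                    accepted.getOutputStream().write("done".getBytes(StandardCharsets.US_ASCII));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            peer.start();
            Socket client = readUntilPeerCloses(server.getLocalPort());
            peer.join();
            return client.isClosed();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("client closed: " + runDemo());
    }
}
```

      If the try-with-resources block were dropped and close() skipped on an error path, the client end would stay in CLOSE_WAIT until the process exits or the socket is garbage collected, which would match the growth seen in the netstat output above.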

      Attachments

        1. screenshot-1.png
          290 kB
          JiangHua Zhu
        2. screenshot-2.png
          116 kB
          JiangHua Zhu

            People

              Assignee: Unassigned
              Reporter: JiangHua Zhu (jianghuazhu)
              Votes: 0
              Watchers: 4
