Hadoop HDFS / HDFS-6973

DFSClient does not close a dead socket, resulting in thousands of CLOSE_WAIT sockets


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: hdfs-client
    • Labels: None
    • Environment: RHEL 6.3, HDP 2.1, 6 RegionServers/Datanodes, 18 TB per node, 3108 regions

    Description

      HBase, acting as an HDFS client, does not close a dead connection to the datanode.
      This results in over 30K CLOSE_WAIT sockets, and at some point HBase can no longer connect to the datanode because too many sockets are mapped from one host to another on the same port (50010).
      Even after I restart all RegionServers, the CLOSE_WAIT count keeps increasing.
      $ netstat -an | grep CLOSE_WAIT | wc -l
      2545
      $ netstat -nap | grep CLOSE_WAIT | grep 6569 | wc -l
      2545
      $ ps -ef | grep 6569
      hbase 6569 6556 21 Aug25 ? 09:52:33 /opt/jdk1.6.0_25/bin/java -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC
      I have also reviewed these issues:
      HDFS-5697
      HDFS-5671
      HDFS-1836
      HBASE-9393
      I found that the patches from those issues are already present in the HBase 0.98 / Hadoop 2.4.0 source code.
      But I do not understand why HBase 0.98 / Hadoop 2.4.0 still has this issue. Please check. Thanks a lot.
      This code has been added to BlockReaderFactory.getRemoteBlockReaderFromTcp(). Perhaps another bug is causing my problem:

      BlockReaderFactory.java
      // Try peers (cached first, then fresh TCP connections) until a block
      // reader is created; on failure, close the peer so its socket is released.
        private BlockReader getRemoteBlockReaderFromTcp() throws IOException {
          if (LOG.isTraceEnabled()) {
            LOG.trace(this + ": trying to create a remote block reader from a " +
                "TCP socket");
          }
          BlockReader blockReader = null;
          while (true) {
            BlockReaderPeer curPeer = null;
            Peer peer = null;
            try {
              curPeer = nextTcpPeer();
              if (curPeer == null) break;
              if (curPeer.fromCache) remainingCacheTries--;
              peer = curPeer.peer;
              blockReader = getRemoteBlockReader(peer);
              return blockReader;
            } catch (IOException ioe) {
              if (isSecurityException(ioe)) {
                if (LOG.isTraceEnabled()) {
                  LOG.trace(this + ": got security exception while constructing " +
                      "a remote block reader from " + peer, ioe);
                }
                throw ioe;
              }
              if ((curPeer != null) && curPeer.fromCache) {
                // Handle an I/O error we got when using a cached peer.  These are
                // considered less serious, because the underlying socket may be
                // stale.
                if (LOG.isDebugEnabled()) {
                  LOG.debug("Closed potentially stale remote peer " + peer, ioe);
                }
              } else {
                // Handle an I/O error we got when using a newly created peer.
                LOG.warn("I/O error constructing remote block reader.", ioe);
                throw ioe;
              }
            } finally {
              if (blockReader == null) {
                IOUtils.cleanup(LOG, peer);
              }
            }
          }
          return null;
        }
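For context on the symptom: a socket sits in CLOSE_WAIT when the remote end has closed its half of the connection but the local process has never called close(). The following minimal sketch (a standalone demo class, not Hadoop code; a throwaway local server stands in for the datanode) reproduces that state:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class CloseWaitDemo {
    public static void main(String[] args) throws IOException {
        // Throwaway local "datanode": accepts one connection, then closes it.
        try (ServerSocket server = new ServerSocket(0)) {
            Socket client = new Socket("127.0.0.1", server.getLocalPort());
            Socket accepted = server.accept();
            accepted.close(); // remote side sends FIN

            // The client observes EOF, so the peer is clearly gone...
            int eof = client.getInputStream().read();
            System.out.println("read returned: " + eof);

            // ...but between that FIN and this close(), the client socket
            // sits in CLOSE_WAIT (visible via netstat). A client that merely
            // drops the reference instead of closing leaks one CLOSE_WAIT
            // socket per dead connection.
            client.close();
            System.out.println("client closed: " + client.isClosed());
        }
    }
}
```

In the listing above, IOUtils.cleanup(LOG, peer) in the finally block is exactly that close() step; if some path hands out or caches a peer that is never eventually closed, sockets accumulate in CLOSE_WAIT as this report describes.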
      

          People

            Assignee: Unassigned
            Reporter: steven xu (stevenxu)
            Votes: 2
            Watchers: 19
