HBase / HBASE-11833

HBase does not close dead sockets, resulting in thousands of CLOSE_WAIT sockets

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.98.0
    • Fix Version/s: None
    • Component/s: regionserver
    • Labels: None
    • Environment:

      RHEL 6.3, HDP 2.1, 6 RegionServers/DataNodes, 18 TB per node, 3108 regions

      Description

      HBase does not close dead connections to the DataNode.
      This results in over 30K CLOSE_WAIT sockets, and at some point HBase can no longer connect to the DataNode because too many sockets are mapped from one host to the other on the same port (50010).
      Even after I restart all the RegionServers, the CLOSE_WAIT count keeps increasing:
      $ netstat -an|grep CLOSE_WAIT|wc -l
      2545

      $ netstat -nap|grep CLOSE_WAIT|grep 6569|wc -l
      2545
      $ ps -ef|grep 6569
      hbase 6569 6556 21 Aug25 ? 09:52:33 /opt/jdk1.6.0_25/bin/java -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC
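
      (For context: a socket sits in CLOSE_WAIT once the remote end has closed its side but the local process has not yet called close(). A minimal Java sketch of that leak pattern, using a placeholder host and the default DataNode transfer port, would be:)

      import java.io.IOException;
      import java.io.InputStream;
      import java.net.Socket;

      public class CloseWaitLeakSketch {
        public static void main(String[] args) throws IOException {
          // Placeholder endpoint; 50010 is the default DataNode transfer port.
          Socket socket = new Socket("127.0.0.1", 50010);
          InputStream in = socket.getInputStream();
          // When the remote side sends FIN, read() returns -1 and the kernel
          // moves this socket to CLOSE_WAIT. It stays in that state until
          // the application calls socket.close().
          while (in.read() != -1) {
            // drain until remote close
          }
          // Missing socket.close() here: the socket now lingers in CLOSE_WAIT
          // and shows up in "netstat -an | grep CLOSE_WAIT".
        }
      }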

      I have also reviewed these issues:
      HBASE-9393
      HDFS-5671
      HDFS-1836
      I found that the HBase 0.98 / Hadoop 2.4.0 sources I use already match these patches.
      But I do not understand why HBase 0.98 / Hadoop 2.4.0 still has this issue. Please check. Thanks a lot.

        Activity

        Lars Hofhansl added a comment -

        steven xu, did you ask about this on the mailing lists? You'll generally get better answers there.

        Lars Hofhansl added a comment -

        See also the comments towards the end here: HBASE-9393.

        Andrew Purtell added a comment -

        Closing as dup of HBASE-9393

        steven xu added a comment -

        Guys, before creating this issue I read HBASE-9393 and HDFS-5671. I found that the patch code from those two issues is already included in the Hadoop 2.4.0 tag, in BlockReaderFactory.getRemoteBlockReaderFromTcp(). So the HBASE-9393 patch does not solve my problem; some other bug may be causing it, which is why I created a new issue. Please check as well.

        BlockReaderFactory.java
          private BlockReader getRemoteBlockReaderFromTcp() throws IOException {
            if (LOG.isTraceEnabled()) {
              LOG.trace(this + ": trying to create a remote block reader from a " +
                  "TCP socket");
            }
            BlockReader blockReader = null;
            while (true) {
              BlockReaderPeer curPeer = null;
              Peer peer = null;
              try {
                curPeer = nextTcpPeer();
                if (curPeer == null) break;
                if (curPeer.fromCache) remainingCacheTries--;
                peer = curPeer.peer;
                blockReader = getRemoteBlockReader(peer);
                return blockReader;
              } catch (IOException ioe) {
                if (isSecurityException(ioe)) {
                  if (LOG.isTraceEnabled()) {
                    LOG.trace(this + ": got security exception while constructing " +
                        "a remote block reader from " + peer, ioe);
                  }
                  throw ioe;
                }
                if ((curPeer != null) && curPeer.fromCache) {
                  // Handle an I/O error we got when using a cached peer.  These are
                  // considered less serious, because the underlying socket may be
                  // stale.
                  if (LOG.isDebugEnabled()) {
                    LOG.debug("Closed potentially stale remote peer " + peer, ioe);
                  }
                } else {
                  // Handle an I/O error we got when using a newly created peer.
                  LOG.warn("I/O error constructing remote block reader.", ioe);
                  throw ioe;
                }
              } finally {
                if (blockReader == null) {
                  IOUtils.cleanup(LOG, peer);
                }
              }
            }
            return null;
          }
        
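        (Note: the finally block in the method quoted above does close the peer whenever no block reader was created, so the non-cached TCP path looks leak-free as quoted. The same close-on-failure idiom, reduced to a plain java.net.Socket for illustration; the names below are placeholders, not the HDFS API:)

        import java.io.IOException;
        import java.io.InputStream;
        import java.net.Socket;

        final class CloseOnFailure {
          // Mirrors the "if (blockReader == null) IOUtils.cleanup(LOG, peer)"
          // pattern above: the socket is closed unless it was successfully
          // handed off to a reader that now owns it.
          static InputStream openReader(String host, int port) throws IOException {
            Socket socket = new Socket(host, port);
            InputStream reader = null;
            try {
              reader = socket.getInputStream(); // stand-in for getRemoteBlockReader(peer)
              return reader;
            } finally {
              if (reader == null) {
                socket.close(); // construction failed: release the socket immediately
              }
            }
          }
        }

        (If that idiom holds, one remaining suspect would be sockets that sit in the peer cache and are never closed on eviction, though confirming that belongs in the linked HDFS issues.)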
        Andrew Purtell added a comment -

        Rather than pasting HDFS code here, steven xu, consider commenting on or filing a new HDFS JIRA.


          People

          • Assignee: Unassigned
          • Reporter: steven xu
          • Votes: 0
          • Watchers: 5
