  1. Hadoop Common
  2. HADOOP-13219

NameNode RPC Reader thread crashes, causing the cluster to hang

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.5.0, 2.6.0, 2.8.0, 2.7.2, 2.6.2, 2.6.4
    • None
    • rpc-server
    • catch throwable

    Description

      My cluster hung yesterday because the RPC server Reader threads crashed. As a result, all RPC requests timed out, including DataNode heartbeats.
      As the code below shows, the method doRunLoop only catches InterruptedException and IOException, so any other Throwable escapes the loop and ends the Reader thread:

      while (running) {
        SelectionKey key = null;
        try {
          // consume as many connections as currently queued to avoid
          // unbridled acceptance of connections that starves the select
          int size = pendingConnections.size();
          for (int i=size; i>0; i--) {
            Connection conn = pendingConnections.take();
            conn.channel.register(readSelector, SelectionKey.OP_READ, conn);
          }
          readSelector.select();

          Iterator<SelectionKey> iter = readSelector.selectedKeys().iterator();
          while (iter.hasNext()) {
            key = iter.next();
            iter.remove();
            if (key.isValid()) {
              if (key.isReadable()) {
                doRead(key);
              }
            }
            key = null;
          }
        } catch (InterruptedException e) {
          if (running) {
            // unexpected -- log it
            LOG.info(Thread.currentThread().getName() + " unexpectedly interrupted", e);
          }
        } catch (IOException ex) {
          LOG.error("Error in Reader", ex);
        }
      }
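
      The failure mode can be illustrated with a minimal, self-contained sketch. It is not the Hadoop code and not the content of the attached patches: the class ResilientReaderLoop, the method runLoop, and the failures counter are hypothetical names invented for this example. It only shows the general idea of a catch-all in a worker loop, so that one unexpected unchecked exception does not end the loop:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a reader loop; not Hadoop's actual Reader class. */
public class ResilientReaderLoop {

  /** Counts iterations that ended with an unexpected Throwable. */
  static final AtomicInteger failures = new AtomicInteger();

  /**
   * Runs every task in order. A failure in one task is recorded and the loop
   * moves on, instead of the exception propagating and ending the thread.
   * Returns the number of tasks that completed normally.
   */
  static int runLoop(List<Runnable> tasks) {
    int completed = 0;
    for (Runnable task : tasks) {
      try {
        task.run();
        completed++;
      } catch (Throwable t) {
        // Catch-all: a RuntimeException or Error would otherwise escape the loop.
        failures.incrementAndGet();
        System.err.println("Error in Reader (sketch): " + t);
      }
    }
    return completed;
  }

  public static void main(String[] args) {
    List<Runnable> tasks = Arrays.<Runnable>asList(
        () -> { },
        () -> { throw new IllegalStateException("simulated failure while reading"); },
        () -> { });
    int completed = runLoop(tasks);
    System.out.println(completed + " completed, " + failures.get() + " failed");
  }
}
```

      Catching Throwable also intercepts Errors such as OutOfMemoryError, so whether to catch Throwable or only Exception is a design choice that depends on how the server should behave in those cases.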

      Attachments

        1. HADOOP-13219-3.patch
          1 kB
          ChenFolin
        2. HDFS-10472.patch
          1.0 kB
          ChenFolin
        3. HDFS-10472-2.patch
          1 kB
          ChenFolin

          People

            Assignee: Unassigned
            Reporter: ChenFolin
            Votes: 0
            Watchers: 10
