HBase / HBASE-14241

Fix deadlock during cluster shutdown due to concurrent connection close

    Details

    • Hadoop Flags:
      Reviewed

      Description

      Caught while testing branch-1.0, during cluster shutdown in TestMasterMetricsWrapper.

      Found one Java-level deadlock:
      =============================
      "MASTER_META_SERVER_OPERATIONS-ip-10-32-130-237:55342-0":
      waiting to lock monitor 0x00007f2a040051c8 (object 0x00000007e36108a8, a org.apache.hadoop.hbase.util.PoolMap),
      which is held by "M:0;ip-10-32-130-237:55342"
      "M:0;ip-10-32-130-237:55342":
      waiting to lock monitor 0x00007f2a04005118 (object 0x00000007e3610b00, a org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection),
      which is held by "MASTER_META_SERVER_OPERATIONS-ip-10-32-130-237:55342-0"

      Full stack dump and deadlock debug output attached.
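
The report above is the kind of output a thread dump produces. The same check can also be run in-process through the standard `java.lang.management.ThreadMXBean#findMonitorDeadlockedThreads()` call, which covers object-monitor cycles like this one. A minimal sketch follows; the class name `DeadlockProbe` and the method `describeDeadlocks` are hypothetical and not part of HBase.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not an HBase class.
public class DeadlockProbe {
  // Returns one line per thread stuck in a monitor deadlock,
  // or an empty string when no such deadlock is found.
  static String describeDeadlocks() {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    long[] ids = mx.findMonitorDeadlockedThreads(); // null when none are deadlocked
    if (ids == null) {
      return "";
    }
    StringBuilder sb = new StringBuilder();
    for (ThreadInfo info : mx.getThreadInfo(ids)) {
      if (info == null) {
        continue; // the thread exited between the two calls
      }
      sb.append('"').append(info.getThreadName()).append("\" waiting on ")
        .append(info.getLockName()).append(", held by \"")
        .append(info.getLockOwnerName()).append("\"\n");
    }
    return sb.toString();
  }
}
```

Calling `DeadlockProbe.describeDeadlocks()` from a test timeout handler would be one way to surface a cycle like the one above without attaching an external tool.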

      Root cause:
      In RpcClientImpl#close(), we obtain the lock on connections first:

          synchronized (connections) {
            for (Connection conn : connections.values()) {
      

      Then markClosed() tries to obtain the lock on the connection object:

              if (!conn.isAlive()) {
                conn.markClosed(new InterruptedIOException("RpcClient is closing"));
                conn.close();
      

      Another thread, MetaServerShutdownHandler, calls RpcClientImpl$Connection#setupIOstreams(), which contains:

              markClosed(e);
              close();
      

      There the lock on the connection object is obtained first, then the lock on connections is attempted, leading to deadlock:

            synchronized (connections) {
              connections.removeValue(remoteId, this);
            }
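
The two code paths above take the same pair of monitors in opposite order. One common way to break such a cycle is to never hold the pool lock while taking a connection's monitor. The sketch below illustrates that idea with hypothetical stand-in classes (`Client`, `Conn`) and a plain `HashMap` instead of `PoolMap`; it is not the attached patch and does not claim to match how the issue was actually resolved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for RpcClientImpl and RpcClientImpl$Connection.
public class LockOrderSketch {
  static class Client {
    private final Map<String, Conn> connections = new HashMap<>();

    Conn connect(String remoteId) {
      synchronized (connections) {
        return connections.computeIfAbsent(remoteId, id -> new Conn(this, id));
      }
    }

    // Snapshot the connections under the pool lock, then release it
    // before touching any per-connection monitor.
    void close() {
      List<Conn> snapshot;
      synchronized (connections) {
        snapshot = new ArrayList<>(connections.values());
      }
      for (Conn conn : snapshot) {
        conn.close(); // pool lock is not held here
      }
    }

    int size() {
      synchronized (connections) {
        return connections.size();
      }
    }
  }

  static class Conn {
    private final Client owner;
    private final String remoteId;
    private boolean closed;

    Conn(Client owner, String remoteId) {
      this.owner = owner;
      this.remoteId = remoteId;
    }

    // The only nesting in this sketch: connection monitor, then pool lock.
    synchronized void close() {
      if (closed) {
        return;
      }
      closed = true;
      synchronized (owner.connections) {
        owner.connections.remove(remoteId, this);
      }
    }

    synchronized boolean isClosed() {
      return closed;
    }
  }
}
```

With this shape, the only nested acquisition is connection monitor followed by pool lock, so a thread in `Client.close()` and a thread in `Conn.close()` cannot wait on each other in a cycle. The trade-off is that a connection added after the snapshot is taken would not be closed by that call.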
      

        Attachments

        1. deadlock.txt.gz
          10 kB
          Andrew Purtell
        2. 14241-v5.txt
          10 kB
          Ted Yu
        3. 14241-v4.txt
          10 kB
          Ted Yu
        4. 14241-v3.txt
          9 kB
          Ted Yu
        5. 14241-v2.txt
          8 kB
          Ted Yu

            People

            • Assignee:
              tedyu@apache.org Ted Yu
              Reporter:
              apurtell Andrew Purtell
            • Votes:
              0
              Watchers:
              4
