SOLR-5215: Deadlock in SolrCloud ConnectionManager

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.2.1
    • Fix Version/s: 4.5, 6.0
    • Component/s: clients - java, SolrCloud
    • Labels: None
    • Environment:

      Linux 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

      java version "1.6.0_18"
      Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
      Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)

      Description

      We are constantly seeing deadlocks in our production application servers.

      The problem seems to be that a thread A:

      • tries to process an event and acquires the ConnectionManager lock
      • the update callback acquires connectionUpdateLock and invokes waitForConnected
      • waitForConnected acquires the ConnectionManager lock again (re-entrantly, since thread A already holds it)
      • waitForConnected calls wait, which releases the ConnectionManager lock (but thread A still holds connectionUpdateLock)

      Then a thread B:

      • tries to process an event and acquires the ConnectionManager lock
      • the update callback tries to acquire connectionUpdateLock and blocks while still holding the ConnectionManager lock, so thread A can never reacquire that lock and return from wait.
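      The lock ordering above can be reproduced outside Solr. The following is a minimal sketch in Java 8+, assuming the simplified call structure from the bullets: managerMonitor stands in for the ConnectionManager monitor, updateLock for connectionUpdateLock, and the class and method names are hypothetical rather than Solr's. Thread B is started only after thread A is parked in its timed wait, so the two threads reliably end up BLOCKED on each other.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, stripped-down model of the lock pattern described above.
 * managerMonitor stands in for the ConnectionManager monitor and updateLock
 * for connectionUpdateLock. This is an illustration, not Solr's code.
 */
public class UpdateLockDeadlockDemo {
    private final Object managerMonitor = new Object();
    private final Object updateLock = new Object();
    private boolean connected = false; // guarded by managerMonitor

    // Thread A: process() -> update callback -> waitForConnected()
    void threadA() throws InterruptedException {
        synchronized (managerMonitor) {             // process() holds the manager lock
            synchronized (updateLock) {             // update callback takes the update lock
                synchronized (managerMonitor) {     // waitForConnected() re-enters the manager lock
                    while (!connected) {
                        // wait() releases every hold on managerMonitor, but NOT updateLock
                        managerMonitor.wait(50);
                    }
                }
            }
        }
    }

    // Thread B: process() -> update callback, which would report the connection
    void threadB() {
        synchronized (managerMonitor) {             // succeeds while A sits in wait()
            synchronized (updateLock) {             // blocks: A still holds updateLock
                connected = true;
                managerMonitor.notifyAll();
            }
        }
    }

    /** Returns true once both threads are BLOCKED on the lock the other one holds. */
    static boolean reproduce(long timeoutMillis) {
        UpdateLockDeadlockDemo demo = new UpdateLockDeadlockDemo();
        Thread a = new Thread(() -> {
            try {
                demo.threadA();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "thread-A");
        Thread b = new Thread(demo::threadB, "thread-B");
        a.setDaemon(true); // daemon threads, so the stuck pair does not keep the JVM alive
        b.setDaemon(true);

        a.start();
        // Start B only after A holds updateLock and is parked inside wait().
        while (a.getState() != Thread.State.TIMED_WAITING) {
            pause(1);
        }
        b.start();

        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() < deadline) {
            if (a.getState() == Thread.State.BLOCKED && b.getState() == Thread.State.BLOCKED) {
                return true;
            }
            pause(10);
        }
        return false;
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked: " + reproduce(2000));
    }
}
```

      Object.wait releases the monitor completely even when it is held re-entrantly, which is why thread B can enter process() at all; the cycle forms because connectionUpdateLock stays held across the wait.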

      Here is part of the thread dump:

      "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x0000000059965800 nid=0x3e81 waiting for monitor entry [0x0000000057169000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:71)
              - waiting to lock <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
              at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
              at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

      "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x000000005ad40000 nid=0x3e67 waiting for monitor entry [0x000000004dbd4000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
              - waiting to lock <0x00002aab1b0e0f78> (a java.lang.Object)
              at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
              at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
              - locked <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
              at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
              at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

      "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x00002aac4c2f7000 nid=0x3d9a waiting for monitor entry [0x0000000042821000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at java.lang.Object.wait(Native Method)
              - waiting on <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
              at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:165)
              - locked <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
              at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
              - locked <0x00002aab1b0e0f78> (a java.lang.Object)
              at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
              at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
              - locked <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
              at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
              at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

      Found one Java-level deadlock:
      =============================
      "http-0.0.0.0-8080-82-EventThread":
        waiting to lock monitor 0x000000005c7694b0 (object 0x00002aab1b0e0ce0, a org.apache.solr.common.cloud.ConnectionManager),
        which is held by "http-0.0.0.0-8080-82-EventThread"
      "http-0.0.0.0-8080-82-EventThread":
        waiting to lock monitor 0x00002aac4c314978 (object 0x00002aab1b0e0f78, a java.lang.Object),
        which is held by "http-0.0.0.0-8080-82-EventThread"
      "http-0.0.0.0-8080-82-EventThread":
        waiting to lock monitor 0x000000005c7694b0 (object 0x00002aab1b0e0ce0, a org.apache.solr.common.cloud.ConnectionManager),
        which is held by "http-0.0.0.0-8080-82-EventThread"

      Java stack information for the threads listed above:
      ===================================================
      "http-0.0.0.0-8080-82-EventThread":
              at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:71)
              - waiting to lock <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
              at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
              at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
      "http-0.0.0.0-8080-82-EventThread":
              at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
              - waiting to lock <0x00002aab1b0e0f78> (a java.lang.Object)
              at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
              at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
              - locked <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
              at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
              at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
      "http-0.0.0.0-8080-82-EventThread":
              at java.lang.Object.wait(Native Method)
              - waiting on <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
              at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:165)
              - locked <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
              at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
              - locked <0x00002aab1b0e0f78> (a java.lang.Object)
              at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
              at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
              - locked <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
              at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
              at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
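      The "Found one Java-level deadlock" section above comes from a thread dump. A similar report can be obtained from inside a running JVM through java.lang.management.ThreadMXBean.findDeadlockedThreads(), which returns the IDs of threads in a lock cycle (or null if there is none). The sketch below (Java 8+) is illustrative only: to stay deterministic it builds its own simple two-lock inversion with hypothetical names instead of the Solr code path, and the report format is an assumption loosely modeled on the dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

/**
 * Illustrative sketch: list deadlocked threads from inside the JVM, much like
 * the "Found one Java-level deadlock" section of a thread dump. The two-lock
 * inversion is a hypothetical stand-in for the Solr code path.
 */
public class DeadlockReporter {

    /** One line per deadlocked thread: its name, the lock it wants, and the owner. */
    static String[] report() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no cycle exists
        if (ids == null) {
            return new String[0];
        }
        ThreadInfo[] infos = mx.getThreadInfo(ids);
        String[] lines = new String[infos.length];
        for (int i = 0; i < infos.length; i++) {
            ThreadInfo ti = infos[i];
            lines[i] = "\"" + ti.getThreadName() + "\" waiting to lock " + ti.getLockName()
                    + ", held by \"" + ti.getLockOwnerName() + "\"";
        }
        return lines;
    }

    /** Polls report() until it is non-empty or the timeout expires. */
    static String[] awaitReport(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        String[] lines = report();
        while (lines.length == 0 && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            lines = report();
        }
        return lines;
    }

    /** Starts two daemon threads that take the same two locks in opposite order. */
    static void startInversion() {
        Object lockX = new Object();
        Object lockY = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        startWorker("worker-1", lockX, lockY, bothHoldFirstLock);
        startWorker("worker-2", lockY, lockX, bothHoldFirstLock);
    }

    private static void startWorker(String name, Object first, Object second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await(); // both threads now hold their first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) { // blocks forever: the other worker owns it
                    System.out.println(name + " acquired both locks");
                }
            }
        }, name);
        t.setDaemon(true); // lets the JVM exit despite the stuck workers
        t.start();
    }

    public static void main(String[] args) {
        startInversion();
        for (String line : awaitReport(2000)) {
            System.out.println(line);
        }
    }
}
```

      A monitoring thread could poll report() periodically and log any result; since findDeadlockedThreads() only reports threads already stuck in a cycle, this is a diagnostic aid, not a way to prevent the deadlock.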

        Activity

        Feihong Huang added a comment (edited):

        Thanks to Ricardo for finding the cause.
        I have also encountered this issue in our production application servers.

        Mark Miller added a comment:

        I don't think we really need that separate update lock at all. This patch removes it.
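        Removing the separate update lock breaks the cycle: with only one monitor, there is no second lock for thread B to block on while thread A sits in wait. The following is a minimal sketch of that idea in Java 8+, with hypothetical names; it mirrors the simplified model from the description and is not the committed patch.

```java
/**
 * Hypothetical sketch of the idea in the comment above: with the separate
 * update lock removed, the update path is guarded only by the manager monitor.
 * Names are illustrative; this is not the committed patch.
 */
public class SingleLockDemo {
    private final Object managerMonitor = new Object();
    private boolean connected = false; // guarded by managerMonitor

    // Thread A: process() -> update callback -> waitForConnected(), no update lock
    void threadA() throws InterruptedException {
        synchronized (managerMonitor) {         // process()
            synchronized (managerMonitor) {     // waitForConnected() re-enters the same monitor
                while (!connected) {
                    managerMonitor.wait(50);    // releases the only lock A holds
                }
            }
        }
    }

    // Thread B: process() -> update callback; there is no second lock to block on
    void threadB() {
        synchronized (managerMonitor) {
            connected = true;
            managerMonitor.notifyAll();
        }
    }

    /** Returns true if both threads terminate within the timeout. */
    static boolean runsToCompletion(long timeoutMillis) {
        SingleLockDemo demo = new SingleLockDemo();
        Thread a = new Thread(() -> {
            try {
                demo.threadA();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "thread-A");
        Thread b = new Thread(demo::threadB, "thread-B");
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();
        try {
            a.join(timeoutMillis);
            b.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !a.isAlive() && !b.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("completed: " + runsToCompletion(2000));
    }
}
```

        Because both paths synchronize only on managerMonitor, start order no longer matters: either B sets connected before A checks it, or A is already in wait and B's notifyAll wakes it.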

        ASF subversion and git services added a comment:

        Commit 1521236 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1521236 ]

        SOLR-5215: Fix possibility of deadlock in ZooKeeper ConnectionManager.

        ASF subversion and git services added a comment:

        Commit 1521239 from Mark Miller in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1521239 ]

        SOLR-5215: Fix possibility of deadlock in ZooKeeper ConnectionManager.

        Shalin Shekhar Mangar added a comment:

        This fix was released in 4.5.

          People

          • Assignee: Mark Miller
          • Reporter: Ricardo Merizalde
          • Votes: 0
          • Watchers: 7
