SOLR-7673

Race condition in shard splitting can cause operation to hang indefinitely or sub-shards to never become active

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.10.4, 5.2
    • Fix Version/s: 5.3, 6.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      I found this while digging into the failure on https://builds.apache.org/job/Lucene-Solr-Tests-trunk-Java8/69/

         [junit4]   2> 390032 INFO  (OverseerStateUpdate-93987739315601411-127.0.0.1:57926_-n_0000000000) [n:127.0.0.1:57926_    ] o.a.s.c.o.ReplicaMutator Update state numShards=1 message={
         [junit4]   2>   "core":"testasynccollectioncreation_shard1_0_replica2",
         [junit4]   2>   "core_node_name":"core_node5",
         [junit4]   2>   "roles":null,
         [junit4]   2>   "base_url":"http://127.0.0.1:38702",
         [junit4]   2>   "node_name":"127.0.0.1:38702_",
         [junit4]   2>   "numShards":"1",
         [junit4]   2>   "state":"active",
         [junit4]   2>   "shard":"shard1_0",
         [junit4]   2>   "collection":"testasynccollectioncreation",
         [junit4]   2>   "operation":"state"}
         [junit4]   2> 390033 INFO  (OverseerStateUpdate-93987739315601411-127.0.0.1:57926_-n_0000000000) [n:127.0.0.1:57926_    ] o.a.s.c.o.ZkStateWriter going to update_collection /collections/testasynccollectioncreation/state.json version: 18
         [junit4]   2> 390033 INFO  (RecoveryThread-testasynccollectioncreation_shard1_0_replica2) [n:127.0.0.1:38702_ c:testasynccollectioncreation s:shard1_0 r:core_node5 x:testasynccollectioncreation_shard1_0_replica2] o.a.s.u.UpdateLog Took 2 ms to seed version buckets with highest version 1503803851028824064
         [junit4]   2> 390033 INFO  (zkCallback-167-thread-1-processing-n:127.0.0.1:57926_) [n:127.0.0.1:57926_    ] o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/testasynccollectioncreation/state.json for collection testasynccollectioncreation has occurred - updating... (live nodes size: 2)
         [junit4]   2> 390033 INFO  (RecoveryThread-testasynccollectioncreation_shard1_0_replica2) [n:127.0.0.1:38702_ c:testasynccollectioncreation s:shard1_0 r:core_node5 x:testasynccollectioncreation_shard1_0_replica2] o.a.s.c.RecoveryStrategy Finished recovery process.
         [junit4]   2> 390033 INFO  (zkCallback-173-thread-1-processing-n:127.0.0.1:38702_) [n:127.0.0.1:38702_    ] o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/testasynccollectioncreation/state.json for collection testasynccollectioncreation has occurred - updating... (live nodes size: 2)
         [junit4]   2> 390034 INFO  (zkCallback-167-thread-1-processing-n:127.0.0.1:57926_) [n:127.0.0.1:57926_    ] o.a.s.c.c.ZkStateReader Updating data for testasynccollectioncreation to ver 19 
         [junit4]   2> 390034 INFO  (zkCallback-173-thread-1-processing-n:127.0.0.1:38702_) [n:127.0.0.1:38702_    ] o.a.s.c.c.ZkStateReader Updating data for testasynccollectioncreation to ver 19 
         [junit4]   2> 390298 INFO  (qtp1859109869-1664) [n:127.0.0.1:38702_    ] o.a.s.h.a.CollectionsHandler Invoked Collection Action :requeststatus with paramsrequestid=1004&action=REQUESTSTATUS&wt=javabin&version=2 
         [junit4]   2> 390299 INFO  (qtp1859109869-1664) [n:127.0.0.1:38702_    ] o.a.s.s.SolrDispatchFilter [admin] webapp=null path=/admin/collections params={requestid=1004&action=REQUESTSTATUS&wt=javabin&version=2} status=0 QTime=1 
         [junit4]   2> 390348 INFO  (qtp1859109869-1665) [n:127.0.0.1:38702_    ] o.a.s.h.a.CoreAdminHandler Checking request status for : 10042599422118342586
         [junit4]   2> 390348 INFO  (qtp1859109869-1665) [n:127.0.0.1:38702_    ] o.a.s.s.SolrDispatchFilter [admin] webapp=null path=/admin/cores params={qt=/admin/cores&requestid=10042599422118342586&action=REQUESTSTATUS&wt=javabin&version=2} status=0 QTime=1 
         [junit4]   2> 390348 INFO  (OverseerThreadFactory-891-thread-4-processing-n:127.0.0.1:57926_) [n:127.0.0.1:57926_    ] o.a.s.c.OverseerCollectionProcessor Asking sub shard leader to wait for: testasynccollectioncreation_shard1_0_replica2 to be alive on: 127.0.0.1:38702_
         [junit4]   2> 390349 INFO  (OverseerThreadFactory-891-thread-4-processing-n:127.0.0.1:57926_) [n:127.0.0.1:57926_    ] o.a.s.c.OverseerCollectionProcessor Creating replica shard testasynccollectioncreation_shard1_1_replica2 as part of slice shard1_1 of collection testasynccollectioncreation on 127.0.0.1:38702_
         [junit4]   2> 390349 INFO  (OverseerThreadFactory-891-thread-4-processing-n:127.0.0.1:57926_) [n:127.0.0.1:57926_    ] o.a.s.c.c.ZkStateReader Load collection config from:/collections/testasynccollectioncreation
         [junit4]   2> 390350 INFO  (OverseerThreadFactory-891-thread-4-processing-n:127.0.0.1:57926_) [n:127.0.0.1:57926_    ] o.a.s.c.c.ZkStateReader path=/collections/testasynccollectioncreation configName=conf1 specified config exists in ZooKeeper
         [junit4]   2> 390352 INFO  (qtp381614561-1631) [n:127.0.0.1:57926_    ] o.a.s.s.SolrDispatchFilter [admin] webapp=null path=/admin/cores params={nodeName=127.0.0.1:38702_&core=testasynccollectioncreation_shard1_0_replica1&async=10042599424132346542&qt=/admin/cores&coreNodeName=core_node5&action=PREPRECOVERY&checkLive=true&state=recovering&onlyIfLeader=true&wt=javabin&version=2} status=0 QTime=2 
      

      The following sequence of events leads to the deadlock:

      1. testasynccollectioncreation_shard1_0_replica2 (core_node5) becomes active
      2. OCP asks the sub-shard leader testasynccollectioncreation_shard1_0_replica1 to wait until replica2 is in recovery

      Because replica2 has already gone active, the leader's wait for it to enter the 'recovering' state can never succeed. At this point, the test just keeps pinging for status until it times out and fails.
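
      A minimal sketch of why that wait can never succeed (hypothetical code, not the actual Solr PREPRECOVERY implementation; the class and helper names are made up for illustration): the leader polls for a 'recovering' report that will no longer arrive, so the loop only ends at its timeout.

         import java.util.concurrent.TimeUnit;
         import java.util.function.Supplier;

         // Hypothetical illustration of the wait in step 2 above. 'readReplicaState'
         // stands in for a cluster-state lookup; it is not a real Solr API.
         public class WaitForRecoveringSketch {

           static boolean waitForRecovering(Supplier<String> readReplicaState, long timeoutMs)
               throws InterruptedException {
             long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
             while (System.nanoTime() < deadline) {
               // The wait only succeeds on a 'recovering' report. A replica that is
               // already 'active' (as in step 1) never satisfies this check.
               if ("recovering".equals(readReplicaState.get())) {
                 return true;
               }
               Thread.sleep(100);
             }
             return false; // timed out -- this is the hang the test observes
           }

           public static void main(String[] args) throws InterruptedException {
             // replica2 went active before the leader started waiting:
             boolean reached = waitForRecovering(() -> "active", 2_000);
             System.out.println("reached 'recovering' state: " + reached); // prints false
           }
         }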

      Attachments

      1. Lucene-Solr-Tests-trunk-Java8-69.log
        1.31 MB
        Shalin Shekhar Mangar
      2. SOLR-7673.patch
        13 kB
        Shalin Shekhar Mangar

        Activity

        Shalin Shekhar Mangar added a comment -

        Attaching full log from the jenkins failure.

        Shalin Shekhar Mangar added a comment -

        I think there is another race condition. For example, we update the shard state to recovery after adding a replica, but if the replica becomes active in the meantime, the slice won't be set to "active" anytime soon.

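        A minimal sketch of this second race, assuming (as described above) that sub-slice promotion is only evaluated when a replica reports 'active' and requires the slice to already be in the 'recovery' state at that moment; the class, the method, and the 'construction' initial state are illustrative assumptions, not the actual ReplicaMutator code:

           import java.util.List;

           // Hypothetical promotion check: nothing re-evaluates it later, so if the last
           // replica turns active while the slice is still in its assumed initial
           // 'construction' state, the sub-slice is never switched to 'active'.
           public class SlicePromotionSketch {

             static boolean shouldPromote(String sliceState, List<String> replicaStates) {
               return "recovery".equals(sliceState)
                   && replicaStates.stream().allMatch("active"::equals);
             }

             public static void main(String[] args) {
               // Buggy ordering: the replica activates before splitshard marks the slice 'recovery'.
               System.out.println(shouldPromote("construction", List.of("active", "active"))); // false
               // Intended ordering: the slice is already 'recovery' when the last replica activates.
               System.out.println(shouldPromote("recovery", List.of("active", "active")));     // true
             }
           }
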
        Shalin Shekhar Mangar added a comment -

        Here's a patch which fixes both race conditions.

        The patch first creates the replicas in cluster state (by submitting an addreplica command to the overseer queue), then calls updateshardstate to set the sub-slices to 'recovery' state, and only then invokes the addreplica action to actually create the replica cores.

        A new flag called "skipCreateReplicaInClusterState" is added to the addreplica action so that the OCP doesn't try to create the replica in cluster state again. This is a hack to avoid duplicating most of the addreplica logic inside splitshard, and I've put in comments to this effect. I did not want to perform potentially dangerous refactoring as part of this bug fix, but once this is committed we can open a separate issue to refactor the OCP.

        I've been beasting this test quite a bit and haven't been able to reproduce the failures.

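        A rough outline of the ordering the patch describes, with hypothetical helper names standing in for the actual overseer and collection API calls:

           import java.util.List;

           // Illustrative sketch of the fixed splitshard sequence (helper names are made up):
           // 1) register replicas in cluster state, 2) set sub-slices to 'recovery',
           // 3) create the replica cores, skipping the cluster-state registration.
           public class SplitShardFixSketch {

             void addReplicasForSubShards(String collection, List<String> subSlices) {
               // 1. Create the replicas in cluster state first, via the overseer queue,
               //    so the sub-slices already know about all of their replicas.
               for (String subSlice : subSlices) {
                 submitAddReplicaToOverseerQueue(collection, subSlice);
               }

               // 2. Only then flip the sub-slices to the 'recovery' state, so a replica
               //    that reports 'active' afterwards sees the state the promotion check expects.
               updateShardState(collection, subSlices, "recovery");

               // 3. Finally create the actual replica cores; the flag keeps the add-replica
               //    path from registering the replica in cluster state a second time
               //    (the patch calls it "skipCreateReplicaInClusterState").
               for (String subSlice : subSlices) {
                 invokeAddReplica(collection, subSlice, /* skipCreateReplicaInClusterState= */ true);
               }
             }

             // Stubs standing in for overseer / core admin calls; not real Solr APIs.
             void submitAddReplicaToOverseerQueue(String collection, String slice) {}
             void updateShardState(String collection, List<String> slices, String state) {}
             void invokeAddReplica(String collection, String slice, boolean skipCreateInClusterState) {}
           }
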
        ASF subversion and git services added a comment -

        Commit 1688396 from shalin@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1688396 ]

        SOLR-7673: Race condition in shard splitting can cause operation to hang indefinitely or sub-shards to never become active

        ASF subversion and git services added a comment -

        Commit 1688404 from shalin@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1688404 ]

        SOLR-7673: Race condition in shard splitting can cause operation to hang indefinitely or sub-shards to never become active

        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release

          People

          • Assignee:
            Shalin Shekhar Mangar
          • Reporter:
            Shalin Shekhar Mangar
          • Votes:
            0
          • Watchers:
            3
