SOLR-5324

Make sub shard replica recovery and shard state switch asynchronous

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.6, Trunk
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      Currently the shard split command waits for all replicas of all sub-shards to recover, and then switches the state of the parent to inactive and the sub-shards to active.

      The problem is that shard split (ab)uses the CoreAdmin WaitForState action to ask the sub-shard leader to wait until the replica states are active. This action is prone to timing out.

      We should make the shard state switching asynchronous. Once all replicas of all sub-shards are 'active', the shard states should be switched automatically.

      1. SOLR-5324.patch
        23 kB
        Shalin Shekhar Mangar
      2. SOLR-5324.patch
        23 kB
        Shalin Shekhar Mangar
      3. SOLR-5324.patch
        25 kB
        Shalin Shekhar Mangar
      4. SOLR-5324.patch
        25 kB
        Shalin Shekhar Mangar

        Activity

        Shalin Shekhar Mangar added a comment -

        Changes:

        1. A new shard state, 'recovery', is added.
        2. After all sub-shard replicas have been created, the sub-shard state is set to 'recovery'. If the replication factor is 1, the sub-shards are set to 'active'. The splitshard API returns at this point.
        3. The state change events in the overseer are used to track when all replicas of all sub-shards become 'active'. Once that happens, the parent shard is set to 'inactive' and the sub-shards are set to 'active'.
        4. To facilitate the above, a slice property called 'parent' is introduced, which is removed once the slice becomes 'active'.
        5. If the split is retried, the 'deleteshard' API is used to completely remove the sub-shards before starting the splitting process.
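
The state tracking described in items 2 to 4 can be sketched roughly as follows. This is only an illustrative model, not the code in the patch: `SliceModel`, `SplitStateSketch`, and `maybeSwitchStates` are hypothetical names, and plain strings stand in for the shard and replica states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a slice (shard); not Solr's actual Slice class.
class SliceModel {
    String state;   // "active", "inactive" or "recovery"
    String parent;  // name of the parent shard, or null if none is recorded
    final List<String> replicaStates = new ArrayList<>();

    SliceModel(String state, String parent, String... replicaStates) {
        this.state = state;
        this.parent = parent;
        for (String s : replicaStates) {
            this.replicaStates.add(s);
        }
    }

    boolean allReplicasActive() {
        for (String s : replicaStates) {
            if (!"active".equals(s)) {
                return false;
            }
        }
        return true;
    }
}

class SplitStateSketch {
    /**
     * Intended to run after a replica state change. If every replica of every
     * sub-shard of parentName is active, the parent becomes inactive and the
     * sub-shards become active. Returns true if the switch happened.
     */
    static boolean maybeSwitchStates(Map<String, SliceModel> slices, String parentName) {
        SliceModel parent = slices.get(parentName);
        if (parent == null) {
            return false;
        }
        // Sub-shards are the slices in 'recovery' whose 'parent' property matches.
        List<SliceModel> subShards = new ArrayList<>();
        for (SliceModel slice : slices.values()) {
            if ("recovery".equals(slice.state) && parentName.equals(slice.parent)) {
                subShards.add(slice);
            }
        }
        if (subShards.isEmpty()) {
            return false;
        }
        // A single replica that is not active keeps every sub-shard in 'recovery'.
        for (SliceModel sub : subShards) {
            if (!sub.allReplicasActive()) {
                return false;
            }
        }
        parent.state = "inactive";
        for (SliceModel sub : subShards) {
            sub.state = "active";
            sub.parent = null; // the 'parent' property is dropped once active
        }
        return true;
    }
}
```

In this model the check is cheap enough to run on every replica state change, so nothing has to block waiting for recovery to finish.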
        Shalin Shekhar Mangar added a comment -
        1. On unsuccessful replica recovery, the sub-shard state was incorrectly being set to 'active'; this is fixed.
        2. The split by route field test should wait for the right collection to recover
        Anshum Gupta added a comment -

        This is good. Just had a quick look and it looks fine to me.

        Shalin Shekhar Mangar added a comment -

        Changes:

        1. Extracted the shard-splitting logic in the overseer into its own method.
        2. Overseer.updateShardState is reused for switching shard state. It also takes care of removing the parent shard information when a shard is switched from 'recovery' to 'active'.
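
The second change could look roughly like the sketch below. It is not the actual Overseer.updateShardState implementation: the class `ShardStateUpdateSketch`, the method `withState`, and the use of a plain string map with the keys "state" and "parent" are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class ShardStateUpdateSketch {
    // Property keys assumed for this sketch.
    static final String STATE = "state";
    static final String PARENT = "parent";

    /**
     * Returns a copy of the slice properties with the new state applied.
     * The 'parent' entry is dropped only on a 'recovery' to 'active' switch.
     */
    static Map<String, String> withState(Map<String, String> props, String newState) {
        Map<String, String> updated = new HashMap<>(props);
        boolean leavingRecovery =
            "recovery".equals(props.get(STATE)) && "active".equals(newState);
        if (leavingRecovery) {
            updated.remove(PARENT);
        }
        updated.put(STATE, newState);
        return updated;
    }
}
```

Returning a copy instead of mutating the input is a choice made here to keep the example easy to test; the real method may behave differently.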
        Shalin Shekhar Mangar added a comment -

        Thanks Anshum! I think this is ready to go into trunk. I'll let it bake for a while and then merge into branch_4x.

        Shalin Shekhar Mangar added a comment -
        1. Removed unused imports introduced in Overseer.
        2. Improved the log message in DistribUpdateProcessor for the case where the sub-shard is active and an update request from the parent is received.
        ASF subversion and git services added a comment -

        Commit 1530994 from shalin@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1530994 ]

        SOLR-5324: Make sub shard replica recovery and shard state switch asynchronous

        ASF subversion and git services added a comment -

        Commit 1531580 from shalin@apache.org in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1531580 ]

        SOLR-5324: Make sub shard replica recovery and shard state switch asynchronous


          People

          • Assignee:
            Shalin Shekhar Mangar
            Reporter:
            Shalin Shekhar Mangar
          • Votes:
            1
            Watchers:
            4
