Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-6721

ZkController.ensureReplicaInLeaderInitiatedRecovery puts replica in local map before writing to ZK

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 4.10.2
    • Fix Version/s: 5.0, 6.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      ZkController.ensureReplicaInLeaderInitiatedRecovery puts replica information in the local "replicasInLeaderInitiatedRecovery" map before writing to ZK. If there is an exception writing to ZK then the replica is still in the local map preventing future indexing failures from setting the replica in "down" state.

      1. SOLR-6721.patch
        2 kB
        Shalin Shekhar Mangar

        Activity

        Hide
        anshumg Anshum Gupta added a comment -

        Bulk close after 5.0 release.

        Show
        anshumg Anshum Gupta added a comment - Bulk close after 5.0 release.
        Hide
        shalinmangar Shalin Shekhar Mangar added a comment -

        Thanks Tim for taking a look at it.

        Show
        shalinmangar Shalin Shekhar Mangar added a comment - Thanks Tim for taking a look at it.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1637987 from shalin@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1637987 ]

        SOLR-6721: ZkController.ensureReplicaInLeaderInitiatedRecovery puts replica in local map before writing to ZK

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1637987 from shalin@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1637987 ] SOLR-6721 : ZkController.ensureReplicaInLeaderInitiatedRecovery puts replica in local map before writing to ZK
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1637986 from shalin@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1637986 ]

        SOLR-6721: ZkController.ensureReplicaInLeaderInitiatedRecovery puts replica in local map before writing to ZK

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1637986 from shalin@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1637986 ] SOLR-6721 : ZkController.ensureReplicaInLeaderInitiatedRecovery puts replica in local map before writing to ZK
        Hide
        thelabdude Timothy Potter added a comment -

        Ah ok - I was going from memory ... this sounds good then.

        Show
        thelabdude Timothy Potter added a comment - Ah ok - I was going from memory ... this sounds good then.
        Hide
        shalinmangar Shalin Shekhar Mangar added a comment -

        My concern with this approach is that you can have literally 100's of failures going on at once and if you wait to add to the map until the write to ZK succeeds, it seems like there can be many attempted writes to ZK until the first one returns.

        That's not true because we synchronize on the map before writing to ZK and/or adding the replica url to the map. So all threads will wait for the lock to be available and then see that the map already has the replicaUrl and return.

        Show
        shalinmangar Shalin Shekhar Mangar added a comment - My concern with this approach is that you can have literally 100's of failures going on at once and if you wait to add to the map until the write to ZK succeeds, it seems like there can be many attempted writes to ZK until the first one returns. That's not true because we synchronize on the map before writing to ZK and/or adding the replica url to the map. So all threads will wait for the lock to be available and then see that the map already has the replicaUrl and return.
        Hide
        thelabdude Timothy Potter added a comment -

        Alternatively, we could introduce some sort of shared lock per replicaUrl instead of using the map object's lock

        Show
        thelabdude Timothy Potter added a comment - Alternatively, we could introduce some sort of shared lock per replicaUrl instead of using the map object's lock
        Hide
        thelabdude Timothy Potter added a comment -

        My concern with this approach is that you can have literally 100's of failures going on at once and if you wait to add to the map until the write to ZK succeeds, it seems like there can be many attempted writes to ZK until the first one returns.

        I think it would be better to add to the map first, so that any subsequent requests to put that replica into LIR know it's in-progress and then remove from the map if the write to ZK (updateLeaderInitiatedRecoveryState) fails.

        Show
        thelabdude Timothy Potter added a comment - My concern with this approach is that you can have literally 100's of failures going on at once and if you wait to add to the map until the write to ZK succeeds, it seems like there can be many attempted writes to ZK until the first one returns. I think it would be better to add to the map first, so that any subsequent requests to put that replica into LIR know it's in-progress and then remove from the map if the write to ZK (updateLeaderInitiatedRecoveryState) fails.
        Hide
        shalinmangar Shalin Shekhar Mangar added a comment -

        Patch which re-orders the statements so that state is written to ZK first and then the local map is updated. I also made the local map as a final field.

        Show
        shalinmangar Shalin Shekhar Mangar added a comment - Patch which re-orders the statements so that state is written to ZK first and then the local map is updated. I also made the local map as a final field.

          People

          • Assignee:
            shalinmangar Shalin Shekhar Mangar
            Reporter:
            shalinmangar Shalin Shekhar Mangar
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development