Solr
  1. Solr
  2. SOLR-6721

ZkController.ensureReplicaInLeaderInitiatedRecovery puts replica in local map before writing to ZK

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 4.10.2
    • Fix Version/s: 5.0, 6.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      ZkController.ensureReplicaInLeaderInitiatedRecovery puts replica information in the local "replicasInLeaderInitiatedRecovery" map before writing to ZK. If there is an exception writing to ZK then the replica is still in the local map preventing future indexing failures from setting the replica in "down" state.

      1. SOLR-6721.patch
        2 kB
        Shalin Shekhar Mangar

        Activity

        Hide
        Shalin Shekhar Mangar added a comment -

        Patch which re-orders the statements so that state is written to ZK first and then the local map is updated. I also made the local map as a final field.

        Show
        Shalin Shekhar Mangar added a comment - Patch which re-orders the statements so that state is written to ZK first and then the local map is updated. I also made the local map as a final field.
        Hide
        Timothy Potter added a comment -

        My concern with this approach is that you can have literally 100's of failures going on at once and if you wait to add to the map until the write to ZK succeeds, it seems like there can be many attempted writes to ZK until the first one returns.

        I think it would be better to add to the map first, so that any subsequent requests to put that replica into LIR know it's in-progress and then remove from the map if the write to ZK (updateLeaderInitiatedRecoveryState) fails.

        Show
        Timothy Potter added a comment - My concern with this approach is that you can have literally 100's of failures going on at once and if you wait to add to the map until the write to ZK succeeds, it seems like there can be many attempted writes to ZK until the first one returns. I think it would be better to add to the map first, so that any subsequent requests to put that replica into LIR know it's in-progress and then remove from the map if the write to ZK (updateLeaderInitiatedRecoveryState) fails.
        Hide
        Timothy Potter added a comment -

        Alternatively, we could introduce some sort of shared lock per replicaUrl instead of using the map object's lock

        Show
        Timothy Potter added a comment - Alternatively, we could introduce some sort of shared lock per replicaUrl instead of using the map object's lock
        Hide
        Shalin Shekhar Mangar added a comment -

        My concern with this approach is that you can have literally 100's of failures going on at once and if you wait to add to the map until the write to ZK succeeds, it seems like there can be many attempted writes to ZK until the first one returns.

        That's not true because we synchronize on the map before writing to ZK and/or adding the replica url to the map. So all threads will wait for the lock to be available and then see that the map already has the replicaUrl and return.

        Show
        Shalin Shekhar Mangar added a comment - My concern with this approach is that you can have literally 100's of failures going on at once and if you wait to add to the map until the write to ZK succeeds, it seems like there can be many attempted writes to ZK until the first one returns. That's not true because we synchronize on the map before writing to ZK and/or adding the replica url to the map. So all threads will wait for the lock to be available and then see that the map already has the replicaUrl and return.
        Hide
        Timothy Potter added a comment -

        Ah ok - I was going from memory ... this sounds good then.

        Show
        Timothy Potter added a comment - Ah ok - I was going from memory ... this sounds good then.
        Hide
        ASF subversion and git services added a comment -

        Commit 1637986 from shalin@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1637986 ]

        SOLR-6721: ZkController.ensureReplicaInLeaderInitiatedRecovery puts replica in local map before writing to ZK

        Show
        ASF subversion and git services added a comment - Commit 1637986 from shalin@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1637986 ] SOLR-6721 : ZkController.ensureReplicaInLeaderInitiatedRecovery puts replica in local map before writing to ZK
        Hide
        ASF subversion and git services added a comment -

        Commit 1637987 from shalin@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1637987 ]

        SOLR-6721: ZkController.ensureReplicaInLeaderInitiatedRecovery puts replica in local map before writing to ZK

        Show
        ASF subversion and git services added a comment - Commit 1637987 from shalin@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1637987 ] SOLR-6721 : ZkController.ensureReplicaInLeaderInitiatedRecovery puts replica in local map before writing to ZK
        Hide
        Shalin Shekhar Mangar added a comment -

        Thanks Tim for taking a look at it.

        Show
        Shalin Shekhar Mangar added a comment - Thanks Tim for taking a look at it.
        Hide
        Anshum Gupta added a comment -

        Bulk close after 5.0 release.

        Show
        Anshum Gupta added a comment - Bulk close after 5.0 release.

          People

          • Assignee:
            Shalin Shekhar Mangar
            Reporter:
            Shalin Shekhar Mangar
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development