SOLR-7034

Consider allowing any node to become leader, regardless of its last published state.

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 5.2, 6.0
    • Component/s: None
    • Labels: None

      Description

      Now that we allow a min replication param for updates, I think it's time to loosen this up. Currently, you can end up in a state where no one in a shard thinks they can be leader, and the shard spins in a fast, ugly infinite loop trying to pick a leader.

      We should let any replica that is able to properly sync with the available replicas become leader if that sync succeeds.

      The previous strategy was meant to cover the case where too few replicas remain after a machine loss, so that you don't lose data. The idea was that you should stop the cluster, repair it, and get all your replicas involved in a leadership election. Instead, we should favor carrying on, and those who want to ensure they don't lose data due to major replica loss should use the min replication update param.
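
To illustrate the trade-off described above, here is a minimal sketch of how a client might react when an update reaches fewer replicas than it asked for. It assumes the client can learn the achieved replication factor from the update response. The class, enum, and method names are hypothetical and are not part of Solr or SolrJ.

```java
// Hypothetical client-side helper; names are illustrative, not part of Solr or SolrJ.
final class MinReplicationCheck {
    /** What the client does once it knows how many replicas received the update. */
    enum Action { ACCEPT, RETRY_LATER }

    /**
     * Compares the replication factor the client requested with the one the
     * update actually achieved (assumed to be reported back by the cluster).
     */
    static Action afterUpdate(int requestedMinRf, int achievedRf) {
        if (requestedMinRf < 1) {
            throw new IllegalArgumentException("requestedMinRf must be >= 1");
        }
        // The cluster keeps accepting updates; the client decides whether
        // an under-replicated update is good enough.
        return achievedRf >= requestedMinRf ? Action.ACCEPT : Action.RETRY_LATER;
    }

    public static void main(String[] args) {
        System.out.println(afterUpdate(2, 3)); // enough replicas
        System.out.println(afterUpdate(2, 1)); // too few replicas
    }
}
```

The point of the sketch is that durability guarantees move to the update path, so leader election itself no longer has to be conservative.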

          Activity

          markrmiller@gmail.com Mark Miller added a comment -

          I filed SOLR-7065 to tackle the lesser change in my last comment.

          markrmiller@gmail.com Mark Miller added a comment -

          A good first step might be: if all replicas in a shard participate in a leader sync, don't consult last published state. This would at least deal with cases where replicas 'blink' at the same time (GC, network interrupt, etc.), but then everyone recovers and is ready to move on.
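
The rule suggested in this comment could be sketched roughly as follows. This is a simplified model, not Solr's election code: the `Replica` class, the `PublishedState` enum, and `mayBecomeLeader` are hypothetical names, and the fallback branch assumes the existing behavior is "only a replica whose last published state was active may lead".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the proposed rule; not Solr's actual classes.
final class LeaderEligibility {
    enum PublishedState { ACTIVE, RECOVERING, DOWN }

    /** A replica as seen by the election. */
    static final class Replica {
        final String name;
        final PublishedState lastPublished;
        final boolean participatedInSync;

        Replica(String name, PublishedState lastPublished, boolean participatedInSync) {
            this.name = name;
            this.lastPublished = lastPublished;
            this.participatedInSync = participatedInSync;
        }
    }

    /** Decides whether the candidate may take leadership after its leader sync. */
    static boolean mayBecomeLeader(Replica candidate, List<Replica> shardReplicas, boolean syncSucceeded) {
        if (!syncSucceeded) {
            return false; // a failed sync never yields a leader
        }
        boolean allParticipated = true;
        for (Replica r : shardReplicas) {
            allParticipated &= r.participatedInSync;
        }
        if (allParticipated) {
            return true; // proposed first step: ignore last published state
        }
        // Assumed existing behavior: fall back to the last published state.
        return candidate.lastPublished == PublishedState.ACTIVE;
    }

    public static void main(String[] args) {
        // Both replicas "blinked" (published DOWN) but both took part in the sync.
        Replica a = new Replica("a", PublishedState.DOWN, true);
        Replica b = new Replica("b", PublishedState.DOWN, true);
        System.out.println(mayBecomeLeader(a, Arrays.asList(a, b), true));

        // One replica is missing from the sync, so the old check still applies.
        Replica c = new Replica("c", PublishedState.DOWN, false);
        System.out.println(mayBecomeLeader(a, Arrays.asList(a, c), true));
    }
}
```

In the first case every replica took part in the sync, so the candidate is allowed to lead even though no one last published as active. In the second case a replica is absent, so the stricter check applies and the candidate is refused.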


            People

            • Assignee: Mark Miller (markrmiller@gmail.com)
            • Reporter: Mark Miller (markrmiller@gmail.com)
            • Votes: 0
            • Watchers: 7
