SOLR-6530

Commits under network partition can put any node in down state


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10.2, 5.0, 6.0
    • Component/s: SolrCloud
    • Labels: None

    Description

      In SolrCloud, a commit is executed by whichever node receives it, i.e. commits are not routed via the leader like other updates.

      1. Suppose there is 1 collection with 1 shard and 2 replicas (A and B), and A is the leader.
      2. Suppose a commit request is made to node B at a time when B cannot talk to A because of a partition of any kind (failing switch, heavy GC, whatever).
      3. B fails to distribute the commit to A (the request times out) and asks A to recover.
      4. This used to be harmless because a leader simply ignores recovery requests. With the leader-initiated recovery code, however, B puts A in the "down" state, and A can never get out of that state.

      tl;dr: During network partitions, if enough commit/optimize requests are sent to the cluster, all the nodes in the cluster will eventually be marked as "down".
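
      The scenario above can be sketched as a toy model. This is not Solr code: the Cluster class, its commit method and the only_leader_marks_down flag are hypothetical names invented for illustration, and the guard shown is just one possible mitigation, not necessarily what the attached patches do.

```python
# Toy model of the scenario; not Solr code. All names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Cluster:
    leader: str
    state: dict                                    # node name -> "active" or "down"
    partitioned: set = field(default_factory=set)  # pairs of nodes that cannot talk

    def can_talk(self, a, b):
        return frozenset((a, b)) not in self.partitioned

    def commit(self, receiver, only_leader_marks_down=False):
        """The receiving node fans the commit out to every other node."""
        for peer in self.state:
            if peer == receiver or self.can_talk(receiver, peer):
                continue
            # Distribution to `peer` timed out, so the receiver asks it to recover.
            if only_leader_marks_down and receiver != self.leader:
                continue  # hypothetical guard: non-leaders may not mark peers down
            self.state[peer] = "down"


def make_cluster():
    # 1 shard, 2 replicas, A is the leader, A and B are partitioned.
    c = Cluster(leader="A", state={"A": "active", "B": "active"})
    c.partitioned.add(frozenset(("A", "B")))
    return c


buggy = make_cluster()
buggy.commit("B")          # B cannot reach A, so B marks leader A as down
buggy.commit("A")          # a later commit sent to A marks B down as well
print(buggy.state)         # {'A': 'down', 'B': 'down'}

guarded = make_cluster()
guarded.commit("B", only_leader_marks_down=True)
print(guarded.state["A"])  # active
```

      In the unguarded run, two commits sent to opposite sides of the partition are enough to mark both replicas "down", which is the behaviour the tl;dr describes. With the assumed guard, a commit sent to non-leader B leaves leader A untouched.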

      Attachments

        1. SOLR-6530.patch
          6 kB
          Shalin Shekhar Mangar
        2. SOLR-6530.patch
          8 kB
          Shalin Shekhar Mangar
        3. SOLR-6530.patch
          8 kB
          Shalin Shekhar Mangar
        4. SOLR-6530.patch
          14 kB
          Shalin Shekhar Mangar
        5. SOLR-6530.patch
          9 kB
          Shalin Shekhar Mangar
        6. SOLR-6530.patch
          10 kB
          Shalin Shekhar Mangar
        7. SOLR-6530.patch
          9 kB
          Shalin Shekhar Mangar
        8. SOLR-6530.patch
          10 kB
          Shalin Shekhar Mangar


          People

            Assignee: Shalin Shekhar Mangar
            Reporter: Shalin Shekhar Mangar
            Votes: 0
            Watchers: 11
