SOLR-7245: Temporary ZK election or connection loss should not stall indexing due to LIR

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1, 6.0
    • Component/s: SolrCloud
    • Labels: None

      Description

      If there is a ZK election or connection loss and the leader cannot reach a replica, indexing currently stalls until the ZK connection is re-established, because of the leader-initiated recovery (LIR) process. This shouldn't happen, and it partly regresses the work done in SOLR-5577.

      I will try to get to this, but if someone races me to it, feel free.

      1. SOLR-7245.patch
        16 kB
        Ramkumar Aiyengar
      2. SOLR-7245.patch
        15 kB
        Ramkumar Aiyengar

          Activity

          andyetitmoves Ramkumar Aiyengar added a comment -

          I need to think this through a bit more, but here's a starting patch which tries to have the update path make a best effort at updating ZK without stalling if disconnected. Comments welcome.
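
As a rough illustration of the best-effort idea described in this comment (this is not the contents of the attached patch, and it does not use Solr's or ZooKeeper's real APIs), the sketch below tries to record a replica as down only when ZK is reachable and otherwise returns immediately. The class `BestEffortLirSketch`, the `StateWriter` interface, and the deferred list are all hypothetical names and assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch; none of these names come from Solr or the attached patch. */
class BestEffortLirSketch {

    /** Hypothetical stand-in for whatever would record a replica's "down" state in ZK. */
    public interface StateWriter {
        void markDown(String replica) throws Exception;
    }

    private final BooleanSupplier zkConnected;
    private final StateWriter writer;
    // Assumption: replicas whose state could not be written are remembered for a later retry.
    private final List<String> deferred = new ArrayList<>();

    public BestEffortLirSketch(BooleanSupplier zkConnected, StateWriter writer) {
        this.zkConnected = zkConnected;
        this.writer = writer;
    }

    /**
     * Tries once to record the replica as down. Returns true if the write
     * succeeded and false if it was skipped or failed. It never waits for
     * the ZK connection to come back, so the caller's update path is not stalled.
     */
    public boolean tryMarkDown(String replica) {
        if (!zkConnected.getAsBoolean()) {
            deferred.add(replica); // disconnected: skip the write instead of blocking
            return false;
        }
        try {
            writer.markDown(replica);
            return true;
        } catch (Exception e) {
            deferred.add(replica); // write failed: remember it and carry on
            return false;
        }
    }

    /** Returns a copy of the replicas whose state update was deferred. */
    public List<String> deferredReplicas() {
        return new ArrayList<>(deferred);
    }
}
```

The point of the design is only that the indexing thread gets an immediate answer either way; how deferred updates would be retried is left open here.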

          andyetitmoves Ramkumar Aiyengar added a comment -

          Updated the patch to account for changes since then. Tests seem to pass, but I would appreciate a second look, as I haven't dealt much with LIR until now.

          Is there an existing test anyone knows of that covers this area? BasicZkTest doesn't handle cloud setups. Maybe one of the ChaosMonkey tests?

          markrmiller@gmail.com Mark Miller added a comment -

          The ChaosMonkey tests could hit such a case, but there is no real guarantee - those tests need love to make sure they hit everything we hope they hit.

          Timothy Potter, any chance you can take a gander at this?

          thelabdude Timothy Potter added a comment -

          Taking a look, thanks for the heads up.

          thelabdude Timothy Potter added a comment -

          Patch looks good at first glance, but I wanted to make sure we coordinate with Shalin Shekhar Mangar on SOLR-7109.

          andyetitmoves Ramkumar Aiyengar added a comment -

          "Patch looks good at first look, but wanted to make sure we coordinate with Shalin Shekhar Mangar on SOLR-7109"

          Hopefully it should be fine. I actually noticed this when I was reviewing the patch for that issue. But Shalin, let me know if otherwise.

          jira-bot ASF subversion and git services added a comment -

          Commit 1668274 from Ramkumar Aiyengar in branch 'dev/trunk'
          [ https://svn.apache.org/r1668274 ]

          SOLR-7245: Temporary ZK election or connection loss should not stall indexing due to LIR

          shalinmangar Shalin Shekhar Mangar added a comment -

          +1 LGTM.

          jira-bot ASF subversion and git services added a comment -

          Commit 1668479 from Ramkumar Aiyengar in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1668479 ]

          SOLR-7245: Temporary ZK election or connection loss should not stall indexing due to LIR

          andyetitmoves Ramkumar Aiyengar added a comment -

          Thanks everyone. I am looking at adding ZK restarts to the ChaosMonkey tests, but that can be a separate issue, since it has wider coverage than just this one.

          thelabdude Timothy Potter added a comment -

          Bulk close after 5.1 release


            People

            • Assignee: andyetitmoves Ramkumar Aiyengar
            • Reporter: andyetitmoves Ramkumar Aiyengar
            • Votes: 0
            • Watchers: 5
