Solr
  1. Solr
  2. SOLR-5311

Avoid registering replicas which are removed

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.6, 5.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      If a replica is removed from the clusterstate and if it comes back up it should not be allowed to register.

      Each core ,when comes up, checks if it was already registered and if yes is it still there. If not ,throw an error and do an unregister . If such a request come to overseer it should ignore such a core

      1. SOLR-5311.patch
        34 kB
        Noble Paul
      2. SOLR-5311.patch
        34 kB
        Noble Paul
      3. SOLR-5311.patch
        33 kB
        Noble Paul
      4. SOLR-5311.patch
        33 kB
        Noble Paul
      5. SOLR-5311.patch
        32 kB
        Noble Paul
      6. SOLR-5311.patch
        34 kB
        Noble Paul
      7. SOLR-5311.patch
        33 kB
        Noble Paul

        Issue Links

          Activity

          Hide
          Noble Paul added a comment -

          Also includes SOLR-5310

          Show
          Noble Paul added a comment - Also includes SOLR-5310
          Hide
          Noble Paul added a comment -

          patch updated to latest trunk

          Show
          Noble Paul added a comment - patch updated to latest trunk
          Hide
          Noble Paul added a comment - - edited

          all tests pass in trunk. please verify. this is a combined patch for SOLR-5310 too

          Show
          Noble Paul added a comment - - edited all tests pass in trunk. please verify. this is a combined patch for SOLR-5310 too
          Hide
          Noble Paul added a comment -

          updated patch w/ latest trunk after Mark Miller 's commits

          Show
          Noble Paul added a comment - updated patch w/ latest trunk after Mark Miller 's commits
          Hide
          Noble Paul added a comment -

          solving a race condition where the clusterstate says the node is active but it is actually unregistered

          Show
          Noble Paul added a comment - solving a race condition where the clusterstate says the node is active but it is actually unregistered
          Hide
          Noble Paul added a comment -

          handling exceptions

          Show
          Noble Paul added a comment - handling exceptions
          Hide
          Anshum Gupta added a comment -

          Can you rename DeleteInactiveReplica to DeleteInactiveReplicaTest?

          Show
          Anshum Gupta added a comment - Can you rename DeleteInactiveReplica to DeleteInactiveReplicaTest?
          Hide
          ASF subversion and git services added a comment -

          Commit 1536606 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1536606 ]

          SOLR-5311 - Avoid registering replicas which are removed , SOLR-5310 -Add a collection admin command to remove a replica

          Show
          ASF subversion and git services added a comment - Commit 1536606 from Noble Paul in branch 'dev/trunk' [ https://svn.apache.org/r1536606 ] SOLR-5311 - Avoid registering replicas which are removed , SOLR-5310 -Add a collection admin command to remove a replica
          Hide
          ASF subversion and git services added a comment -

          Commit 1536608 from Noble Paul in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1536608 ]

          SOLR-5311 - Avoid registering replicas which are removed , SOLR-5310 -Add a collection admin command to remove a replica

          Show
          ASF subversion and git services added a comment - Commit 1536608 from Noble Paul in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1536608 ] SOLR-5311 - Avoid registering replicas which are removed , SOLR-5310 -Add a collection admin command to remove a replica
          Hide
          ASF subversion and git services added a comment -

          Commit 1537060 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1537060 ]

          SOLR-5311 tests were failing intermittently

          Show
          ASF subversion and git services added a comment - Commit 1537060 from Noble Paul in branch 'dev/trunk' [ https://svn.apache.org/r1537060 ] SOLR-5311 tests were failing intermittently
          Hide
          ASF subversion and git services added a comment -

          Commit 1537061 from Noble Paul in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1537061 ]

          SOLR-5311 tests were failing intermittently

          Show
          ASF subversion and git services added a comment - Commit 1537061 from Noble Paul in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1537061 ] SOLR-5311 tests were failing intermittently
          Hide
          ASF subversion and git services added a comment -

          Commit 1537978 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1537978 ]

          SOLR-5311, invalid error message

          Show
          ASF subversion and git services added a comment - Commit 1537978 from Noble Paul in branch 'dev/trunk' [ https://svn.apache.org/r1537978 ] SOLR-5311 , invalid error message
          Hide
          ASF subversion and git services added a comment -

          Commit 1537981 from Noble Paul in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1537981 ]

          SOLR-5311, invalid error message

          Show
          ASF subversion and git services added a comment - Commit 1537981 from Noble Paul in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1537981 ] SOLR-5311 , invalid error message
          Hide
          ASF subversion and git services added a comment -

          Commit 1538254 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1538254 ]

          SOLR-5311 trying to stop test failures

          Show
          ASF subversion and git services added a comment - Commit 1538254 from Noble Paul in branch 'dev/trunk' [ https://svn.apache.org/r1538254 ] SOLR-5311 trying to stop test failures
          Hide
          ASF subversion and git services added a comment -

          Commit 1538255 from Noble Paul in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1538255 ]

          SOLR-5311 trying to stop test failures

          Show
          ASF subversion and git services added a comment - Commit 1538255 from Noble Paul in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1538255 ] SOLR-5311 trying to stop test failures
          Hide
          Mark Miller added a comment -

          You can't count on the coreNodeName to determine if a core was removed or not. The whole thing is much tricker than this anyway - when cores are controlled by the user, you can't yet tell what should exist or not, just what states are published. Doing something better is more difficult than what is done with this patch.

          Show
          Mark Miller added a comment - You can't count on the coreNodeName to determine if a core was removed or not. The whole thing is much tricker than this anyway - when cores are controlled by the user, you can't yet tell what should exist or not, just what states are published. Doing something better is more difficult than what is done with this patch.
          Hide
          Noble Paul added a comment - - edited

          People managing the clusterstate explicitly is not really a requirement.

          They just need to create cores and the system should automatically assign coreNodeName .

          Show
          Noble Paul added a comment - - edited People managing the clusterstate explicitly is not really a requirement. They just need to create cores and the system should automatically assign coreNodeName .
          Hide
          Mark Miller added a comment -

          People managing the clusterstate explicitly is not really a requirement.

          Dude, that means nothing in this context. It's a released, supported feature, right now it is a requirement.

          Show
          Mark Miller added a comment - People managing the clusterstate explicitly is not really a requirement. Dude, that means nothing in this context. It's a released, supported feature, right now it is a requirement.
          Show
          Noble Paul added a comment - - edited Actually it was an undocumented feature. https://cwiki.apache.org/confluence/display/solr/CoreAdminHandler+Parameters+and+Usage#CoreAdminHandlerParametersandUsage- CREATE
          Hide
          Mark Miller added a comment -

          It's in CHANGES, it's been noted in user list emails, it had a JIRA issue - it's a fully supported feature. Features are not determined by what has been documented or not. Take a look at the code and the test code - this is an explicit, supported, released, feature.

          Show
          Mark Miller added a comment - It's in CHANGES, it's been noted in user list emails, it had a JIRA issue - it's a fully supported feature. Features are not determined by what has been documented or not. Take a look at the code and the test code - this is an explicit, supported, released, feature.
          Hide
          Mark Miller added a comment -

          And as I mention in SOLR-5580, your code actually relies on coreNodeName being a configurable setting in solr.xml! It wouldn't work without it. This just was not implemented correctly.

          Show
          Mark Miller added a comment - And as I mention in SOLR-5580 , your code actually relies on coreNodeName being a configurable setting in solr.xml! It wouldn't work without it. This just was not implemented correctly.
          Hide
          Noble Paul added a comment - - edited

          I'm not sure I understand you. I'm missing something I guess
          Thisproperty is persisted to solr.xml or core.properties . And that is how it works. Without that it can't work . I made the change so that it is persisted to solr.xml/core.properties

          Show
          Noble Paul added a comment - - edited I'm not sure I understand you. I'm missing something I guess Thisproperty is persisted to solr.xml or core.properties . And that is how it works. Without that it can't work . I made the change so that it is persisted to solr.xml/core.properties
          Hide
          Mark Miller added a comment - - edited

          This is part of a larger direction we have been working towards, which is essentially making ZooKeeper the truth.

          With the current SolrCloud, you cannot do this like you have. The Core Admin API, and pre configured cores, and the collections API all need to work in concert. That makes this approach a complete no go right now.

          The path I have been working towards with the Collections API is a new mode where everything is handled by the Collections API. In this case, it will not be valid for users to try and mess with things at a per core level.

          In this way, the cluster can truly match the truth in ZooKeeper because both the nodes and the Overseer can work together to keep the truth enforced. This includes things like being able to easily change the replication factor for a collection, or add a new host that automatically gets used depending on your settings. You should not have to address individual nodes to manage collections, nor should your replicationFactor stop mattering simply because you added another core via core admin. To me, this is the ultimate cloud situation. The system needs full control. We can add the ability to override or prefer certain things, but in general, we want to get to the point of optionally have the cluster mostly managed for you given some simple directectives. Of course, I think it should be implemented as a bunch of option features. You should also be easy to really lock things done unless you manage things manually.

          All of this requires we have a mode a user can decide to use (the collections api, perhaps with an option for back compat) so that we are in control of everything. We know when a collection is created and deleted - it won't be able to just pop back into existence.

          Until we have this special mode, the way that we had to build this, lots of historical reasons, we currently have to support what we have with pre configured cores and core admin and the collections api. This is silly form a user perspective though. It can all be done much nicer with just a collections API that doesn't have to be directed to any single node.

          Doing what you want to do in a back compat way is not some simple fix. We have been working towards this for a long time now - if you could just slap in a band aid and make it work like this, I would have done it a long time go.

          Show
          Mark Miller added a comment - - edited This is part of a larger direction we have been working towards, which is essentially making ZooKeeper the truth. With the current SolrCloud, you cannot do this like you have. The Core Admin API, and pre configured cores, and the collections API all need to work in concert. That makes this approach a complete no go right now. The path I have been working towards with the Collections API is a new mode where everything is handled by the Collections API. In this case, it will not be valid for users to try and mess with things at a per core level. In this way, the cluster can truly match the truth in ZooKeeper because both the nodes and the Overseer can work together to keep the truth enforced. This includes things like being able to easily change the replication factor for a collection, or add a new host that automatically gets used depending on your settings. You should not have to address individual nodes to manage collections, nor should your replicationFactor stop mattering simply because you added another core via core admin. To me, this is the ultimate cloud situation. The system needs full control. We can add the ability to override or prefer certain things, but in general, we want to get to the point of optionally have the cluster mostly managed for you given some simple directectives. Of course, I think it should be implemented as a bunch of option features. You should also be easy to really lock things done unless you manage things manually. All of this requires we have a mode a user can decide to use (the collections api, perhaps with an option for back compat) so that we are in control of everything. We know when a collection is created and deleted - it won't be able to just pop back into existence. Until we have this special mode, the way that we had to build this, lots of historical reasons, we currently have to support what we have with pre configured cores and core admin and the collections api. This is silly form a user perspective though. It can all be done much nicer with just a collections API that doesn't have to be directed to any single node. Doing what you want to do in a back compat way is not some simple fix. We have been working towards this for a long time now - if you could just slap in a band aid and make it work like this, I would have done it a long time go.
          Hide
          Mark Miller added a comment -

          This cannot be part of SolrCloud as is. It was rushed in without fully understanding the system and with terrible tests (I had to spend a lot of time working on them to fix them).

          This implementation breaks the proper functioning of SolrCloud. I've explained how it can be properly implemented above. There may be other options as well. The naive option here creates many bugs and must be removed for 4.6.1.

          Show
          Mark Miller added a comment - This cannot be part of SolrCloud as is. It was rushed in without fully understanding the system and with terrible tests (I had to spend a lot of time working on them to fix them). This implementation breaks the proper functioning of SolrCloud. I've explained how it can be properly implemented above. There may be other options as well. The naive option here creates many bugs and must be removed for 4.6.1.
          Hide
          Mark Miller added a comment -

          Shalin has captured previous discussions really well in this issue SOLR-5128. We need a new "ZooKeeper is Truth" mode. In that mode, we can start building this feature set out.

          Show
          Mark Miller added a comment - Shalin has captured previous discussions really well in this issue SOLR-5128 . We need a new "ZooKeeper is Truth" mode. In that mode, we can start building this feature set out.
          Hide
          Uwe Schindler added a comment -

          Move issue to Solr 4.9.

          Show
          Uwe Schindler added a comment - Move issue to Solr 4.9.
          Hide
          David Smiley added a comment -

          This issues is marked as not resolved; yet I see commits and an entry in CHANGES.txt under 4.6.0. Noble Paul can you address this?

          Show
          David Smiley added a comment - This issues is marked as not resolved; yet I see commits and an entry in CHANGES.txt under 4.6.0. Noble Paul can you address this?
          Hide
          Noble Paul added a comment -

          forgot to mark it as resolved

          Show
          Noble Paul added a comment - forgot to mark it as resolved

            People

            • Assignee:
              Noble Paul
              Reporter:
              Noble Paul
            • Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development