SOLR-5580

SOLR-5311 was done without full understanding of the system and must be reverted.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.6
    • Fix Version/s: 4.6.1, 4.7, 5.0
    • Component/s: None
    • Labels:
    • Environment:

      OS: Red Hat Enterprise Linux Server release 6.4 (Santiago)
      Software: Solr 4.6
      JDK: OpenJDK Runtime Environment (rhel-2.3.4.1.el6_3-x86_64)
      OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)

      Description

      In class org.apache.solr.cloud.Overseer, at line 360:
      ---------------------------------------------------------------------
      if (sliceName != null && collectionExists && !"true".equals(state.getCollection(collection).getStr("autoCreated"))) {
        Slice slice = state.getSlice(collection, sliceName);
        if (slice.getReplica(coreNodeName) == null) {
          log.info("core_deleted . Just return");
          return state;
        }
      }
      ---------------------------------------------------------------------
      The slice needs to be checked for null, because when a new core is created with both an explicit shard and an explicit coreNodeName, state.getSlice(collection, sliceName) may return null. Without the check, the call to slice.getReplica(coreNodeName) throws a NullPointerException. The fixed code:
      ---------------------------------------------------------------------
      if (sliceName != null && collectionExists && !"true".equals(state.getCollection(collection).getStr("autoCreated"))) {
        Slice slice = state.getSlice(collection, sliceName);
        if (slice != null && slice.getReplica(coreNodeName) == null) {
          log.info("core_deleted . Just return");
          return state;
        }
      }
      ---------------------------------------------------------------------
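      The failure mode can be illustrated in isolation. The sketch below is a simplified stand-in for the Overseer guard (plain maps instead of Solr's ClusterState and Slice classes; the class and method names are illustrative, not from Solr), showing why the slice lookup must be null-checked before it is dereferenced:

```java
import java.util.Map;

public class SliceGuardSketch {
    /**
     * Simplified stand-in for the Overseer guard: 'slices' maps a slice name to
     * its replicas (coreNodeName -> replica). Returns true when the state update
     * should be skipped because the replica was deleted. Without the null check
     * on 'slice', a CREATE naming a slice that is not yet in the cluster state
     * would throw a NullPointerException on the lookup below.
     */
    static boolean coreDeleted(Map<String, Map<String, Object>> slices,
                               String sliceName, String coreNodeName) {
        Map<String, Object> slice = slices.get(sliceName);       // may be null
        return slice != null && slice.get(coreNodeName) == null; // null-safe
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> slices =
                Map.of("shard1", Map.of("core_node1", new Object()));
        // Known slice, unknown replica: treated as a deleted core.
        System.out.println(coreDeleted(slices, "shard1", "core_node2")); // true
        // Unknown slice (e.g. explicit shard=Test on a fresh collection):
        // the unguarded version would NPE here instead of returning false.
        System.out.println(coreDeleted(slices, "Test", "Test1"));        // false
    }
}
```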

        Issue Links

          Activity

          Mark Miller added a comment - edited

          Like I've said in SOLR-5311, autoCreated is illegal in the current system. Cores created by the Collections API must currently be the same as those created by the Core Admin API or preconfigured.

          autoCreated cannot exist yet, and when it can, it can't be used by default, it must be a new mode.

          Noble Paul added a comment -

          The 'autoCreated' attribute was missing.

          Noble Paul added a comment -

          You said that I need to create a slice first, which I think is not convenient.

          No, I didn't say that. Not creating the slice was fine. I'm saying that if you had used the collection create API, it would have created the slices upfront.

          The last thing: what is the purpose of the autoCreated check?

          The autoCreated flag was added to differentiate between a collection created using the Collections API and one created automatically as part of a core creation.

          Oh, there isn't even any information about the change.

          It was designed to work for your use cases. This is a regression. I need to test it with your use case before I can comment further.

          YouPeng Yang added a comment -

          To make things clear: I created the collection with the new Solr 4.6.
          We deployed the new Solr 4.6 in Tomcat, and just created a new collection and a new core with the same request URL.
          The collection is created automatically when we create a new core with no explicit coreNodeName; I cannot find the autoCreated property in the collection information on ZooKeeper.

          To make the cores convenient to manage, we would like to identify a core by an explicit custom shard name, core name, collection, and coreNodeName.

          You said that I need to create a slice first, which I think is not convenient. Everything went well in the old version; I think that, without guarding the old behavior, your work may cost Solr some of its flexibility and smoothness.

          Anyway, we do hope your work keeps the good features of Solr.
          Another suggestion: if you make any change to existing behavior of Solr, please give some hints in the reference doc or CHANGES.

          The last thing: what is the purpose of the autoCreated check? The autoCreated property cannot be found in Solr; I have checked the data in ZooKeeper.

          Oh, there isn't even any information about the change.

          Sent from my iPhone

          On 2014-1-2, at 20:04, "Noble Paul (JIRA)" <jira@apache.org> wrote:

          Noble Paul added a comment - edited

          OK, now I see it.
          (Not the solution, but the diagnosis.)
          For your use case, the collection was created with an older version of Solr, so the property 'autoCreated' is absent from the collection.

          slice would never have been null in a normal case. If you created the collection with numShards=x, the system would have created shard1 through shardx right away. If it was a custom collection, it would have expected you to create the slice first.

          Do you have multiple replicas, or do you just keep one copy of the index?
          How did naming the coreNodeName "Test1" help you instead of "core_node1"? I'm assuming you somehow encode the data dir in the coreNodeName.

          If you could encode the data location in the coreName itself, it would still work, right?

          YouPeng Yang added a comment -

          Hi

          The create core URL:
          http://10.7.23.122:8080/solr/admin/cores?action=CREATE&name=Test1&shard=Test&collection.configName=myconf&schema=schema.xml&config=solrconfigLocal_default.xml&collection=defaultcol&coreNodeName=Test1
          We do not create the collection first; a new collection is created when we create the first core of that collection.
          We use the coreNodeName to distinguish the cores. As I said, the auto-created coreNodeName makes it hard to identify which core a datadir belongs to, especially when there are a lot of cores.

          When a core crashes, as long as the data still exists, we can create a new core using the same datadir. This is important when you have shared storage such as NFS or even HDFS.

          For failover, we supposed that we could force multiple cores to share the same data directory. In the end we gave up after some tests; also, Mark Miller suggested we should not do that.

          Sent from my iPhone

          On 2014-1-2, at 19:05, "Noble Paul (JIRA)" <jira@apache.org> wrote:

          Noble Paul added a comment -

          YouPeng Yang, what is your normal workflow?

          • How do you create your collection? I mean, what is a typical command to create a collection?
          • What is your typical coreNodeName? How does your specific coreNodeName help you fail over quickly?
          • What will happen if a core comes up after you replace it with another? It should replace your new core with the old core in that case. How do you deal with that?

          How about just creating a new core without a coreNodeName (only a collection name and a slice name)? In that case you never need to worry about the coreNodeName at all.
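          For illustration, such a request might look like the sketch below. The host, core, collection, and shard names are placeholders; it mirrors the Core Admin CREATE URL from earlier in this thread but omits the coreNodeName parameter, so Solr would assign one (such as core_node1) itself:

```shell
# Placeholder Solr host; adjust to your deployment.
SOLR_HOST="http://localhost:8080"

# Core Admin CREATE without an explicit coreNodeName: Solr picks the name.
URL="$SOLR_HOST/solr/admin/cores?action=CREATE&name=Test1&collection=defaultcol&shard=Test"
echo "$URL"

# To actually issue the request against a running node:
#   curl "$URL"
```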

          YouPeng Yang added a comment -

          By the way, we have used the coreNodeName since Solr 4.4. However, things went wrong when we decided to upgrade to 4.6.
          It troubled me for a long time. No one gave any help; neither the reference doc nor the CHANGES of Solr 4.6 makes any announcement about the change, which is bad for users.

          I am not even clear about what your autoCreated check aims at.

          Sent from my iPhone

          On 2014-1-1, at 22:40, "Noble Paul (JIRA)" <jira@apache.org> wrote:

          YouPeng Yang added a comment -

          Hi
          The auto-created coreNodeName cannot be easily distinguished. We create many cores whose datadir is on HDFS. We need a clear coreNodeName to fail over quickly when a core goes down.

          Sent from my iPhone

          On 2014-1-1, at 22:40, "Noble Paul (JIRA)" <jira@apache.org> wrote:

          Noble Paul added a comment -

          Mark Miller

          I wish to know all the places where coreNodeName is used by the end user:

          1. Replace an existing core with another. It should work with SOLR-5311.
          2. Create a completely new core with a user-specified coreNodeName. It would fail with SOLR-5311.
          3. Is there any other case I am missing?
          Noble Paul added a comment -

          I would like to know what the user was trying to do in this case, YouPeng Yang. Were you trying to create a new core for a coreNodeName that didn't exist? Was the slice present? Or were you trying to create that too?

          Noble Paul added a comment -

          Cores can stick around, and simply don't get updates into the cluster state?...

          What do you mean? Cores are removed if they were explicitly removed from the cluster state and then come up again.

          Noble Paul added a comment -

          I think the core of the issue is whether to have an option to remove a core from the cluster without touching the node itself.

          If that option can be provided, can it be implemented without introducing backward-compatibility issues?

          I believe that, in a reasonably big cluster, nobody would want to deal with individual nodes to manage the cluster. So they would like to go through the API to achieve almost everything. Do we wish to introduce a new mode for these things? Or do we want it to be the default mode?

          If we can implement the new mode without breaking existing features, do we need to introduce a new mode? The reason I'm asking these questions is that users tend to use the default mode assuming it is the best. A new mode is always a problem because it effectively forks the system. That would be a support nightmare.

          Before getting into implementation details, we need to build consensus on these things first.

          Mark Miller added a comment -

          The NPE was just a surface bug that a user hit.

          The actual bug here is:

          • You cannot count on coreNodeName like SOLR-5311 - totally illegal.
          • You cannot use this autoCreated property. Totally illegal in the current system.
          • Cores can stick around, and simply don't get updates into the cluster state? A lot of ugliness around this...and...
          • In general, this is a hacky, half-hearted attempt at making ZooKeeper the truth.

          All these issues lead to many bugs.

          There are non-buggy, full solutions to making ZooKeeper the truth, and if that is what you are after, we have to do it in a way that is consistent with the current system.

          Mark Miller added a comment -

          No dude, the description of this issue is just not correct.

          You broke SolrCloud and I won't allow that code back in. -1, it's broken. In 4.6.1 we need to remove it.

          Noble Paul added a comment - edited

          Mark,
          This issue is about an NPE. If we fix the NPE we should be good to go. Wholesale reverting of SOLR-5311 is beyond the scope of this issue. SOLR-5311 has already been released in 4.6, and reverting that code would be a regression in 4.6.1.
          We can fix the NPE right away, make the core admin API work, and close this issue; I can take care of that right away. If we need a discussion over how SOLR-5311 should be implemented, it can happen after reopening SOLR-5311.

          Mark Miller added a comment -

          What needs to be done before I can get this in?

          I'm happy to discuss that in the reopened JIRA issue around this feature.

          Mark Miller added a comment -

          Make another issue please. This is a bug that is assigned to me that I have fixed. Either work on the reopened original issue or make a new one.

          The work necessary to do this feature properly is fairly substantial, and I'll be releasing this in 4.6.1 any time now.

          Noble Paul added a comment -

          My take on this.

          Let's fix this in a back-compatible way so that deletereplica is implemented consistently.

          Noble Paul added a comment -

          the problem is that it is ahead of its time

          What needs to be done before I can get this in?

          Yes, we need this - but we have to do it right.

          If we do it right, what should it look like? We can't keep it an open item.

          Mark Miller added a comment -

          The Core Admin also lets you set an initial coreNodeName! And it has the same use case.

          But here the question is not whether we should use core admin or not. I feel that the API to add a replica on a particular node would be pretty ugly on collections API and it looks more elegant on core admin API.

          That needs a discussion in its own issue. A lot of things make more sense from a collection perspective (e.g., I want to change the replicationFactor). And we need a way to easily distinguish between operations meant for a Collections API collection and those that are not. How we implement that is still open, but I would initially lean towards making the Collections API powerful enough not to need the core admin API, and then banning it on Collections API collections.

          I really didn't want to have a half-broken deletereplica API

          We both want the same functionality - the problem is that it is ahead of its time. The way it has been implemented does not jive with the current system. Yes, we need this - but we have to do it right.

          Noble Paul added a comment -

          It is OK to use the core admin API sometimes. It is fine. But editing solr.xml or adding system properties at node startup is something we should actively strive to avoid.

          But here the question is not whether we should use core admin or not. I feel that the API to add a replica on a particular node would be pretty ugly on collections API and it looks more elegant on core admin API.

          I really didn't want to have a half-broken deletereplica API

          Mark Miller added a comment -

          this feature is a problem.

          Or at least the way it was implemented.

          Mark Miller added a comment -

          The collections API is not ready for that functionality - you still have to use the collections API in concert with the core admin API to do many things. Until the Collections API can do everything without the core admin API, this feature is a problem.

          Noble Paul added a comment -

          I guess you missed something.

          If you added the core through solr.xml, it is for a collection that is 'autoCreated'. So I enabled this feature only for collections created through the Collections API; for others the legacy behavior is not altered. So they are not really editable if the cores are created through the API.

          Mark Miller added a comment - edited

          I agree with you . But what is the harm in making the coreNodeName editable? If he is sure about what he is doing , it will work. If he is fiddling with stuff any of those properties can screw up his system. I really can't see the difference between someone editing collection, shard, or coreNodeName

          It is currently editable. The way you did things, you would need it to not be editable. So no way it should be in solr.xml. You can configure a new core in solr.xml and set a shard id and a collection - but now you are going to say you can't set some of those settings (coreNodeName), we are just storing them there internally and it's not for you to preconfigure or edit? Now we have some special config in solr.xml that is not for users when everything else is? No way, -1.

          If and when we are ready , how do you plan to make the switch? Can we introduce the switch right away, so that the users who want the new way can go that way.

          We will need to support the non-Collections API for some time. For the Collections API, it's just going to become more capable - if required, some of those capabilities will need new options to be turned on if you are using old config. It will all be pretty easy, other than when and if we drop the non-Collections API support entirely.

          Noble Paul added a comment -

          No...those are user configurable and should be...a user can always enter bad settings...

          I agree with you . But what is the harm in making the coreNodeName editable? If he is sure about what he is doing , it will work. If he is fiddling with stuff any of those properties can screw up his system. I really can't see the difference between someone editing collection, shard, or coreNodeName

          No. That is the goal of finishing the Collections API.

          If and when we are ready , how do you plan to make the switch? Can we introduce the switch right away, so that the users who want the new way can go that way.

          Mark Miller added a comment -

          Isn't this the same?

          No...those are user configurable and should be...a user can always enter bad settings...

          But we would expect the data in ZK to be the truth , right?

          No. That is the goal of finishing the Collections API. For the non Collections API, the truth is not currently ZK and it's not easy to make it so 100% - which is why I keep mentioning the collections API ...

          Noble Paul added a comment -

          A setting in solr.xml is user configurable by definition.

          Yes, but that is the same for every other property, like collection, shard, etc. Isn't this the same? Anyone editing those properties would end up screwing up the cluster itself. But we would expect the data in ZK to be the truth, right?

          Mark Miller added a comment -

          A setting in solr.xml is user configurable by definition.

          Noble Paul added a comment -

          I didn't quite get the solr.xml part. It is persisted, right?

          I was essentially trying to have two modes: one is a collection which got 'autoCreated' and the other is one which got created through the API. Are you saying we need a 3rd mode?

          Mark Miller added a comment - - edited

          was down was to avoid having another replica

          Yes, I know why you did it, I'm saying there are many problems with how you went about it. The entire reliance on the coreNodeName is incorrect. Like I said, even if you had said users can't specify it and ignored back compat, it can't be in solr.xml then. Your code only worked because it is a user setting that is persisted in solr.xml.

          Your goal is fine, the implementation is all wrong. While it could be corrected, I think it's much better to push on the collections api, rather than complicate what is now a simple mode that will eventually either become second class or be removed. We should not spend a lot of time making it do what it was not designed for from the start. The plan has always been the Collections API for this type of behavior.

          Noble Paul added a comment -

          Register the core. If you don't use the Collections API, the behavior is simple and straightforward.

          The problem is, the deletereplica did not help. The collections API should take precedence over cores going up and down. The reason why I called deletereplica when the core was down was to avoid having another replica (and to clean up the clusterstate). One of my purposes is defeated.

          For a much better experience, we should finish the collections api

          I completely agree with you. We are pursuing them one by one. One day I want the collections API to be the definitive way to achieve almost anything on SolrCloud. So I want the collection APIs to take precedence over others.

          I think the user has a problem because we didn't document this new behavior, mea culpa.

          Mark Miller added a comment -

          The problem is that, if I removed a replica from the clusterstate and then the core came up, what is the desired behavior? Register the core or unload the core?

          Register the core. If you don't use the Collections API, the behavior is simple and straightforward. For a much better experience, we should finish the collections api, so we can deprecate dealing with individual cores.

          The use case is: a node went down and I don't need to replace it with another node because I have enough replicas. Now I need to clean up the clusterstate. Currently there is no way to achieve it.

          That's why SOLR-5310 still makes sense and should still work fine for this case...

          Mark Miller added a comment -

          If you do want to create some marker that gets saved out so that a SolrCore can track whether it had been removed or not, you would need to put it in a spot that is not a user-editable param...

          The only reason the previous scheme worked at all is because the coreNodeName is user editable and is saved out to solr.xml - into a user overrideable field.

          You would need to save that information to a system only storage location.

          Noble Paul added a comment - - edited

          It's just a matter of opinion...

          Yes, you are right. The point is, I think people don't really have to think that replicas have a name; they just need to have enough replicas for a given slice.

          I don't believe that - it should be fine to still have a command that removes a replica from the clusterstate.

          The problem is that, if I removed a replica from the clusterstate and then the core came up, what is the desired behavior? Register the core or unload the core?

          Then perhaps they should not be implemented and this energy is better spent working towards a fully functional Collections API.

          SOLR-5310 is a step towards a fully functional collections API. The use case is: a node went down and I don't need to replace it with another node because I have enough replicas. Now I need to clean up the clusterstate. Currently there is no way to achieve it.

          uh...yes it is used...

          I'm sorry, I meant it is not used anywhere BY THE USER.

          My intent was not to break backcompat, but it happened because I didn't know this particular use case. Let us see what the best solution for this is. Let us answer a few questions to ourselves:

          • If we were designing the system today, which way would we choose: a deletereplica API, or a create-core API to 'replace' a core? So what is the way forward?
          • Implement the deletereplica API, but make the clusterstate slightly ugly for backcompat
          Mark Miller added a comment -

          I also think the approach is not correct in general - even if you didn't allow a user to specify the coreNodeName, you can't 100% safely use that information to determine if a core should exist or not.

          The correct approach is to finish the Collections API, which can know if a collection should exist or not.

          Mark Miller added a comment -

          Do we need a way to 'replace' a replica?

          It's just a matter of opinion on whether it's a hack to have to remove a replica and then add it or let something take over for it. Either method seems like it should reasonably work if you want it to.

          In the longer term, the Collections API should be reasonable for all of this stuff, and eventually we won't necessarily support manual core manipulation. Until we do, I think this is a good feature.

          SOLR-5310 and SOLR-5311 both must be removed together. you can't remove one and leave the other one

          I don't believe that - it should be fine to still have a command that removes a replica from the clusterstate.

          I don't see a clean way to implement SOLR-5310 and SOLR-5311 without making the clusterstate ugly

          Then perhaps they should not be implemented and this energy is better spent working towards a fully functional Collections API.

          The coreNodeName is not used anywhere, so is it important (or even desirable) to have a custom coreNodeName?

          uh...yes it is used...

          Noble Paul added a comment -

          I didn't know people were hacking the system this way. I thought nobody used it

          Having said that, let us see what the solution is:

          • Do we need a way to 'replace' a replica? It is just not replacing anything, just creating a new node. It was a hack for not having the ability to delete replicas from the clusterstate
          • I think the proper way should be to create new replicas and they can choose to clean up old ones using the API
          • SOLR-5310 and SOLR-5311 both must be removed together. you can't remove one and leave the other one
          • I don't see a clean way to implement SOLR-5310 and SOLR-5311 without making the clusterstate ugly
          • The coreNodeName is not used anywhere, so is it important (or even desirable) to have a custom coreNodeName?
          Mark Miller added a comment -

          You can currently accomplish two things with this feature:

          1. coreNodeNames that suit your taste rather than the generic ones we make up. Not very important, but something users can already be doing.
          2. coreNodeName is the identity in the clusterstate - so you can make a new SolrCore take over for an existing state. Like I described above.

          If we need to do something that conflicts with this feature, we either need to write code to let both things coexist, or we need to deprecate and wait till 5 to remove it or something.

          Noble Paul added a comment -

          Mark, can you explain the difference between creating a core with the combination of collection/slice and collection/slice/coreNodeName? What is the difference in behavior?

          Mark Miller added a comment -

          If you have 2 replicas and one machine blows up, you now have two replicas registered and one running. If you buy a new machine, you can now tell it to take over for the machine that blew up rather than having a replica in the state that will never come back or having to remove it manually. This is a feature that cannot simply be removed unceremoniously in a point release.

          User coreNodeNames are a current, supported feature...
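
          The "take over" flow Mark describes is done by passing an explicit coreNodeName to a CoreAdmin CREATE call. A minimal sketch of building such a request follows; the host, core, collection, shard, and coreNodeName values are made-up placeholders, not names from this issue:

```java
import java.net.URLEncoder;

/**
 * Hypothetical sketch: build a CoreAdmin CREATE request that reuses the
 * coreNodeName of a replica that is gone, so the new SolrCore assumes that
 * replica's identity in the clusterstate. All names below are placeholders.
 */
public class TakeOverSketch {
    static String createCoreUrl(String host, String core, String collection,
                                String shard, String coreNodeName) throws Exception {
        return "http://" + host + "/solr/admin/cores?action=CREATE"
                + "&name=" + URLEncoder.encode(core, "UTF-8")
                + "&collection=" + URLEncoder.encode(collection, "UTF-8")
                + "&shard=" + URLEncoder.encode(shard, "UTF-8")
                + "&coreNodeName=" + URLEncoder.encode(coreNodeName, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // The replacement machine registers under the dead replica's coreNodeName.
        System.out.println(createCoreUrl("newhost:8983", "collection1_shard1_replica2",
                "collection1", "shard1", "core_node2"));
    }
}
```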

          Noble Paul added a comment -

          I don't understand where the coreNodeName is used. If a new core is created with the same collection/slice it will join that slice. The coreNodeName is not used internally.

          Mark Miller added a comment -

          SOLR-5311 is the one to reopen.

          Mark Miller added a comment -

          1. Even if we need to eliminate this, you cannot just eliminate a large feature by introducing buggy code that doesn't work with it!

          2. -1 on eliminating this! Custom coreNodeName is an explicit and important feature! This is how users can have a SolrCore take over for a replica that has gone away, or move it to a new machine.

          The feature you added does not make sense with the current system. I suggest trying to implement something else correctly in another issue, but as it is, it's just one big bug with the current system design.

          Noble Paul added a comment -

          We need to eliminate this. Otherwise there is no other way to implement SOLR-5311. Why do we need this particular use case?

          Mark Miller added a comment -

          If you want to try and introduce this feature go back to the original issue. But please don't break this yet again. You can't do what you did, not even close.

          Mark Miller added a comment -

          But why would anyone create a core with an explicit coreNodeName?

          It's a supported feature and you can't just eliminate it.

          Noble Paul added a comment - - edited

          Yes, this is expected to fail.
          But why would anyone create a core with an explicit coreNodeName? I wanted that case to be eliminated.
          The idea is to only create the coreNodeName at the Overseer.

          ASF subversion and git services added a comment -

          Commit 1554140 from Mark Miller in branch 'dev/branches/lucene_solr_4_6'
          [ https://svn.apache.org/r1554140 ]

          SOLR-5580: Remove more code that is not legal for determining if a core is new or not.

          ASF subversion and git services added a comment -

          Commit 1554139 from Mark Miller in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1554139 ]

          SOLR-5580: Remove more code that is not legal for determining if a core is new or not.

          ASF subversion and git services added a comment -

          Commit 1554138 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1554138 ]

          SOLR-5580: Remove more code that is not legal for determining if a core is new or not.

          Mark Miller added a comment -

          Thanks YouPeng!

          ASF subversion and git services added a comment -

          Commit 1553969 from Mark Miller in branch 'dev/branches/lucene_solr_4_6'
          [ https://svn.apache.org/r1553969 ]

          SOLR-5580: NPE when creating a core with both explicit shard and coreNodeName.

          ASF subversion and git services added a comment -

          Commit 1553968 from Mark Miller in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1553968 ]

          SOLR-5580: NPE when creating a core with both explicit shard and coreNodeName.

          ASF subversion and git services added a comment -

          Commit 1553967 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1553967 ]

          SOLR-5580: NPE when creating a core with both explicit shard and coreNodeName.
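
          The committed fix (shown in the issue description) adds a null check before calling slice.getReplica(). A self-contained sketch of that guard follows; the Slice/Replica types here are tiny stand-ins for illustration, not Solr's real classes, and only the guard logic mirrors the fix:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the null guard from the SOLR-5580 fix. Slice and Replica are
 * minimal stand-ins so the guard logic can be shown in isolation.
 */
public class OverseerGuardSketch {
    static class Replica {}

    static class Slice {
        Map<String, Replica> replicas = new HashMap<>();
        Replica getReplica(String coreNodeName) { return replicas.get(coreNodeName); }
    }

    /** Returns true when the state update should be skipped (core was deleted). */
    static boolean coreDeleted(Slice slice, String coreNodeName) {
        // Creating a core with both an explicit shard and coreNodeName can
        // reference a slice that does not exist yet, so slice may be null.
        // Dereferencing it without this check was the NPE reported here.
        return slice != null && slice.getReplica(coreNodeName) == null;
    }

    public static void main(String[] args) {
        System.out.println(coreDeleted(null, "core_node1"));        // slice missing: no NPE, returns false
        System.out.println(coreDeleted(new Slice(), "core_node1")); // replica unknown: returns true
    }
}
```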

          Mark Miller added a comment -

          That whole bit looks silly and dangerous to me. I'm not sure this is the right way to do this check and it seems we should just remove that whole attempt to skip publishing. I don't think it's well tested.

          Mark Miller added a comment -

          This was introduced by SOLR-5311.

          Bill Bell added a comment -

          This looks good to me. I also like the log.info()


            People

            • Assignee:
              Mark Miller
              Reporter:
              YouPeng Yang
            • Votes:
              0
              Watchers:
              8

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - 0.5h
                0.5h
                Remaining:
                Remaining Estimate - 0.5h
                0.5h
                Logged:
                Time Spent - Not Specified
                Not Specified
