Solr
  1. Solr
  2. SOLR-3080

We should consider removing shard info from Zk when you explicitly unload a SolrCore.

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: SolrCloud
    • Labels:
      None
    1. SOLR-3080_smaller_sync.patch
      2 kB
      Yonik Seeley
    2. SOLR-3080.patch
      42 kB
      Sami Siren
    3. SOLR-3080.patch
      25 kB
      Sami Siren

      Activity

      Hide
      Sami Siren added a comment -

      Here's an initial patch to implement this feature. I still need to add some more junit tests.

      I noticed that CoreAdminHandler allows "creating" multiple cores with same name and if you do that you can end up having one core in multiple places unless you remove it in between. Is this behavior of CoreAdminHandler intentional?

      Also I was not sure if the slice should be deleted when there are no more shards with that id (currently it is not deleted).

      Show
      Sami Siren added a comment - Here's an initial patch to implement this feature. I still need to add some more junit tests. I noticed that CoreAdminHandler allows "creating" multiple cores with same name and if you do that you can end up having one core in multiple places unless you remove it in between. Is this behavior of CoreAdminHandler intentional? Also I was not sure if the slice should be deleted when there are no more shards with that id (currently it is not deleted).
      Hide
      Mark Miller added a comment -

      Also I was not sure if the slice should be deleted when there are no more shards with that id (currently it is not deleted).

      Yeah, I think that is the right move for now - don't delete it.

      Show
      Mark Miller added a comment - Also I was not sure if the slice should be deleted when there are no more shards with that id (currently it is not deleted). Yeah, I think that is the right move for now - don't delete it.
      Hide
      Mark Miller added a comment -

      I noticed that CoreAdminHandler allows "creating" multiple cores with same name and if you do that you can end up having one core in multiple places unless you remove it in between. Is this behavior of CoreAdminHandler intentional?

      I doubt it... Probably we should throw an error if you try and create a core that already exists?

      Show
      Mark Miller added a comment - I noticed that CoreAdminHandler allows "creating" multiple cores with same name and if you do that you can end up having one core in multiple places unless you remove it in between. Is this behavior of CoreAdminHandler intentional? I doubt it... Probably we should throw an error if you try and create a core that already exists?
      Hide
      Sami Siren added a comment -

      I doubt it... Probably we should throw an error if you try and create a core that already exists?

      I checked the wiki at http://wiki.apache.org/solr/CoreAdmin#CREATE and it documents this special piece of functionality. I am not sure how that should be handled when using solr in cloud mode...

      Show
      Sami Siren added a comment - I doubt it... Probably we should throw an error if you try and create a core that already exists? I checked the wiki at http://wiki.apache.org/solr/CoreAdmin#CREATE and it documents this special piece of functionality. I am not sure how that should be handled when using solr in cloud mode...
      Hide
      Mark Miller added a comment -

      I am not sure how that should be handled when using solr in cloud mode...

      Perhaps we should just remove it? I mean, it's a core reload right? Why do we need to support that through the create cmd as well?

      Show
      Mark Miller added a comment - I am not sure how that should be handled when using solr in cloud mode... Perhaps we should just remove it? I mean, it's a core reload right? Why do we need to support that through the create cmd as well?
      Hide
      Sami Siren added a comment -

      Perhaps we should just remove it?

      Sounds good to me.

      Show
      Sami Siren added a comment - Perhaps we should just remove it? Sounds good to me.
      Hide
      Yonik Seeley added a comment -

      I mean, it's a core reload right? Why do we need to support that through the create cmd as well?

      • it looks like you can specify more stuff on create (you could change the data dir, etc). you could replace that with create-swap-unload but it's 3 calls instead of one
      • it seems useful to have a "create or reload" command where you don't have to worry if the core was already loaded or not... you just want it to read/reread it's config.

      So I can see how it made sense in the past, but if it causes problems with cloud mode then we should do something about that.

      Show
      Yonik Seeley added a comment - I mean, it's a core reload right? Why do we need to support that through the create cmd as well? it looks like you can specify more stuff on create (you could change the data dir, etc). you could replace that with create-swap-unload but it's 3 calls instead of one it seems useful to have a "create or reload" command where you don't have to worry if the core was already loaded or not... you just want it to read/reread it's config. So I can see how it made sense in the past, but if it causes problems with cloud mode then we should do something about that.
      Hide
      Mark Miller added a comment -

      but if it causes problems with cloud mode then we should do something about that.

      Well there are other ways to deal with it I imagine - leave it alone and just doc not to do that in cloud mode, or if in cloud mode throw an error or whatever.

      Show
      Mark Miller added a comment - but if it causes problems with cloud mode then we should do something about that. Well there are other ways to deal with it I imagine - leave it alone and just doc not to do that in cloud mode, or if in cloud mode throw an error or whatever.
      Hide
      Sami Siren added a comment -

      Here's a new patch.

      when running in cloud mode I am preventing creation of core if the name is not locally unique.

      Show
      Sami Siren added a comment - Here's a new patch. when running in cloud mode I am preventing creation of core if the name is not locally unique.
      Hide
      Mark Miller added a comment -

      Cool - I'll try and get to this today.

      Show
      Mark Miller added a comment - Cool - I'll try and get to this today.
      Hide
      Mark Miller added a comment -

      All yours Sami.

      Show
      Mark Miller added a comment - All yours Sami.
      Hide
      Yonik Seeley added a comment -

      We currently synchronize on coreStates when we publish the info to zookeeper. This would also block any reads or other changes in the meantime.

      Here's the first approach that came to mind to handle not blocking reads/updates while publishing. Anyone have an easier way?

      Show
      Yonik Seeley added a comment - We currently synchronize on coreStates when we publish the info to zookeeper. This would also block any reads or other changes in the meantime. Here's the first approach that came to mind to handle not blocking reads/updates while publishing. Anyone have an easier way?
      Hide
      Sami Siren added a comment -

      +1, nice!

      Show
      Sami Siren added a comment - +1, nice!
      Hide
      Yonik Seeley added a comment -

      FYI, sometimes the core name and sometimes the zk node name was used to key coreStates (leading to duplicate entries on a restart). I've changed it to just key off of core name, along with some other minor improvements like creating the map up front so we have a single object to consistently synchronize on (prob didn't matter now, but may in the future w/ continued development).

      Show
      Yonik Seeley added a comment - FYI, sometimes the core name and sometimes the zk node name was used to key coreStates (leading to duplicate entries on a restart). I've changed it to just key off of core name, along with some other minor improvements like creating the map up front so we have a single object to consistently synchronize on (prob didn't matter now, but may in the future w/ continued development).
      Hide
      Yonik Seeley added a comment -

      OK, here's what I think is happening around the ChaosMonkeySafeLeaderTest failures I'm seeing.

      The missing docs actually appear in the "startingVersions" (i.e. they are in the transaction log).
      We peersync and don't request those versions because we already have them.
      But I don't think we ever called UpdateLog.recoverFromLog() to actually apply them to the index!

      In ZkController.register() recoverFromLog is only called if isLeader==true. Perhaps this is a holdover from when peersync was not done at startup and hence it didn't make sense to try and recover from your logs when you were just going to replicate the leader index anyway?

      Show
      Yonik Seeley added a comment - OK, here's what I think is happening around the ChaosMonkeySafeLeaderTest failures I'm seeing. The missing docs actually appear in the "startingVersions" (i.e. they are in the transaction log). We peersync and don't request those versions because we already have them. But I don't think we ever called UpdateLog.recoverFromLog() to actually apply them to the index! In ZkController.register() recoverFromLog is only called if isLeader==true. Perhaps this is a holdover from when peersync was not done at startup and hence it didn't make sense to try and recover from your logs when you were just going to replicate the leader index anyway?
      Hide
      Yonik Seeley added a comment -

      After removing isLeader==true test for tlog recovery, I ran for 4 hours w/o a fail with deletes turned off. Turning those on now...

      Show
      Yonik Seeley added a comment - After removing isLeader==true test for tlog recovery, I ran for 4 hours w/o a fail with deletes turned off. Turning those on now...
      Hide
      Sami Siren added a comment -

      Thanks Yonik, Mark. I am resolving this issue as fixed. Further bug fixing can be done in separate issues.

      Show
      Sami Siren added a comment - Thanks Yonik, Mark. I am resolving this issue as fixed. Further bug fixing can be done in separate issues.

        People

        • Assignee:
          Sami Siren
          Reporter:
          Mark Miller
        • Votes:
          0 Vote for this issue
          Watchers:
          1 Start watching this issue

          Dates

          • Created:
            Updated:
            Resolved:

            Development