Solr
  1. Solr
  2. SOLR-4718

Allow solr.xml to be stored in zookeeper

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.3, 6.0
    • Fix Version/s: 4.5, 6.0
    • Component/s: Schema and Analysis
    • Labels:
      None

      Description

      So the near-final piece of this puzzle is to make solr.xml be storable in Zookeeper. Code-wise in terms of Solr, this doesn't look very difficult, I'm working on it now.

      More interesting is how to get the configuration into ZK in the first place, enhancements to ZkCli? Or boostrap-conf? Other? I'm punting on that for this patch.

      Second level is how to tell Solr to get the file from ZK. Some possibilities:

      1> A system prop, -DzkSolrXmlPath=blah where blah is the path on zk where the file is. Would require -DzkHost or -DzkRun as well.
      > pros - simple, I can wrap my head around it.

      • easy to script
        > cons - can't run multiple JVMs pointing to different files. Is this really a problem?

      2> New solr.xml element. Something like:
      <solr>
      <solrcloud>
      <str name="zkHost">zkurl</str>
      <str name="zkSolrXmlPath">whatever</str>
      </solrcloud>
      <solr>
      Really, this form would hinge on the presence or absence of zkSolrXmlPath. If present, go up and look for the indicated solr.xml file on ZK. Any properties in the ZK version would overwrite anything in the local copy.

      NOTE: I'm really not very interested in supporting this as an option for old-style solr.xml unless it's really easy. For instance, what if the local solr.xml is new-style and the one in ZK is old-style? Or vice-versa? Since old-style is going away, this doesn't seem like it's worth the effort.

      pros - No new mechanisms
      cons - once again requires that there be a solr.xml file on each client. Admittedly for installations that didn't care much about multiple JVMs, it could be a stock file that didn't change...

      For now, I'm going to just manually push solr.xml to ZK, then read it based on a sysprop. That'll get the structure in place while we debate. Not going to check this in until there's some consensus though.

      1. SOLR-4718.patch
        23 kB
        Erick Erickson
      2. SOLR-4718.patch
        22 kB
        Erick Erickson
      3. SOLR-4718.patch
        17 kB
        Erick Erickson
      4. SOLR-4718.patch
        16 kB
        Erick Erickson
      5. SOLR-4718.patch
        15 kB
        Erick Erickson
      6. SOLR-4718.patch
        13 kB
        Erick Erickson
      7. SOLR-4718.patch
        4 kB
        Mark Miller
      8. SOLR-4718-alternative.patch
        9 kB
        Alan Woodward

        Issue Links

          Activity

          Hide
          Erick Erickson added a comment -

          Mark Miller Chris Hostetter (Unused) I think you're both interested in this topic, comments? Yonik Seeley any thoughts?

          Another note, persisting anything to solr.xml won't happen any more FWIW.

          Show
          Erick Erickson added a comment - Mark Miller Chris Hostetter (Unused) I think you're both interested in this topic, comments? Yonik Seeley any thoughts? Another note, persisting anything to solr.xml won't happen any more FWIW.
          Hide
          Mark Miller added a comment -

          More interesting is how to get the configuration into ZK in the first place, enhancements to ZkCli? Or boostrap-conf? Other?

          I think this is one of the most important pieces - solr.xml must still be easy to edit - I don't think we should commit anything that leaves this piece out.

          The rest of this seems totally straightforward - the biggest downside is going to be the increased pain in working with solr.xml, and so we need to minimize that as much as possible.

          Show
          Mark Miller added a comment - More interesting is how to get the configuration into ZK in the first place, enhancements to ZkCli? Or boostrap-conf? Other? I think this is one of the most important pieces - solr.xml must still be easy to edit - I don't think we should commit anything that leaves this piece out. The rest of this seems totally straightforward - the biggest downside is going to be the increased pain in working with solr.xml, and so we need to minimize that as much as possible.
          Hide
          Mark Miller added a comment -

          cons - can't run multiple JVMs pointing to different files. Is this really a problem?

          Hossman had a few cons he listed out in another issue - I don't think multiple JVMs has a problem with system properties - it's running multiple Solr wars within a single JVM that has a problem.

          I think one simple idea to handle is this is something much like solrcore.properties and solrconfig.xml. solrcore.properties is auto loaded and used for sys prop resolution on solrconfig.xml - you could imagine the same mechanism for solr.xml, but where the properties file is also picked up in a local solr.home location.

          Now you can put any local config overrides in this file.

          It could be required that zkhost is put either in the properties file or passed by sys prop.

          Show
          Mark Miller added a comment - cons - can't run multiple JVMs pointing to different files. Is this really a problem? Hossman had a few cons he listed out in another issue - I don't think multiple JVMs has a problem with system properties - it's running multiple Solr wars within a single JVM that has a problem. I think one simple idea to handle is this is something much like solrcore.properties and solrconfig.xml. solrcore.properties is auto loaded and used for sys prop resolution on solrconfig.xml - you could imagine the same mechanism for solr.xml, but where the properties file is also picked up in a local solr.home location. Now you can put any local config overrides in this file. It could be required that zkhost is put either in the properties file or passed by sys prop.
          Hide
          Mark Miller added a comment -

          The more I think about it the more I like that idea.

          A file called solr.properties can exist in solrhome, and if it does, it's used to substitute properties in solr.xml. zkhost can either go there, or as a sys prop on the cmd line. Like solrcore.properties (and most property files of this nature), it should be optional.

          It's simple, consistent with current design, and solves all of the open issues.

          Show
          Mark Miller added a comment - The more I think about it the more I like that idea. A file called solr.properties can exist in solrhome, and if it does, it's used to substitute properties in solr.xml. zkhost can either go there, or as a sys prop on the cmd line. Like solrcore.properties (and most property files of this nature), it should be optional. It's simple, consistent with current design, and solves all of the open issues.
          Hide
          Mark Miller added a comment -

          There is related discussion in SOLR-4687 and SOLR-4622.

          Show
          Mark Miller added a comment - There is related discussion in SOLR-4687 and SOLR-4622 .
          Hide
          Erick Erickson added a comment -

          My head's exploding. We have two files, solr.xml and solr.properties, plus the absence of either. Stored either locally or on ZK. What's the precedence here? I vote that solr.properties be only local.

          So, some straw-man rules....

          > solr.properties is irrelevant without a solr.xml, it's used only to sub in ${} type constructs.

          > solr.properties can exists only locally. Don't look on ZK under any conditions.

          > solr.properties definitions take precedence over any definitions in solr.xml either local or from ZK.

          Note: my objection to making it possible to store solr.properties in ZK is that it complexifies things and one can centrally store a complete solr.xml file there just as easily. I can be persuaded otherwise, but I want a real use case not a theoretical one. We can always add that on later too. Except how does this play when creating collections? I guess that's irrelevant, the solr home is what counts.

          > if solr.xml is local, we look for one on ZK iff zkSolrXmlPath is defined (in solr.xml or as a sysprop). Overlay the local one if so. Then apply solr.properties to the combined solr.xml

          > if no solr.xml is found locally but zkRun or zkHost is defined (sys props), then look in ZK(root or sysprop zkSolrXmlPath) for solr.xml and use that. Overlay with local solr.properties if present.

          1. So the multiple war in single JVM case is handled; you don't have to have any sysprops if you have a solr.xml/solr.properties pair defined in solrhome for each war. And you can still define a minimal file that just points to ZK for the "real" solr.xml file.
          1. The no-local-config case is handled by deciding to NOT have a solr.xml locally, set the necessary sysprops (zkhost and possibly zkSolrXmlPath. With or without a solr.properties file.
          1. The mixed case is handled by having remote solr.xml but local solr.properties in each solrhome to override values.
          Show
          Erick Erickson added a comment - My head's exploding. We have two files, solr.xml and solr.properties, plus the absence of either. Stored either locally or on ZK. What's the precedence here? I vote that solr.properties be only local. So, some straw-man rules.... > solr.properties is irrelevant without a solr.xml, it's used only to sub in ${} type constructs. > solr.properties can exists only locally. Don't look on ZK under any conditions. > solr.properties definitions take precedence over any definitions in solr.xml either local or from ZK. Note: my objection to making it possible to store solr.properties in ZK is that it complexifies things and one can centrally store a complete solr.xml file there just as easily. I can be persuaded otherwise, but I want a real use case not a theoretical one. We can always add that on later too. Except how does this play when creating collections? I guess that's irrelevant, the solr home is what counts. > if solr.xml is local, we look for one on ZK iff zkSolrXmlPath is defined (in solr.xml or as a sysprop). Overlay the local one if so. Then apply solr.properties to the combined solr.xml > if no solr.xml is found locally but zkRun or zkHost is defined (sys props), then look in ZK(root or sysprop zkSolrXmlPath) for solr.xml and use that. Overlay with local solr.properties if present. So the multiple war in single JVM case is handled; you don't have to have any sysprops if you have a solr.xml/solr.properties pair defined in solrhome for each war. And you can still define a minimal file that just points to ZK for the "real" solr.xml file. The no-local-config case is handled by deciding to NOT have a solr.xml locally, set the necessary sysprops (zkhost and possibly zkSolrXmlPath. With or without a solr.properties file. The mixed case is handled by having remote solr.xml but local solr.properties in each solrhome to override values.
          Hide
          Walter Underwood added a comment -

          What if one or both has a syntax error? Need rules for that, too.

          Show
          Walter Underwood added a comment - What if one or both has a syntax error? Need rules for that, too.
          Hide
          Mark Miller added a comment -

          It all should work the same way that solrcore.properties should work.

          It's should only be looked for locally - there is no need to put it in zk.

          This is roughly already in place with solrcore.properties like I said - there is not a lot of reason to tread new ground here.

          solr.properties and solr.xml have no precedence issues. Again, this works like solrcore.properties - these properties are used for sys prob variable substitution in solr.xml. No precedence to consider. We already have the machinery in place for almost all of this.

          The only special case will be zkhost.

          zkSolrXmlPath

          -1 on this guy. We shouldn't support local anymore - only for back compat. I think there must be a better way to solve this.

          I plan on working on some of this as well.

          Show
          Mark Miller added a comment - It all should work the same way that solrcore.properties should work. It's should only be looked for locally - there is no need to put it in zk. This is roughly already in place with solrcore.properties like I said - there is not a lot of reason to tread new ground here. solr.properties and solr.xml have no precedence issues. Again, this works like solrcore.properties - these properties are used for sys prob variable substitution in solr.xml. No precedence to consider. We already have the machinery in place for almost all of this. The only special case will be zkhost. zkSolrXmlPath -1 on this guy. We shouldn't support local anymore - only for back compat. I think there must be a better way to solve this. I plan on working on some of this as well.
          Hide
          Eric Pugh added a comment -

          So what about just making ZooKeeper a core part of Solr? Whether you are running traditional Solr, or running Solr Cloud, you still use ZooKeeper?

          I remember when we had either single core, or multiple core support in Solr back in version 1.4... And the logic was always "if single core, do X, else if multicore, then do Y". And eventually we moved to multicore everywhere, just sometimes you have a single core.

          So maybe everything is always in ZK, and the question is, is it a embedded ZK, which should be fine for typical use cases, or is it external ZK that you would want in a big setup. Eliminate this constant "am I in ZK or not"? Just always use ZK.

          Show
          Eric Pugh added a comment - So what about just making ZooKeeper a core part of Solr? Whether you are running traditional Solr, or running Solr Cloud, you still use ZooKeeper? I remember when we had either single core, or multiple core support in Solr back in version 1.4... And the logic was always "if single core, do X, else if multicore, then do Y". And eventually we moved to multicore everywhere, just sometimes you have a single core. So maybe everything is always in ZK, and the question is, is it a embedded ZK, which should be fine for typical use cases, or is it external ZK that you would want in a big setup. Eliminate this constant "am I in ZK or not"? Just always use ZK.
          Hide
          Mark Miller added a comment -

          That's always been on the table, but I don't think it's very near term yet. I think dealing with config and what not in zk needs to advance quite a bit before we can push zk into non solrcloud mode or push non solrcloud mode into zk. Long term, I think it will end up simplifying a lot when that happens.

          Show
          Mark Miller added a comment - That's always been on the table, but I don't think it's very near term yet. I think dealing with config and what not in zk needs to advance quite a bit before we can push zk into non solrcloud mode or push non solrcloud mode into zk. Long term, I think it will end up simplifying a lot when that happens.
          Hide
          Erik Hatcher added a comment -

          Just always use ZK.

          +1

          What are the cons to always using ZK? Is there some performance or other efficiency to using local config files? Granted one wouldn't want to be pulling synonyms from ZK for every request. But there's at least the file size issue in ZK and some folks do have large synonym files and large ExternalFileField files or elevate.xml's.

          Show
          Erik Hatcher added a comment - Just always use ZK. +1 What are the cons to always using ZK? Is there some performance or other efficiency to using local config files? Granted one wouldn't want to be pulling synonyms from ZK for every request. But there's at least the file size issue in ZK and some folks do have large synonym files and large ExternalFileField files or elevate.xml's.
          Hide
          Ryan McKinley added a comment -

          Just always use ZK.

          I have not followed this dev much, but I assume we are still aiming to support a ZK free (non-cloud) deployment

          Show
          Ryan McKinley added a comment - Just always use ZK. I have not followed this dev much, but I assume we are still aiming to support a ZK free (non-cloud) deployment
          Hide
          Shawn Heisey added a comment -

          Guiding principle that I think needs to be observed: Any increase in complexity must come with a tangible benefit to a substantial percentage of the user base.

          I agree that solr.properties should only be disk-based. I can't quite decide which should be the 'winner' for a config item that's in both solr.properties and solr.xml, but I am leaning towards the .properties entry. In a world that includes both solr.properties and solr.xml, perhaps solr.xml becomes a truly global config file, affecting the whole cluster and living in zookeeper, with server-specific overrides living in solr.properties.

          If solr.xml lives in zookeeper, then how do we initially upload it or update it? Would a Solr restart be required for changes to take effect? Will it be possible to get SolrCloud started if the only thing you have is zkHost in solr.properties? Side issue: it would be really cool to be able to upload and partially update config sets (and possibly solr.xml) from within the admin UI, and/or with an API like /admin/cores.

          The current zookeeper bootstrap/cli is made for uploading named configs, but solr.xml is different, more global. I'm reluctant to keep expanding the commandline options for bootstrapping or zkcli, but perhaps a cluster-wide solr.xml is a good reason for it.

          Show
          Shawn Heisey added a comment - Guiding principle that I think needs to be observed: Any increase in complexity must come with a tangible benefit to a substantial percentage of the user base. I agree that solr.properties should only be disk-based. I can't quite decide which should be the 'winner' for a config item that's in both solr.properties and solr.xml, but I am leaning towards the .properties entry. In a world that includes both solr.properties and solr.xml, perhaps solr.xml becomes a truly global config file, affecting the whole cluster and living in zookeeper, with server-specific overrides living in solr.properties. If solr.xml lives in zookeeper, then how do we initially upload it or update it? Would a Solr restart be required for changes to take effect? Will it be possible to get SolrCloud started if the only thing you have is zkHost in solr.properties? Side issue: it would be really cool to be able to upload and partially update config sets (and possibly solr.xml) from within the admin UI, and/or with an API like /admin/cores. The current zookeeper bootstrap/cli is made for uploading named configs, but solr.xml is different, more global. I'm reluctant to keep expanding the commandline options for bootstrapping or zkcli, but perhaps a cluster-wide solr.xml is a good reason for it.
          Hide
          Mark Miller added a comment -

          I can't quite decide which should be the 'winner' for a config item that's in both solr.properties and solr.xml

          There will be no decision necessary on this - the two will not compete. This is all very simple stuff that already works and has been around in Solr for a long time.

          Show
          Mark Miller added a comment - I can't quite decide which should be the 'winner' for a config item that's in both solr.properties and solr.xml There will be no decision necessary on this - the two will not compete. This is all very simple stuff that already works and has been around in Solr for a long time.
          Hide
          Erick Erickson added a comment -

          Shawn Heisey
          Note the structure here. Solr.xml has something like
          <solr>
          <solrcloud>
          <int name="hostPort">$

          {hostPort:44}

          </int>
          </solrcloud>
          </solr>

          but solr.properties just has
          hostPort=55

          so solr.properties always overrides solr.xml. If you don't put the ${} syntax in, you don't get the substitution.....

          Walter Underwood
          about syntax errors. How about blow up, first time, every time. As in refuse to start Solr? This will still be at startup. Fail early, fail often.....

          Mark Miller
          bq: zkSolrXmlPath ... -1 on this guy.

          This has nothing to do with local support and everything to do with where on ZK the solr.xml file is. I don't have strong feelings either way, but if we want to support multiple wars in the same JVM that have different solr.xmls seems like we need a way to distinguish where they are stored on ZK. That said, on the KISS principle I can get behind "OK, define a generic solr.xml, it always lives in the root on ZK and put your changes in the solr.properties file that exists locally". We can always add this later iff a good use-case comes up requiring it.

          All:

          I'm having something of a problem with solr.properties only being disk-based (even though I'd also like it to be). How do we get it there in the first place? Here's what I'm thinking; it should be possible to just install a stock solr with no modifications on a node, fire it up without having any a-priori files and be done with it. If we require the "correct" solr.properties file to be there why don't we just require the correct solr.xml and not bother with a properties file? (I'm over-stating the case here, but....)

          I want to say "java -DzkHost=blah -jar start.jar" and be done with it. I'm probably missing something, but any time we require local files (in this case solr.properties), whether an installation uses any of the auto-config tools or not, it seems like the same problem as requiring the "right" version of solr.xml.

          Or is this just one of those cases where I'm over-thinking it and I should just get over it and assume anyone who wants all this flexibility is also capable of getting the right solr.properties file in the right place?

          Show
          Erick Erickson added a comment - Shawn Heisey Note the structure here. Solr.xml has something like <solr> <solrcloud> <int name="hostPort">$ {hostPort:44} </int> </solrcloud> </solr> but solr.properties just has hostPort=55 so solr.properties always overrides solr.xml. If you don't put the ${} syntax in, you don't get the substitution..... Walter Underwood about syntax errors. How about blow up, first time, every time. As in refuse to start Solr? This will still be at startup. Fail early, fail often..... Mark Miller bq: zkSolrXmlPath ... -1 on this guy. This has nothing to do with local support and everything to do with where on ZK the solr.xml file is. I don't have strong feelings either way, but if we want to support multiple wars in the same JVM that have different solr.xmls seems like we need a way to distinguish where they are stored on ZK. That said, on the KISS principle I can get behind "OK, define a generic solr.xml, it always lives in the root on ZK and put your changes in the solr.properties file that exists locally". We can always add this later iff a good use-case comes up requiring it. All: I'm having something of a problem with solr.properties only being disk-based (even though I'd also like it to be). How do we get it there in the first place? Here's what I'm thinking; it should be possible to just install a stock solr with no modifications on a node, fire it up without having any a-priori files and be done with it. If we require the "correct" solr.properties file to be there why don't we just require the correct solr.xml and not bother with a properties file? (I'm over-stating the case here, but....) I want to say "java -DzkHost=blah -jar start.jar" and be done with it. I'm probably missing something, but any time we require local files (in this case solr.properties), whether an installation uses any of the auto-config tools or not, it seems like the same problem as requiring the "right" version of solr.xml. Or is this just one of those cases where I'm over-thinking it and I should just get over it and assume anyone who wants all this flexibility is also capable of getting the right solr.properties file in the right place?
          Hide
          Mark Miller added a comment - - edited

          This has nothing to do with local support and everything to do with where on ZK the solr.xml file is.

          That should not be configurable.

          it should be possible to just install a stock solr with no modifications on a node, fire it up without having any a-priori files and be done with it.

          That is possible - but you have to specify zkhost as a sys prop on the cmd line then.

          As I've said, the solr.properties file is optional - it will get there because the user decides to create it.

          I think all of this is pretty simple - I'm happy to implement it.

          Show
          Mark Miller added a comment - - edited This has nothing to do with local support and everything to do with where on ZK the solr.xml file is. That should not be configurable. it should be possible to just install a stock solr with no modifications on a node, fire it up without having any a-priori files and be done with it. That is possible - but you have to specify zkhost as a sys prop on the cmd line then. As I've said, the solr.properties file is optional - it will get there because the user decides to create it. I think all of this is pretty simple - I'm happy to implement it.
          Hide
          Erick Erickson added a comment -

          There is no problem I can't make unnecessarily complex...

          Let me take a whack at it, as you say given the assumptions it's pretty straight-forward.
          Here's the rules then:

          a> solr.xml is a local file. Apply local solr.properties only if new-style (changes when we stop supporting old-style solr.xml). Stop

          b> zkhost is specified as a sysprop. Read solr.xml from ZK. If found, apply local solr.properties (if any) and stop.

          c> zkhost is specified in local solr.properties. Use it to get solr.xml from ZK. If found, apply local solr.properties (if any) and stop

          d> Use the current hard-coded old-style solr.xml file. Do NOT apply any local solr.properties (this will change when we stop supporting old-style solr.xml).

          Show
          Erick Erickson added a comment - There is no problem I can't make unnecessarily complex... Let me take a whack at it, as you say given the assumptions it's pretty straight-forward. Here's the rules then: a> solr.xml is a local file. Apply local solr.properties only if new-style (changes when we stop supporting old-style solr.xml). Stop b> zkhost is specified as a sysprop. Read solr.xml from ZK. If found, apply local solr.properties (if any) and stop. c> zkhost is specified in local solr.properties. Use it to get solr.xml from ZK. If found, apply local solr.properties (if any) and stop d> Use the current hard-coded old-style solr.xml file. Do NOT apply any local solr.properties (this will change when we stop supporting old-style solr.xml).
          Hide
          Mark Miller added a comment -

          given the assumptions it's pretty straight-forward.

          I'm pretty sure it is, just because it's so close to what is already done with solrcore.properties.

          a> solr.xml is a local file. Apply local solr.properties only if new-style (changes when we stop supporting old-style solr.xml). Stop

          Yeah, I think only new style is fine. Look how this is done with solrcore.properties - that's a good starting point.

          b> zkhost is specified as a sysprop. Read solr.xml from ZK. If found, apply local solr.properties (if any) and stop.

          Right - as part of loading the solr.xml file you should be able to just pass along the props you read like solrcore.properties.

          c> zkhost is specified in local solr.properties. Use it to get solr.xml from ZK. If found, apply local solr.properties (if any) and stop

          Right, the zkhost exception - we just first try and find it in this prop file if it exists and zkhost is not found as a sys prop.

          d> Use the current hard-coded old-style solr.xml file. Do NOT apply any local solr.properties (this will change when we stop supporting old-style solr.xml).

          Yeah, that's back compat support and we don't need to add features for it IMO.

          Show
          Mark Miller added a comment - given the assumptions it's pretty straight-forward. I'm pretty sure it is, just because it's so close to what is already done with solrcore.properties. a> solr.xml is a local file. Apply local solr.properties only if new-style (changes when we stop supporting old-style solr.xml). Stop Yeah, I think only new style is fine. Look how this is done with solrcore.properties - that's a good starting point. b> zkhost is specified as a sysprop. Read solr.xml from ZK. If found, apply local solr.properties (if any) and stop. Right - as part of loading the solr.xml file you should be able to just pass along the props you read like solrcore.properties. c> zkhost is specified in local solr.properties. Use it to get solr.xml from ZK. If found, apply local solr.properties (if any) and stop Right, the zkhost exception - we just first try and find it in this prop file if it exists and zkhost is not found as a sys prop. d> Use the current hard-coded old-style solr.xml file. Do NOT apply any local solr.properties (this will change when we stop supporting old-style solr.xml). Yeah, that's back compat support and we don't need to add features for it IMO.
          Hide
          Shawn Heisey added a comment -

          What are the cons to always using ZK?

          I missed this question from Erik when I was looking earlier.

          The primary problem I see with always using zookeeper takes a little explanation.

          In a non-ZK install, making Solr fault tolerant for queries is fairly easy - fire up another node and set up replication. That's not true fault tolerance, as we all know.

          Running the embedded zookeeper is something we don't recommend to people except for single-server proof of concept setups. I don't even know if it's possible to set up a fully functional fault-tolerant ensemble using the embedded zookeeper.

          Unfortunately, setting up a standalone ensemble is not trivial. It's not HARD, but when you go from just running "java -jar start.jar" on a single server to a real-world SolrCloud, either you have to start over or you end up performing arcane rituals to migrate your existing setup.

          In my opinion, it is a generally bad idea to have step-by-step documentation that results in a setup that isn't fault tolerant. I have used a lot of open source software with documentation like this. If documentation about adding redundancy exists, it is often found in a completely different location as reference material, not step-by-step instructions.

          Our SolrCloud examples encourage setting up separate Solr JVMs on the same server with the embedded zookeeper. Because that's the available documentation, there are probably production installs set up this way. I think that kind of setup should be in footnotes or appendices, and the main examples should be multi-node, with clear instructions about how to make sure it won't break when something fails. I'm not sure how to make it easy to set up zookeeper. Cross-platform scripting is not easy, especially when one of those platforms is Windows.

          Show
          Shawn Heisey added a comment - What are the cons to always using ZK? I missed this question from Erik when I was looking earlier. The primary problem I see with always using zookeeper takes a little explanation. In a non-ZK install, making Solr fault tolerant for queries is fairly easy - fire up another node and set up replication. That's not true fault tolerance, as we all know. Running the embedded zookeeper is something we don't recommend to people except for single-server proof of concept setups. I don't even know if it's possible to set up a fully functional fault-tolerant ensemble using the embedded zookeeper. Unfortunately, setting up a standalone ensemble is not trivial. It's not HARD, but when you go from just running "java -jar start.jar" on a single server to a real-world SolrCloud, either you have to start over or you end up performing arcane rituals to migrate your existing setup. In my opinion, it is a generally bad idea to have step-by-step documentation that results in a setup that isn't fault tolerant. I have used a lot of open source software with documentation like this. If documentation about adding redundancy exists, it is often found in a completely different location as reference material, not step-by-step instructions. Our SolrCloud examples encourage setting up separate Solr JVMs on the same server with the embedded zookeeper. Because that's the available documentation, there are probably production installs set up this way. I think that kind of setup should be in footnotes or appendices, and the main examples should be multi-node, with clear instructions about how to make sure it won't break when something fails. I'm not sure how to make it easy to set up zookeeper. Cross-platform scripting is not easy, especially when one of those platforms is Windows.
          Hide
          Mark Miller added a comment -

          I don't even know if it's possible to set up a fully functional fault-tolerant ensemble using the embedded zookeeper.

          Yes, it certainly is, and it will run fine. The reason we don't necessarily recommend it is that the solr nodes running zk become somewhat special and zk runs in the same process as Solr - it becomes harder to manage this than if you simply have a separate zk ensemble, which is really just as easy to do.
          y
          It's simply a recommendation based on the logistics - embedded zk is a fully functional and fault tolerant zk. Not sure where you go the idea otherwise.

          Show
          Mark Miller added a comment - I don't even know if it's possible to set up a fully functional fault-tolerant ensemble using the embedded zookeeper. Yes, it certainly is, and it will run fine. The reason we don't necessarily recommend it is that the solr nodes running zk become somewhat special and zk runs in the same process as Solr - it becomes harder to manage this than if you simply have a separate zk ensemble, which is really just as easy to do. y It's simply a recommendation based on the logistics - embedded zk is a fully functional and fault tolerant zk. Not sure where you go the idea otherwise.
          Hide
          Mark Miller added a comment -

          I'm going to take a crack at this. Since this will only be done with the new solr.xml that does core discovery by directory structure, we need to start getting some of our tests using that.

          Show
          Mark Miller added a comment - I'm going to take a crack at this. Since this will only be done with the new solr.xml that does core discovery by directory structure, we need to start getting some of our tests using that.
          Hide
          Mark Miller added a comment -

          SOLR-4756 Change some of our existing tests to use the new solr.xml format and core discovery by directory structure.

          Show
          Mark Miller added a comment - SOLR-4756 Change some of our existing tests to use the new solr.xml format and core discovery by directory structure.
          Hide
          Mark Miller added a comment -

          Simple patch that can load solr.xml from the root location in ZooKeeper. Requires that you explicitly set a system prop to do this. Default is still to load from SolrHome. No tests yet, and probably should fail and log an error if you are not using the new core discovery mode.

          Show
          Mark Miller added a comment - Simple patch that can load solr.xml from the root location in ZooKeeper. Requires that you explicitly set a system prop to do this. Default is still to load from SolrHome. No tests yet, and probably should fail and log an error if you are not using the new core discovery mode.
          Hide
          Erick Erickson added a comment -

          OK, I'm picking this up. The last update was some time ago, things have changed a bit so a few questions

          1> Hmmm, it looks like the patch so far bypasses all of the logic above and requires a sysprop to trigger getting solr.xml from ZK so we can skip a bunch of code. Eventually (Solr5?) maybe we can do some more stuff around whether or not to try to find solr.xml, meanwhile this makes the whole problem of doing the right thing in new/old style solr.xml less error-prone.

          Show
          Erick Erickson added a comment - OK, I'm picking this up. The last update was some time ago, things have changed a bit so a few questions 1> Hmmm, it looks like the patch so far bypasses all of the logic above and requires a sysprop to trigger getting solr.xml from ZK so we can skip a bunch of code. Eventually (Solr5?) maybe we can do some more stuff around whether or not to try to find solr.xml, meanwhile this makes the whole problem of doing the right thing in new/old style solr.xml less error-prone.
          Hide
          Mark Miller added a comment -

          Right - for initial support, I went with the least invasive and most straightforward impl. I think it should actually fail if you try and run with solr.xml in zk with the old style.

          Show
          Mark Miller added a comment - Right - for initial support, I went with the least invasive and most straightforward impl. I think it should actually fail if you try and run with solr.xml in zk with the old style.
          Hide
          Noble Paul added a comment -

          I'm wondering why solr.xml is an xml file? From what I see why can't we just make it a properties file.

          Show
          Noble Paul added a comment - I'm wondering why solr.xml is an xml file? From what I see why can't we just make it a properties file.
          Hide
          Erick Erickson added a comment -

          I tried that once. It gets a little ugly with entries like solr.cloud=. The xml does divide things nicely into sections. Having a solr.xml allows a clean specification of a local properties file, there's no confusion between solr.xml and solr.properties, but that could be handled by convention since we really haven't decided what the local properties file could be (something like solr.properties and solrlocal.properties).

          But personally I don't want to go through the hassle of changing from solr.xml. I agree that functionally we should be able to get by with a properties file, but the fact that it's xml is built into the code in a lot of places and untangling the xml-ish nature is more time consuming (at least it was last time I did it then reverted) than valuable I think.

          Show
          Erick Erickson added a comment - I tried that once. It gets a little ugly with entries like solr.cloud=. The xml does divide things nicely into sections. Having a solr.xml allows a clean specification of a local properties file, there's no confusion between solr.xml and solr.properties, but that could be handled by convention since we really haven't decided what the local properties file could be (something like solr.properties and solrlocal.properties). But personally I don't want to go through the hassle of changing from solr.xml. I agree that functionally we should be able to get by with a properties file, but the fact that it's xml is built into the code in a lot of places and untangling the xml-ish nature is more time consuming (at least it was last time I did it then reverted) than valuable I think.
          Hide
          Mark Miller added a comment -

          The idea has come up before - there was a preference for the better hierarchy support with xml vs properties as well as consistency with the solrcore configuration.

          It has little to do with putting it in zookeeper though - we want the same format as if it's on disk.

          Show
          Mark Miller added a comment - The idea has come up before - there was a preference for the better hierarchy support with xml vs properties as well as consistency with the solrcore configuration. It has little to do with putting it in zookeeper though - we want the same format as if it's on disk.
          Hide
          Erick Erickson added a comment -

          Beefed up the tests, added bits about getting the uploaded solr.xml from the embedded ZK instance.

          I need to insure that the variations on embedded ZK work manually (unless I can come up with a clever test), but otherwise I think it's ready to commit and I'll probably check it in over the next day or two unless there are objections or I find problems when I look at it with fresh eyes.

          Show
          Erick Erickson added a comment - Beefed up the tests, added bits about getting the uploaded solr.xml from the embedded ZK instance. I need to insure that the variations on embedded ZK work manually (unless I can come up with a clever test), but otherwise I think it's ready to commit and I'll probably check it in over the next day or two unless there are objections or I find problems when I look at it with fresh eyes.
          Hide
          Alan Woodward added a comment -

          Rather than adding Zookeeper stuff to ConfigSolr.fromSolrHome(), can we create a new static method ConfigSolr.fromZookeeper()? And then push the system property checks back out into SolrDispatchFilter or wherever fromSolrHome is being called. Keeps each fromXXX method just doing one thing.

          I wonder if it's worth refactoring the ByteArrayInputStream re-reading dance into fromInputStream as well. It's a bit of a hack anyway, and I don't like having it in more than once place.

          Show
          Alan Woodward added a comment - Rather than adding Zookeeper stuff to ConfigSolr.fromSolrHome(), can we create a new static method ConfigSolr.fromZookeeper()? And then push the system property checks back out into SolrDispatchFilter or wherever fromSolrHome is being called. Keeps each fromXXX method just doing one thing. I wonder if it's worth refactoring the ByteArrayInputStream re-reading dance into fromInputStream as well. It's a bit of a hack anyway, and I don't like having it in more than once place.
          Hide
          Mark Miller added a comment -

          +1

          Show
          Mark Miller added a comment - +1
          Hide
          Mark Miller added a comment -

          We should also probably be strict about the property values for the setting - eg zookeeper works, solrhome works, null works (as solrhome), anything else fails with an error.

          Show
          Mark Miller added a comment - We should also probably be strict about the property values for the setting - eg zookeeper works, solrhome works, null works (as solrhome), anything else fails with an error.
          Hide
          Erick Erickson added a comment -

          I have to run right now, but is this what you two had in mind?

          Not all of the new tests run, but I have to leave for the evening and wanted to see if this is down the right path.

          Haven't dealt with the bytestream yet.

          Show
          Erick Erickson added a comment - I have to run right now, but is this what you two had in mind? Not all of the new tests run, but I have to leave for the evening and wanted to see if this is down the right path. Haven't dealt with the bytestream yet.
          Hide
          Erick Erickson added a comment -

          Another version with a quick hack (rushing out the door, may be totally wrong!) for the bytestream stuff

          [~Alan Woodward] do you have a moment to check the refactoring of the bytestream? I haven't even run any tests on it, all I know is it compiles.

          Show
          Erick Erickson added a comment - Another version with a quick hack (rushing out the door, may be totally wrong!) for the bytestream stuff [~Alan Woodward] do you have a moment to check the refactoring of the bytestream? I haven't even run any tests on it, all I know is it compiles.
          Hide
          Erick Erickson added a comment -

          New version, new tests pass (haven't tried a full test yet). But it does take out one of the annoying (and useless) params in fromInputStream.

          Show
          Erick Erickson added a comment - New version, new tests pass (haven't tried a full test yet). But it does take out one of the annoying (and useless) params in fromInputStream.
          Hide
          Alan Woodward added a comment -

          Not quite what I was thinking - I'll see if I get time tonight to put a patch together.

          Show
          Alan Woodward added a comment - Not quite what I was thinking - I'll see if I get time tonight to put a patch together.
          Hide
          Alan Woodward added a comment -

          It looks like trying to deal with the embedded zookeeper case is doomed to failure the way things work at the moment, because the embedded server isn't started until CoreContainer.load() is called, but we need to load solr.xml before then.

          So before we resolve this, we need to move the Embedded ZK logic out of ZkController, and start/stop it in SolrDispatchFilter - I guess in a separate JIRA issue?

          Show
          Alan Woodward added a comment - It looks like trying to deal with the embedded zookeeper case is doomed to failure the way things work at the moment, because the embedded server isn't started until CoreContainer.load() is called, but we need to load solr.xml before then. So before we resolve this, we need to move the Embedded ZK logic out of ZkController, and start/stop it in SolrDispatchFilter - I guess in a separate JIRA issue?
          Hide
          Mark Miller added a comment -

          Yes, I think embedded support should be another issue. I think it could be approached a couple ways.

          Show
          Mark Miller added a comment - Yes, I think embedded support should be another issue. I think it could be approached a couple ways.
          Hide
          Alan Woodward added a comment -

          This is what I was thinking of. No tests yet, and it punts on dealing with embedded ZK, but it makes ConfigSolr.fromInputStream() less stupid and pushes ZK-aware logic out of the more general configuration code.

          Show
          Alan Woodward added a comment - This is what I was thinking of. No tests yet, and it punts on dealing with embedded ZK, but it makes ConfigSolr.fromInputStream() less stupid and pushes ZK-aware logic out of the more general configuration code.
          Hide
          Erick Erickson added a comment -

          bq: It looks like trying to deal with the embedded zookeeper case is doomed to failure

          Yeah, that's what I concluded after I got a good look at what was going on. Personally I'm willing to not support the embedded case at all or at least wait until someone complains.

          Show
          Erick Erickson added a comment - bq: It looks like trying to deal with the embedded zookeeper case is doomed to failure Yeah, that's what I concluded after I got a good look at what was going on. Personally I'm willing to not support the embedded case at all or at least wait until someone complains.
          Hide
          Mark Miller added a comment -

          We need to support embedded in time, but it doesn't need to be in this issue. There are a couple pretty easy ways to do it, so I'll likely get to it eventually. For now, it's not so important.

          Show
          Mark Miller added a comment - We need to support embedded in time, but it doesn't need to be in this issue. There are a couple pretty easy ways to do it, so I'll likely get to it eventually. For now, it's not so important.
          Hide
          Erick Erickson added a comment -

          Alan's patch with some modifications and with the new test cases.

          Show
          Erick Erickson added a comment - Alan's patch with some modifications and with the new test cases.
          Hide
          Erick Erickson added a comment -

          Final patch with, CHANGES.txt entry here.

          Show
          Erick Erickson added a comment - Final patch with, CHANGES.txt entry here.
          Hide
          ASF subversion and git services added a comment -

          Commit 1514800 from Erick Erickson in branch 'dev/trunk'
          [ https://svn.apache.org/r1514800 ]

          SOLR-4718 Allow solr.xml to be stored in ZooKeeper

          Show
          ASF subversion and git services added a comment - Commit 1514800 from Erick Erickson in branch 'dev/trunk' [ https://svn.apache.org/r1514800 ] SOLR-4718 Allow solr.xml to be stored in ZooKeeper
          Hide
          ASF subversion and git services added a comment -

          Commit 1514843 from Erick Erickson in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1514843 ]

          SOLR-4718 Allow solr.xml to be stored in ZooKeeper

          Show
          ASF subversion and git services added a comment - Commit 1514843 from Erick Erickson in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1514843 ] SOLR-4718 Allow solr.xml to be stored in ZooKeeper
          Hide
          Adrien Grand added a comment -

          4.5 release -> bulk close

          Show
          Adrien Grand added a comment - 4.5 release -> bulk close

            People

            • Assignee:
              Erick Erickson
              Reporter:
              Erick Erickson
            • Votes:
              0 Vote for this issue
              Watchers:
              17 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development