SOLR-8378

Add upconfig and downconfig commands to the bin/solr script

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 5.4, 6.0
    • Fix Version/s: 5.5, 6.0
    • Component/s: scripts and tools
    • Labels:
      None

      Description

      It would be convenient to be able to upload and download arbitrary configsets to ZooKeeper.

      This might be the last thing we need before we can stop requiring users to be aware of zkcli, which is awkward.

      1. SOLR-8378.patch
        36 kB
        Erick Erickson
      2. SOLR-8378.patch
        35 kB
        Erick Erickson
      3. SOLR-8378.patch
        37 kB
        Erick Erickson
      4. SOLR-8378.patch
        284 kB
        Erick Erickson
      5. SOLR-8378.patch
        271 kB
        Erick Erickson
      6. SOLR-8378.patch
        261 kB
        Erick Erickson
      7. SOLR-8378.patch
        24 kB
        Erick Erickson

          Activity

          Erick Erickson added a comment -

          First cut, seems to work on a very quick once-over. I need to look it over with fresh eyes, it's Saturday after all....

          Any comments on the approach? Or whether this is worth doing? Unless there are objections I'm going to check this in next week sometime after some more polish.

          One thing I'm not too happy about is that I moved the upconfig method from CreateCollectionTool to the base class so I could easily re-use it in the new ConfigSetUploadTool. Any suggestions? It needed access to the "echo" method so unless we rearrange things significantly it's easier this way.

          Upayavira I'm torn on whether to put this in 5.4. Maybe I'll put it in if there's a re-spin? What do you think? The necessity of using zkcli has long rankled me as it's yet another thing a new person has to understand. But one point of releases is to try to not shoehorn lots of stuff in at the last minute.

          I suppose a lot of it depends on whether there'll be a 5.5 release. I'd really like to get this in place in the 5x code line somewhere.
          Upayavira

          Erick Erickson added a comment - - edited

          Hmmm, got some scope creep in here because what this patch also does is add in a managed_schema_configs, which I'd really like to get into the 5.4 line. The point there is to have a convenient way to use the managed schema stuff without all the field guessing stuff.

          At least the patch I'm about to put up actually puts up the managed_schema_configs that I reference in the scripts....

          Opinions?

          Erick Erickson added a comment -

          I think this is ready; much of it is copy/paste/change reproducing the pattern for the other commands.

          I'm particularly interested in any comments Timothy Potter has.

          This really does two things:
          1> provides an upconfig and downconfig so we don't force the users to use zkcli or bootstrap options.

          2> provides a managed schema configset that does not have the field guessing in it.

          I'd actually like to get this in to 5.4 if there aren't too many objections, I'm guessing there'll be a re-spin. Especially because this in conjunction with the new admin API allows the managed schema to be used from the UI with or without the field guessing API.

          Unless there are objections, I plan on committing this to at least trunk and 5x Tuesday or so. Whether to also include it in 5.4 is another question.

          Varun Thacker added a comment -

          Hi Erick,

          2> provides a managed schema configset that does not have the field guessing in it.

          With SOLR-8131 (which I plan on committing today) all branch5x examples will explicitly use managed-schema. So the basic schema will also be equivalent to the managed-only schema here? There will be less duplication of the example configsets we ship in that case.

          Also I think we should not add this to 5.4. Is there not being another 5.x release a motive here? We could do a 5.5 release in early/mid January, since 6.0 could follow soon after that.

          Upayavira added a comment -

          Having a schema sample that does not include the schemaless stuff is really important. Schemaless is fraught with danger, and whilst it is clever, and useful in a limited range of scenarios, we should not have it being the only managed schema sample config.

          That said, this ticket is about upconfig/downconfig. I support the addition of these to bin/solr. It is a real pain to locate the zk-cli script and explain its need/etc.

          Neither is a bug in 5.4.0, so neither should be part of that release. 5.5.0 could be released in two weeks' time if we wanted to.

          Timothy Potter added a comment -

          I don't think this should be committed w/o also including these new commands in bin\solr.cmd for our Windows users. I like Varun's idea of just having managed schema enabled in the existing configs, making managed_schema_configs == basic_configs ... but sounds like he's already handled that issue in another ticket.

          I know the Windows stuff is a major pain, but this addition shouldn't be much more than copy-paste in a few key areas, but then there's the testing. I usually spin up an instance in EC2 to do my Windows testing, so I can help test.

          Erick Erickson added a comment -

          Timothy Potter Rats! I thought about the windows command file on Saturday but then forgot it completely. Siiigggh. I'll take you up on the testing bits. Thanks for reminding me.

          Varun Thacker SOLR-8131 defaults to schemaless mode. While related, these are two different beasts from a user's perspective. I've run into lots of situations where the user wants docs to fail indexing first time, every time, when they contain undefined fields, fields of an unintended type, etc. One of the points of this ticket is to accommodate that desire while being able to define fields via the new admin UI. You'll notice that there aren't even any dynamic fields defined in the schema, for instance.

          I did struggle a bit with whether or not to leave all the fieldTypes defined but in the end decided to leave them in. I became convinced that in managed schema mode they're actually more important than in the manually-edited examples we've been shipping for so long.

          bq: Is there not being another 5.x release a motive here?

          I'll defer to the release manager here and he's spoken. It's not going in 5.4.

          Upayavira OK, not going in 5.4

          Erick Erickson added a comment -

          Updated patch with Windows support.

          WARNING: this hasn't been tested yet as I don't have a handy Windows environment. I'll spin up an EC2 instance sometime if someone doesn't beat me to it.

          Varun Thacker added a comment -

          SOLR-8131 defaults to schemaless mode. While related, these are two different beasts from a user's perspective. I've run into lots of situations where the user wants docs to fail indexing first time, every time, when they contain undefined fields, fields of an unintended type, etc. One of the points of this ticket is to accommodate that desire while being able to define fields via the new admin UI. You'll notice that there aren't even any dynamic fields defined in the schema, for instance.

          SOLR-8131 doesn't default to schemaless mode. It defaults to managed schema mode, meaning the Schema APIs will be available to everyone to modify fields, fieldTypes, etc. The type guessing is only part of the data_driven example, like before.

          Erick Erickson added a comment -

          Just had an offline chat with Varun. His work on SOLR-8131 and this one crossed. There's no need for a new configset, I've removed it.

          WARNING: It's late, so I won't be able to test this until the morning.

          Gregory Chanan added a comment -

          Some thoughts:

          I'm on board with having useful commands in one place rather than requiring end users to know about zkcli. That said, I don't think adding more uncategorized commands to the same script is the correct way to go. In our distribution (CDH) we have had a script that does a bunch of different actions on solr/zk and I've found it's pretty confusing to users what command actually goes where. Ideally the users wouldn't have to know that sort of information (at least when starting up, but I think quickstart is a different enough use case to warrant special consideration), but that's just not practical. Consider if the configs znode has ACLs enabled: you need to pass a reasonable endpoint-specific error message back to the user, you have to have an endpoint-specific mechanism to pass Kerberos credentials (does this script work in a secure environment?), etc. So what will happen if we continue along this path is we'll have a bunch of different useful commands where it is unclear to users what information they actually need to provide without looking it up each time. Heck, I wrote a lot of the commands in our distribution and I get confused.

          So, my suggestion is that we break up the commands into "subtopics" based on the endpoint (the solr http endpoint can be an unnamed default). So long story short, I'd argue for naming these:
          zk upconfig
          zk downconfig
          or something like that.

          Shawn Heisey added a comment -

          Just the other day, I was silently cursing the fact that zkcli was buried in a deep directory under server, rather than living in the bin directory.

          zk upconfig
          zk downconfig

          Assuming that I understand what you're proposing correctly, the command name you've described (zk) is very simple. Perhaps more important, it is unlikely to be confused with the zkCli script that comes with ZooKeeper, which causes confusion for some users trying to use zkcli. I do wonder if maybe it should be named something like zksolr instead, so the fact that it's tied to Solr is more obvious. The "zk" name is very acceptable, unless Solr is packaged to LSB standards and the scripts end up someplace like /usr/bin, in which case it will be confusing.

          Mike Drob added a comment -

          I think what Greg is suggesting is that there still be bin/solr as the main entry point, but you can invoke it as solr zk upconfig and solr zk downconfig. Then maybe someday we add solr zk foo and everything is happily namespaced. This is the same pattern that Hadoop used for a while, although they eventually split into multiple executables IIRC.
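
The namespacing idea described above can be sketched as a nested `case` dispatch. This is only an illustration of the pattern, not the code that went into bin/solr: the function name `solr_dispatch` and the `zk:`/`solr:` output labels are hypothetical, and a real script would invoke the corresponding tool instead of echoing a label.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of "namespaced" sub-commands: the first argument picks
# the namespace (zk, or a plain Solr command) and the second picks the action.

solr_dispatch() {
  local command="${1:-}"
  case "$command" in
    zk)
      # Everything under "zk" talks to ZooKeeper rather than the Solr HTTP endpoint.
      local sub="${2:-}"
      case "$sub" in
        upconfig|downconfig)
          echo "zk:$sub"
          ;;
        *)
          echo "usage: solr zk {upconfig|downconfig}" >&2
          return 1
          ;;
      esac
      ;;
    start|stop|create)
      # Un-namespaced commands remain the default (Solr endpoint) namespace.
      echo "solr:$command"
      ;;
    *)
      echo "unknown command: ${command:-<none>}" >&2
      return 1
      ;;
  esac
}

solr_dispatch zk upconfig     # prints zk:upconfig
solr_dispatch create          # prints solr:create
```

Adding a future `solr zk foo` would then mean adding one branch to the inner `case` without touching the top-level commands.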

          Jan Høydahl added a comment -

          This discussion is related to SOLR-7074 and SOLR-7233 where we discuss moving zkcli.sh to bin and renaming it as bin/zk, and also let it have the ability to start standalone zookeeper.

          I'm going back and forth on what I think is best here. It sounds nice to keep all ZK-related things in a new bin/zk, but if that script can do bin/zk start, bin/zk stop, bin/zk status, etc., then it feels odd to mix in upconfig/downconfig, since those are Solr-specific, client-type operations. So I'm leaning towards letting bin/solr take care of the Solr-specific ZK interaction, so the sequence of events, if we implement SOLR-7074, will be:

          bin/zk start   # We could let it sniff ZK_HOST from solr.in.sh?
          bin/solr start
          bin/solr zk upconfig -d path/to/config -n myconf
          bin/solr create -c mycoll -n myconf
          
          Erick Erickson added a comment -

          I changed the code to "namespace" the ZK stuff, good suggestions!

          I'll upload a patch today/tomorrow I hope, need to get the junk working on AWS for testing Windows....

          Erick Erickson added a comment -

          I think this is ready. I finally got an AWS Windows instance to test the Windows script and it seems to check out.

          I still have to precommit and test, but if all that goes well I'll probably be committing this tomorrow morning.

          Erick Erickson added a comment -

          BTW, this incorporates the "zk" namespace idea, so examples of using this are

          bin/solr zk -upconfig -d directory -n name -z localhost:2181
          bin/solr zk -downconfig -d directory -n name -z localhost:2181

          "directory" in the upconfig command can be one of the pre-configured configsets.
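
As a rough illustration of the two invocations quoted above, the following dry-run helper only prints the command lines and does not contact ZooKeeper. The helper name `zk_config_cmd`, the `SOLR_BIN` variable, and the example directory and configset names are assumptions made for this sketch. The flag layout simply mirrors the comment and may differ in other Solr versions, so check the script's own help output before relying on it.

```shell
#!/usr/bin/env bash
# Sketch only: compose the zk commands in the form quoted in the comment above.
# SOLR_BIN, zk_config_cmd, and the sample names below are hypothetical.

SOLR_BIN="${SOLR_BIN:-bin/solr}"

# Print the command line for an upconfig or downconfig call.
# usage: zk_config_cmd <upconfig|downconfig> <directory> <configset-name> <zk-host>
zk_config_cmd() {
  local action="$1" dir="$2" name="$3" zkhost="$4"
  case "$action" in
    upconfig|downconfig) ;;
    *) echo "unknown action: $action" >&2; return 1 ;;
  esac
  printf '%s zk -%s -d %s -n %s -z %s\n' "$SOLR_BIN" "$action" "$dir" "$name" "$zkhost"
}

# Dry run: show an upload of a local config directory, then a download of the
# same configset into a separate directory (for example, as a backup).
zk_config_cmd upconfig   ./myconf        myconf localhost:2181
zk_config_cmd downconfig ./myconf_backup myconf localhost:2181
```

To actually run a command, the printed line can be reviewed and then executed by hand, which keeps the sketch free of side effects.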

          Erick Erickson added a comment -

          And of course I thought of some wording I wanted to change in the help text right after putting the other patch up.

          Upayavira added a comment -

          Just a thought - could it be possible to have a -c collection-name parameter to the upconfig command, and issue a collections API call to reload that collection once the configs have been uploaded? This would improve ease of use substantially.

          Jan Høydahl added a comment -

          I think we should not mix uploading a config set with reloading a collection. The config you upload may be shared between multiple collections, for that matter. I think what we need is the reload command available from the script as well. See SOLR-8400 for follow-up (not to hijack this one).

          Erick Erickson added a comment -

          Jan's comments are well taken, and besides, I'm about to check this in.

          The new UI already has the reload button for collections, so I think that covers it for now. I think if anything I'd prefer to add a "reload" operation to the start script rather than have reloading the collection become a side-effect of upconfig, but I'm not even really wild about that.

          We can always raise another JIRA.

          Upayavira added a comment -

          I'm happy with Jan's suggestion of an option to the bin/solr command. Basically, what I want to avoid is context switching. If you are at the command line, let that task be completable at the command line, rather than requiring a command line request for half, and an API call or UI click for the rest.
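
Until a reload option exists in the script, the "stay on the command line" workflow could be approximated by following the upload with an explicit Collections API RELOAD call per collection. The sketch below only builds and prints the request. The helper name `reload_url`, the `SOLR_URL` default, and the collection name `mycoll` are assumptions for illustration, and the collection name is not URL-encoded here.

```shell
#!/usr/bin/env bash
# Sketch only: print the Collections API RELOAD request for each collection
# that shares an uploaded configset. SOLR_URL and reload_url are hypothetical.

SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Build the RELOAD URL for one collection.
reload_url() {
  local collection="$1"
  printf '%s/admin/collections?action=RELOAD&name=%s\n' "$SOLR_URL" "$collection"
}

# Step 1 (not executed here): upload the configset, in the form quoted
# earlier in this thread:
#   bin/solr zk -upconfig -d path/to/config -n myconf -z localhost:2181

# Step 2: reload every collection that uses the configset. A configset may be
# shared, which is why the reload is a separate, explicit step per collection.
for coll in mycoll; do
  echo "curl \"$(reload_url "$coll")\""
done
```

Keeping the reload separate from the upload matches the objection raised above: the person running the script decides which of the collections sharing a configset get reloaded.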

          ASF subversion and git services added a comment -

          Commit 1719099 from Erick Erickson in branch 'dev/trunk'
          [ https://svn.apache.org/r1719099 ]

          SOLR-8378: Add upconfig and downconfig commands to the bin/solr script

          ASF subversion and git services added a comment -

          Commit 1719119 from Erick Erickson in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1719119 ]

          SOLR-8378: Add upconfig and downconfig commands to the bin/solr script


            People

            • Assignee:
              Erick Erickson
              Reporter:
              Erick Erickson
            • Votes:
              0
              Watchers:
              9
