Solr / SOLR-6952

Re-using data-driven configsets by default is not helpful

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.0
    • Fix Version/s: 5.0, 6.0
    • Component/s: Schema and Analysis
    • Labels:
      None

      Description

      When creating collections (I'm using the bin/solr scripts), I think we should automatically copy configsets, especially when running in "getting started mode" or data driven mode.

      I did the following:

      bin/solr create_collection -n foo
      bin/post foo some_data.csv
      

      I then created a second collection, intending to send in the same data, this time run through a Python script that changed a value from an int to a string (since it was an enumerated type). I was surprised to get:

      Caused by: java.lang.NumberFormatException: For input string: "NA"
      at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
      at java.lang.Long.parseLong(Long.java:441)

      for my new version of the data that passes in a string instead of an int, as this new collection had only seen strings for that field.

      1. SOLR-6952.patch
        26 kB
        Timothy Potter
      2. SOLR-6952.patch
        14 kB
        Timothy Potter

          Activity

          Noble Paul added a comment -

          Should it be a feature of the scripts, or should it be an option in the Collection create?
          Now that we have made the configsets mutable, it makes sense to make it a more accessible feature.

          Grant Ingersoll added a comment -

          To work around this, I tried this from a clean install:

          1. bin/solr -cloud
          2. bin/solr create_collection foo
          3. bin/solr create_collection foo2

          I then indexed the data to foo using the ints and then followed up and indexed to foo2 using the Strings and much to my dismay, I got the same error and have come to find out that the configset is being shared. This is bad, IMO. At a minimum, data-driven configsets should be copied from the default template and we should never modify the base template for a specific instance. Not sure on the other ones, but my gut says we should copy, not modify.
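
The failure described above can be reproduced with a toy model. This is only a sketch under stated assumptions, not Solr code: a local directory stands in for a configset, a file of "field=type" lines stands in for the managed schema, and the helper `index` and the field name `rating` are hypothetical. The first value seen for a field fixes its type, so a second collection that shares the configset inherits that guess.

```shell
#!/bin/sh
# Toy model of schemaless field guessing with a shared configset.
# Not Solr code: "index" and the schema file format are made up for illustration.

# index CONFIGSET_DIR FIELD VALUE
# The first value seen for a field fixes its type; later values must match it.
index() {
  schema="$1/schema"; field="$2"; value="$3"
  ftype=$(grep "^$field=" "$schema" 2>/dev/null | cut -d= -f2)
  if [ -z "$ftype" ]; then
    # Field not seen before: guess its type from this value.
    case "$value" in
      ''|*[!0-9]*) ftype=string ;;
      *)           ftype=long ;;
    esac
    echo "$field=$ftype" >> "$schema"
  fi
  if [ "$ftype" = long ]; then
    # A field already typed as long rejects non-numeric input.
    case "$value" in
      ''|*[!0-9]*)
        echo "NumberFormatException: For input string: \"$value\""
        return 1 ;;
    esac
  fi
  echo "indexed $field=$value as $ftype"
}

shared=$(mktemp -d)
index "$shared" rating 3     # first collection: rating is guessed as long
index "$shared" rating NA || echo "second collection rejected"
```

With a separate directory per collection (a copied configset), the second call would instead guess `string` for the field and succeed.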

          Noble Paul added a comment -

          Keeping ease of use in mind, the script should create a copy by default unless specified otherwise.

          Timothy Potter added a comment - - edited

          How should the user specify they want to reuse a config that already exists in ZooKeeper instead of creating a new config in ZK by copying the template? The default behavior will copy the template and name the config the same name as the collection in ZK. Maybe something like a "-sharedConfig" option?

          bin/solr create_collection -n foo -sharedConfig data_driven_schema_configs
          

          This means to use the data_driven_schema_configs as-is in ZooKeeper and not copy it to a new config directory. I like making the "shared" concept explicit in the param / help for the command but open to other approaches too.

          Alternatively, we can change the interface to create_collection / create_core to use a -t parameter (t for template) and then make the -c optional, giving us:

          Example 1:

          bin/solr create_collection -n foo -t data_driven_schema_configs
          

          Result will be to copy the data_driven_schema_configs directory to ZooKeeper as /configs/foo

          Example 2:

          bin/solr create_collection -n foo -t data_driven_schema_configs -c shared
          

          Result will be to copy the data_driven_schema_configs directory to ZooKeeper as /configs/shared

          Of course, if /configs/shared already exists, then it will be used without uploading anything new ...
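
The copy-or-reuse rule proposed above can be sketched as follows. This is an illustration only, not code from the patch: a local directory stands in for ZooKeeper's /configs node, and `create_config` is a hypothetical helper.

```shell
#!/bin/sh
# Toy model: a local directory stands in for ZooKeeper's /configs node.
# create_config is a hypothetical helper, not part of bin/solr.

# create_config ZK_CONFIGS TEMPLATE_DIR COLLECTION [CONFIG_NAME]
create_config() {
  zk_configs="$1"; template="$2"; collection="$3"
  conf_name="${4:-$collection}"   # default: config is named after the collection
  if [ -d "$zk_configs/$conf_name" ]; then
    echo "reusing $conf_name"     # already uploaded: use it as-is
  else
    cp -R "$template" "$zk_configs/$conf_name"
    echo "copied template to $conf_name"
  fi
}

work=$(mktemp -d)
mkdir -p "$work/template" "$work/configs"
echo "placeholder schema" > "$work/template/schema"

create_config "$work/configs" "$work/template" foo           # copies to configs/foo
create_config "$work/configs" "$work/template" foo2 shared   # copies to configs/shared
create_config "$work/configs" "$work/template" foo3 shared   # reuses configs/shared
```

Only the third call shares a config, and it does so because the name was given explicitly.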

          Noble Paul added a comment -

          I would say we should first add support for this in the Collection API with an extra request param. The Collection API should copy a config to a new dir if that param is passed.

          The script should turn that param ON by default. The reason is that, going forward, config is editable through configoverlay.json and params.json. So shared configs are dangerous, and unsuspecting users will not know why things are screwed up.

          The example I would prefer:

          bin/solr create_collection -n foo -t data_driven_schema_configs -c -shareconfig
          
          Timothy Potter added a comment -

          Collection API has nothing to do with loading a configuration into ZooKeeper. Currently, you use zkcli.sh/bat to load a configuration directory into ZooKeeper, and when doing so you can assign any name you want to the configuration directory that is uploaded. Since bin/solr is being fixed to copy rather than share by default, I don't think any changes are needed to the Collection API.

          Noble Paul added a comment - - edited

          Collection API has nothing to do with loading a configuration into ZooKeeper

          I know that. I meant that someone who creates a collection through the HTTP API instead of the script misses out on this behavior.

          Timothy Potter added a comment -

          Here's a patch that implements the desired behavior. Easiest way to understand is to look at a few examples:

          Example 1

          bin/solr create -n foo
          

          Will upload the data_driven_schema_configs directory (the default) into ZooKeeper as /configs/foo, i.e. the data_driven_schema_configs "template" is copied to a unique config directory in ZooKeeper using the name of the collection you are creating.

          Example 2

          bin/solr create -n foo2 -t basic_configs -c SharedBasicSchema
          

          Will upload the basic_configs directory into ZooKeeper as /configs/SharedBasicSchema. If one wants to reuse the SharedBasicSchema configuration directory when creating another collection, they can just do: bin/solr create -n foo3 -c SharedBasicSchema

          If we're happy with this approach, I'll port over the changes to solr.cmd (for Windows)

          Timothy Potter added a comment -

          Actually, since I'm tweaking the arg names of bin/solr create options, I think I'll just line them up with what was already being done in zkcli.sh. Specifically, I'm going to change the options to be:

          -c = name of collection or core to create (was -n)
          -d = configuration directory to copy (was -c)
          -n = configuration name (didn't exist)
          
          Noble Paul added a comment -

          What are the long names?

          Timothy Potter added a comment -

          Same as zkcli.sh.

          Timothy Potter added a comment -

          Here's an updated patch that changes around some of the parameter names to be consistent with the zkcli.sh script. I also tackled the "create" alias (SOLR-6933) in this patch since it was easier to address both issues with one patch.

          Example 1

          bin/solr create -c foo
          

          This is equivalent to doing:

          bin/solr create -c foo -d data_driven_schema_configs
          

          or

          bin/solr create -c foo -d data_driven_schema_configs -n foo
          

          The create action will upload the data_driven_schema_configs directory (the default) into ZooKeeper as /configs/foo, i.e. the data_driven_schema_configs "template" is copied to a unique config directory in ZooKeeper using the name of the collection you are creating.

          Example 2

          bin/solr create -c foo2 -d basic_configs -n SharedBasicSchema
          

          This will upload the basic_configs directory into ZooKeeper as /configs/SharedBasicSchema. If one wants to reuse the SharedBasicSchema configuration directory when creating another collection, they can just do:

          bin/solr create -c foo3 -n SharedBasicSchema
          

          Going to start porting these changes to the Windows solr.cmd, so please speak up now or this is what we'll have for 5.0
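
The defaults behind the equivalent invocations above can be sketched with a small option parser. This is a hypothetical illustration, not the parsing code in bin/solr: `resolve_create_args` is a made-up function, and only the short options -c, -d and -n named in this thread are modelled.

```shell
#!/bin/sh
# Hypothetical sketch of the option defaults; not the actual bin/solr parsing code.

# resolve_create_args prints "collection confdir confname" for the given options.
resolve_create_args() {
  collection=""; confdir="data_driven_schema_configs"; confname=""
  OPTIND=1
  while getopts "c:d:n:" opt; do
    case "$opt" in
      c) collection="$OPTARG" ;;   # collection or core to create
      d) confdir="$OPTARG" ;;      # configuration directory to copy
      n) confname="$OPTARG" ;;     # configuration name in ZooKeeper
      *) return 1 ;;
    esac
  done
  if [ -z "$collection" ]; then
    echo "collection name (-c) is required" >&2
    return 1
  fi
  # The config name falls back to the collection name.
  echo "$collection $confdir ${confname:-$collection}"
}

resolve_create_args -c foo
resolve_create_args -c foo -d data_driven_schema_configs
resolve_create_args -c foo -d data_driven_schema_configs -n foo
resolve_create_args -c foo2 -d basic_configs -n SharedBasicSchema
```

The first three calls print the same line, which mirrors the equivalence stated in Example 1.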

          ASF subversion and git services added a comment -

          Commit 1651231 from Timothy Potter in branch 'dev/trunk'
          [ https://svn.apache.org/r1651231 ]

          SOLR-6952: bin/solr create action should copy configset directory instead of reusing an existing configset in ZooKeeper by default; commit also includes fix for SOLR-6933 - create alias

          ASF subversion and git services added a comment -

          Commit 1651233 from Timothy Potter in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1651233 ]

          SOLR-6952: bin/solr create action should copy configset directory instead of reusing an existing configset in ZooKeeper by default; commit also includes fix for SOLR-6933 - create alias

          Noble Paul added a comment -

          This has broken the blob store API.

          The schema and config are automatically created by the system for the .system collection.

          There should be a way to create a collection without creating a configset:

           bin/solr create -c .system -n .system
          
          Timothy Potter added a comment -

          There should be a way to create a collection without creating a configset

          I disagree with that requirement. If something special is needed for .system I think we shouldn't expose that at the user interface level (which bin/solr create is).

          Noble Paul added a comment -

          This has been opened as a new ticket: SOLR-7502


            People

            • Assignee:
              Timothy Potter
              Reporter:
              Grant Ingersoll
            • Votes:
              0
              Watchers:
              4
