SOLR-10574: Choose a default configset for Solr 7

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0
    • Component/s: None
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      Currently, data_driven_schema_configs is the default configset when collections are created using the bin/solr script and no configset is specified.
      However, that may not be the best choice. We need to decide which configset is the best out-of-the-box choice, considering that, going forward, many users might create collections without knowing about the concept of a configset.

      (See also SOLR-10272)

      Proposed changes:

      1. Remove data_driven_schema_configs and basic_configs
      2. Introduce a combined configset, _default, based on the above two configsets.
      3. Build a "toggleable" data driven functionality into _default

      Usage:

      1. Create a collection (using _default configset)
      2. Data driven / schemaless functionality is enabled by default, so just start indexing your documents.
      3. If you don't want data driven / schemaless behaviour, disable it:
        curl http://host:8983/solr/coll1/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'

      4. Create schema fields using the Schema API, then index documents.
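Steps 3 and 4 above correspond to two HTTP calls. The following is a minimal Python sketch, using only the standard library, that builds (but does not send) those requests: the Config API call from step 3 and a Schema API add-field call for step 4. The host, the coll1 collection name, and the illustrative title/text_general field definition are assumptions. Adjust them for your own setup.

```python
import json
import urllib.request

SOLR_BASE = "http://host:8983/solr"  # assumption: host/port taken from the curl example
COLLECTION = "coll1"                 # assumption: collection name from the curl example

def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request against a Solr collection endpoint."""
    url = f"{SOLR_BASE}/{COLLECTION}/{path}"
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )

# Step 3: turn off data-driven field creation for this collection (Config API).
disable_req = build_post(
    "config", {"set-user-property": {"update.autoCreateFields": "false"}}
)

# Step 4: declare a field explicitly (Schema API); name and type are illustrative.
add_field_req = build_post(
    "schema",
    {"add-field": {"name": "title", "type": "text_general", "stored": True}},
)

if __name__ == "__main__":
    for req in (disable_req, add_field_req):
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
        # To actually send a request: urllib.request.urlopen(req)
```

Building the Request objects separately from sending them keeps the payloads easy to inspect before anything touches a live cluster.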

      Attachments

      1. SOLR-10574.patch (10 kB, Ishan Chattopadhyaya)
      2. SOLR-10574.patch (6 kB, Ishan Chattopadhyaya)
      3. SOLR-10574.patch (6 kB, Ishan Chattopadhyaya)
      4. SOLR-10574.patch (236 kB, Ishan Chattopadhyaya)
      5. SOLR-10574-refguide.patch (17 kB, Ishan Chattopadhyaya)


          Activity

          ctargett Cassandra Targett added a comment -

          > I took a quick look at refGuide patch and I think we should commit it

          Thanks Jan Høydahl, I was on vacation last week. I'll give it another review later this week once I'm caught up again.

          jira-bot ASF subversion and git services added a comment -

          Commit 04054fc524432388c7cea722b766815a950ca736 in lucene-solr's branch refs/heads/branch_7_0 from Ishan Chattopadhyaya
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=04054fc ]

          SOLR-10574, SOLR-10272: Refguide documentation for _default configset

          (cherry picked from commit 112bdda)

          jira-bot ASF subversion and git services added a comment -

          Commit d3acebcfe58320fa94517517caa1e3a537640a51 in lucene-solr's branch refs/heads/branch_7x from Ishan Chattopadhyaya
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d3acebc ]

          SOLR-10574, SOLR-10272: Refguide documentation for _default configset

          (cherry picked from commit 112bdda)

          janhoy Jan Høydahl added a comment -

          I'll cherry pick it for you and push as part of my next commit.

          janhoy Jan Høydahl added a comment -

          Will you also backport to branch_7x and branch_7_0?

          jira-bot ASF subversion and git services added a comment -

          Commit 112bdda47eb9827e80500c767d09422efeb9b91e in lucene-solr's branch refs/heads/master from Ishan Chattopadhyaya
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=112bdda ]

          SOLR-10574, SOLR-10272: Refguide documentation for _default configset

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Sure, I'll commit it shortly.

          janhoy Jan Høydahl added a comment -

          I took a quick look at refGuide patch and I think we should commit it, as I'm already starting to bump into edits for other issues that touch the same lines...

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Here are the documentation changes for SOLR-10574. Cassandra Targett, can you please review?
          I'm still working on the changes for SOLR-10272.

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          The documentation for this feature remains to be done; I'll complete it as part of SOLR-10272. Thanks everyone for the feedback and help!

          If something remains to be done but doesn't have a child JIRA here, please create a sub-task for it here.

          jira-bot ASF subversion and git services added a comment -

          Commit e4a7fc59ad1b3375e9a7572f694e16cd1aef0b28 in lucene-solr's branch refs/heads/master from Ishan Chattopadhyaya
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e4a7fc5 ]

          SOLR-10574: Adding _default configset, replacing data_driven_schema_configs and basic_configs

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          I'm working on this on branch jira/solr-10574. I can't yet work out why CacheHeaderTest#testCacheVetoHandler is failing; investigating.

          jira-bot ASF subversion and git services added a comment -

          Commit 1a58412e4ac9ff85f82696da8f3b0597ca45617e in lucene-solr's branch refs/heads/master from Ishan Chattopadhyaya
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1a58412 ]

          SOLR-10574: Reverting previous commits to tackle test failures

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Thanks, Steve, Jan, Cassandra. I'll take a look at the tests and the numeric types right away.

          ctargett Cassandra Targett added a comment -

          > Looks like there are some failing tests after this commit

          I'm going to guess these failures are at least related to a problem I have in a local "master" build, where I had to manually enter "_default" as the name of the configset. None of the offered options were accepted (because they no longer exist).

          steve_rowe Steve Rowe added a comment -

          Looks like there are some failing tests after this commit - see https://jenkins.thetaphi.de/job/Lucene-Solr-master-Solaris/1384/ and https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/19916/. I think these are the affected tests:

          org.apache.solr.cloud.SolrCloudExampleTest.testLoadDocsIntoGettingStartedCollection
          org.apache.solr.util.TestSolrCLIRunExample.testInteractiveSolrCloudExample
          org.apache.solr.util.TestSolrCLIRunExample.testSchemalessExample

          janhoy Jan Høydahl added a comment -

          I think your commit 7c2429bebf82d1ff5ab2d4dcabb07e57659c0b0d inadvertently changed the types of auto-guessed fields from pints etc. back to tints in basic_configs/conf/solrconfig.xml. See https://github.com/apache/lucene-solr/commit/7c2429bebf82d1ff5ab2d4dcabb07e57659c0b0d#diff-62e510c3b695ae74c99a1aee9d8a22a1

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Committed the unified configurations.

          Created a separate SOLR-10920 for the warning part. I'll take it up together with SOLR-10902.

          Does anyone have time to tackle SOLR-10887 and SOLR-10903, please? I'm planning to tackle SOLR-10272 after this and might not get to these soon.

          jira-bot ASF subversion and git services added a comment -

          Commit 0ec9d64d816ec907235cddef972d7c45fc78f332 in lucene-solr's branch refs/heads/master from Ishan Chattopadhyaya
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0ec9d64 ]

          SOLR-10574: Changes and upgrade notes

          jira-bot ASF subversion and git services added a comment -

          Commit d1c807dd70ea77963877d753050e15512eb698a0 in lucene-solr's branch refs/heads/master from Ishan Chattopadhyaya
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d1c807d ]

          SOLR-10574: Adding unified _default config set

          jira-bot ASF subversion and git services added a comment -

          Commit 7c2429bebf82d1ff5ab2d4dcabb07e57659c0b0d in lucene-solr's branch refs/heads/master from Ishan Chattopadhyaya
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7c2429b ]

          SOLR-10574: Adding data driven support to basic_confs and adding payload fields

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Thanks David, I'll add the warning part in the patch.
          I've added SOLR-10902 for enabling/disabling data driven as a CREATE parameter. Added SOLR-10903 for the edismax discussions (FYI Yonik Seeley, Alexandre Rafalovitch).

          dsmiley David Smiley added a comment -

          Alexandre Rafalovitch, regarding your proposal 5 days ago: it makes sense to me, but I think the file extension of the schema is a distraction from the issue we're discussing here. It's difficult to stay on-topic; credit to Ishan Chattopadhyaya for deflecting it to SOLR-10887.

          +0 to Ishan's latest patch... with a heavy sigh I see data-driven on by default, and I'm going to have to start memorizing how to disable the darned thing. Commit away. Hopefully warnings etc. can still be added? Another issue? I don't want us all to collectively feel the need to warn users (on solr-user or IRC or wherever) when they hit a problem related to data driven, when Solr itself can warn them against this setting.

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Alexandre, Erick, should we spin off the edismax-based searching on all fields into a separate issue and tackle it as a follow-up to this issue (after the patch here is committed)?

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Adding a patch for the proposed changes (for easy reviewing, it is applied on top of basic_configs).
          I'm also working on a committable patch (containing the script changes for this). That patch would remove both data_driven_schema_configs and basic_configs.
          As suggested by David, I'll try to split it into two commits to make the history easier to review.

          Note: this patch does not contain the managed-schema -> managed-schema.xml changes. It can be dealt with separately in SOLR-10887.

          erickerickson Erick Erickson added a comment -

          Jan Høydahl, Alexandre Rafalovitch: defaulting to searching all fields works for me, and is in fact superior to the catch-all field IMO, because if the user found that queries were slow, the fix would be a configuration change rather than a re-index. A re-index would be necessary in the catch-all case to get rid of the extra data in the text field.

          I'm happy with any solution that satisfies the condition that if a new user indexes some data then does a non-fielded query they get results.

          arafalov Alexandre Rafalovitch added a comment -

          I made a proposal 5 days ago in this issue that I thought was an interesting alternative to at least discuss (search for autofields). But I think it may have been lost in all other activities here. I would love somebody to comment on it even if it is not a valid approach in the end for this specific problem.

          janhoy Jan Høydahl added a comment -

          > Jan Hoydahl    to be present in schema: yes    used by default: no

          Long-term I want no catch-all field at all. Because no matter how much we document and try to educate, reality is that the defaults (or at least the practices used by the defaults) will end up in production for a high percentage of installs.

          Instead, let's consider giving the out-of-the-box (OOTB) default configsets the ability to automatically search all fields if neither df nor qf is specified. A potential fast-track solution is to extend SimpleQParserPlugin to interpret qf=* as a catch-all mode, where it simply iterates over all indexed fields in the schema and searches across them. We could then add defType=simple&qf=* to our /select and /query handlers in the default configsets. Or we could make simple the new default parser instead of lucene (horrible name btw). This could of course be introduced in 7.x, starting with a catch-all text field in 7.0.0...
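The qf=* idea can be illustrated with a small in-memory model. The Python sketch below is hypothetical and does not use SimpleQParserPlugin or any Solr API. It assumes a toy schema dictionary with an indexed flag per field. It shows qf=* expanding to every indexed field, so a non-fielded query matches text in any of them while stored-only fields are skipped. All names (SCHEMA, expand_qf, search) are illustrative.

```python
from typing import Dict, List

# Hypothetical schema: field name -> properties. Not read from a real Solr schema.
SCHEMA: Dict[str, dict] = {
    "id": {"indexed": True},
    "title": {"indexed": True},
    "body": {"indexed": True},
    "raw_blob": {"indexed": False},  # stored-only, so never searched
}

def expand_qf(qf: str, schema: Dict[str, dict]) -> List[str]:
    """Interpret qf='*' as 'every indexed field'; otherwise split the field list."""
    if qf.strip() == "*":
        return sorted(name for name, props in schema.items() if props.get("indexed"))
    return qf.split()

def search(docs: List[dict], q: str, qf: str, schema: Dict[str, dict]) -> List[str]:
    """Return ids of docs where any query term appears in any of the qf fields."""
    fields = expand_qf(qf, schema)
    terms = q.lower().split()
    hits = []
    for doc in docs:
        tokens = set()
        for f in fields:
            tokens.update(str(doc.get(f, "")).lower().split())
        if any(t in tokens for t in terms):
            hits.append(doc["id"])
    return hits

docs = [
    {"id": "1", "title": "Default configset", "body": "schemaless indexing"},
    {"id": "2", "title": "Catch-all fields", "body": "copy field to text"},
    {"id": "3", "title": "Other", "raw_blob": "schemaless"},
]
print(search(docs, "schemaless", "*", SCHEMA))  # ['1']
```

The cost concern raised next is visible even here: with qf=*, every query touches every indexed field, however many the schema declares.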

          With a qf=* catch-all, the WARNING in the docs would instead need to say that qf should be tuned, or else queries may be too expensive for indices with many fields. Another issue with this approach concerns installs where the schema lists hundreds of fields but most docs in the index contain only a handful of fields. It might be possible to do a two-phase search, where the first phase computes the fields in use for the doc set after applying all fq's, and the second phase searches across those fields.
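The two-phase idea can likewise be sketched in plain Python. This is a hypothetical model, not Solr code. Phase 1 applies fq-style filter predicates and collects the indexed fields that actually occur in the surviving documents. Phase 2 matches query terms against only those fields. The document shapes, the filter lambda, and the function names are illustrative assumptions.

```python
from typing import Callable, Dict, List, Set, Tuple

Doc = Dict[str, str]
Filter = Callable[[Doc], bool]

def fields_in_use(docs: List[Doc], indexed: Set[str]) -> Set[str]:
    """Indexed fields that occur in at least one of the given docs."""
    used: Set[str] = set()
    for doc in docs:
        used.update(name for name in doc if name in indexed)
    return used

def two_phase_search(docs: List[Doc], filters: List[Filter], q: str,
                     indexed: Set[str]) -> Tuple[List[str], List[str]]:
    # Phase 1: apply all fq-style filters, then collect the fields the survivors use.
    candidates = [d for d in docs if all(f(d) for f in filters)]
    fields = fields_in_use(candidates, indexed)
    # Phase 2: match query terms only against those fields.
    terms = q.lower().split()
    hits = []
    for d in candidates:
        tokens = set(" ".join(d[f] for f in fields if f in d).lower().split())
        if any(t in tokens for t in terms):
            hits.append(d["id"])
    return sorted(fields), hits

docs = [
    {"id": "a", "type": "book", "title": "Solr in Action"},
    {"id": "b", "type": "movie", "plot": "a heist in Solr city"},
]
indexed = {"type", "title", "plot"}
print(two_phase_search(docs, [lambda d: d.get("type") == "book"], "solr", indexed))
# (['title', 'type'], ['a'])
```

In a real index the expensive part would be phase 1's field census over the filtered doc set. This toy version simply scans the filtered documents.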

          erickerickson Erick Erickson added a comment -

          I'll add a yes to managed schema having an .xml extension. Agreed, make it a separate issue.

          Catch-all text field: yes. Enabled by default: yes with warning.

          Since this is not for production anyway, might as well make it as easy as possible to get started. If we're going to enable data_driven, we should have a catch-all field enabled by default. Neither one is something I'd recommend going to production with without close examination.

          So to me it's a "both or neither" preference. The point of having data_driven as the default is to lower first-time barriers to entry. If the catch-all field is there and it's the pre-configured "df" for the request handlers people get results the first time they index and search without even knowing they have fields in their documents. Otherwise they're left scratching their heads because they indexed stuff but didn't find anything.
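Enabling the catch-all behaviour described here comes down to two things: a copy field from every source field into the catch-all field, and df pointing at that field. The Python sketch below only builds the Schema API request body (to be POSTed to /solr/<collection>/schema) and a first query string. It assumes the catch-all field is named _text_, as in data_driven_schema_configs, and that it already exists in the schema. The helper names are illustrative.

```python
import json
from urllib.parse import urlencode

# Assumption: the catch-all field is "_text_" (the name used by
# data_driven_schema_configs) and is already defined in the schema.
CATCH_ALL = "_text_"

def enable_catch_all_payload(catch_all: str = CATCH_ALL) -> str:
    """Schema API body that copies every incoming field into the catch-all field."""
    return json.dumps({"add-copy-field": {"source": "*", "dest": catch_all}})

def first_query_string(q: str, catch_all: str = CATCH_ALL) -> str:
    """Query string for a non-fielded search that defaults to the catch-all field."""
    return urlencode({"q": q, "df": catch_all})

print(enable_catch_all_payload())        # {"add-copy-field": {"source": "*", "dest": "_text_"}}
print(first_query_string("some terms"))  # q=some+terms&df=_text_
```

In a configset, df would normally be a request handler default rather than something the user passes. It is spelled out here only to show which parameter makes a bare q=some terms query hit the catch-all field.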

          So we'd then tell them "Examine your index to see what fields were actually defined, and do fielded search ('cause they don't even necessarily know what the docs look like!). Or enable a catch-all field and re-index", which is a minimal improvement in first-time experience over what we have now, at least they were able to index docs if not successfully search them the first time they tried.

          Perhaps the warning (in the schema file and in startup guides or maybe "taking Solr to production") is something akin to "add-unknown-fields-to-the-schema and the default behavior of copying all fields to text are options intended for getting started. Production systems rarely enable either of these two options. See solrconfig.xml and managed-schema(.xml) for the text 'RARELY ENABLED FOR PRODUCTION' ". Or something like that.

          ichattopadhyaya Ishan Chattopadhyaya added a comment - - edited

          I'll add a patch with data-driven enabled by default, catch all field present (but not used). I'd prefer if we did the managed-schema to managed-schema.xml change as a separate issue SOLR-10887 (since it requires backcompat handling, and I don't want to complicate this issue).

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          So far, here's what I've summarized from comments above. Please correct me if I understood your position incorrectly.

          Data driven enabled/disabled by default

          Ishan Chattopadhyaya	enabled
          David Smiley		both: disabled is fine, enabled is fine with adequate warning
          Shawn Heisey		disabled
          Jan Hoydahl		enabled
          Erick Erickson		enabled with warning
          Noble Paul		disabled
          Yonik Seeley		disabled (but no strong preference?)
          
          Disabled - 3.5
          Enabled - 3.5
          
          Decision: Split (until someone pitches in or changes their vote)
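The 3.5 / 3.5 totals work out if David's "both" position counts as half a vote on each side. That scoring rule is an inference, not something stated in the summary. The Python sketch below just reproduces the arithmetic under that assumption.

```python
from typing import Dict

# Positions as summarized above. Assumption: "both" counts as half a vote
# for each side; "enabled with warning" counts as enabled.
VOTES: Dict[str, str] = {
    "Ishan Chattopadhyaya": "enabled",
    "David Smiley": "both",
    "Shawn Heisey": "disabled",
    "Jan Hoydahl": "enabled",
    "Erick Erickson": "enabled",
    "Noble Paul": "disabled",
    "Yonik Seeley": "disabled",
}

def tally(votes: Dict[str, str]) -> Dict[str, float]:
    totals = {"enabled": 0.0, "disabled": 0.0}
    for position in votes.values():
        if position == "both":
            totals["enabled"] += 0.5
            totals["disabled"] += 0.5
        else:
            totals[position] += 1.0
    return totals

print(tally(VOTES))  # {'enabled': 3.5, 'disabled': 3.5}
```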
          

          managed-schema should have .xml extension?

          Ishan Chattopadhyaya	no
          Varun Thacker		yes
          Alexandre Rafalovitch	yes (judging by comments)
          Jan Hoydahl		yes
          Yonik Seeley		yes
          
          Decision: .xml should be back, with backcompat handling
          

          Catch all text to be used as copy field target by default?

          Yonik Seeley		to be present in schema:yes		used by default:no
          Jan Hoydahl		to be present in schema:yes		used by default:no
          
          Decision: (discussion still inconclusive)
          
          yseeley@gmail.com Yonik Seeley added a comment -

          > there are int/ints pint/pints where the "s" variant is MV but the bare variant is SV.

          Yeah, those are the fieldTypes, and we have corresponding dynamicField definitions. But are the single-valued types now used when guessing types? They were not in the past, but perhaps that's changed.

          erickerickson Erick Erickson added a comment -

          > Sorting

          Of course. Gah...

          But no, not everything is MV in data-driven, there are int/ints pint/pints where the "s" variant is MV but the bare variant is SV.

          Seeing "pint" really startled me since they've been deprecated since 4.8 or so...

          yseeley@gmail.com Yonik Seeley added a comment -

          > If we have a text field but don't copy anything to it, what good is it?

          It gives an easier starting point and a convention for when it is needed/desired.

          their first queries would all get zero hits since they'd inevitably just do a "q=some terms"

          How would they even know how to do that much? What URL do they hit, and what parameters to pass? They must be following a tutorial or some documentation, no?

          Why not make everything multi-valued by default with data_driven?

          I think it is... and if you've tried it, the experience is bad for anything but text fields. If you try to sort on one of those fields, or use one of them in a function query, you'll get exceptions. I tried writing a basic tutorial using schemaless in the past. I couldn't do it and had to resort to dynamic fields.

          arafalov Alexandre Rafalovitch added a comment -

          For _text_, what if we switched to eDisMax and used either qf multi-field expansion or variable substitution to generate a synthetic field reference? As in: https://home.apache.org/~hossman/rev2016/#/6 or https://home.apache.org/~hossman/rev2016/#/17

          And the field-adding code would append to the alias or expansion-variable definition.

          So, it would be something like

          defType=edismax
          qf = id price autofields
          f.autofields.qf = auto1 auto2 auto3

          or

          defType=edismax
          qf = id price ${autofields}
          autofields = auto1 auto2 auto3
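A minimal Python sketch of the "field-adding code would append" step, assuming that code only has to maintain one alias list (`f.autofields.qf`) or one substitution variable (`autofields`). The helper names `base_params` and `add_auto_field` are hypothetical. `defType`, `qf`, edismax field aliasing via `f.<alias>.qf`, and `${...}` request-parameter substitution are the existing Solr features the comment relies on.

```python
from urllib.parse import urlencode


def base_params(use_alias: bool) -> dict:
    """Return one of the two edismax variants from the comment above."""
    if use_alias:
        # Variant 1: "autofields" is a pseudo-field resolved through an edismax field alias.
        return {"defType": "edismax", "qf": "id price autofields", "f.autofields.qf": ""}
    # Variant 2: qf pulls the field list in through ${...} parameter substitution.
    return {"defType": "edismax", "qf": "id price ${autofields}", "autofields": ""}


def add_auto_field(params: dict, field: str) -> dict:
    """Hypothetical hook called whenever field guessing creates a new field."""
    key = "f.autofields.qf" if "f.autofields.qf" in params else "autofields"
    fields = params[key].split()
    if field not in fields:  # keep the list free of duplicates
        fields.append(field)
    params[key] = " ".join(fields)
    return params


params = base_params(use_alias=True)
for name in ["auto1", "auto2", "auto3"]:
    add_auto_field(params, name)
query_string = urlencode(params)
```

In a real setup these parameters would more likely live in a request handler's `defaults` than be sent on every request. The empty alias before the first auto-created field is a simplification of this sketch.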

          This could actually solve some of the problems with explaining why the original field types are ignored during the default search, and allow people to edit the mapping to remove some particularly large fields, etc.

          And as to big warnings, remember that the schema gets rewritten after the first non-manual modification. So all the warnings and explanations in the data-driven schema disappear as soon as it is actually used. A bit of a catch-22 there.

          erickerickson Erick Erickson added a comment - - edited

          If we have a _text_ field but don't copy anything to it, what good is it? The user has to get in there and change the schema to search. If they have to intervene before using it, that would defeat the purpose of making it zero-touch to start. And unless we do something with wildcard fields being searched by default or the _all query type, their first queries would all get zero hits, since they'd inevitably just do a "q=some terms", which would search against _text_.

          ------------

          Why not make everything multi-valued by default with data_driven? What functionality would be lost? Then switching to single-valued becomes an optimization if they need to tune things.

          Jan Høydahl bq: update the field to multi valued before passing on the update request.

          I don't think that would work, at least not without a lot of work. I tried a quick experiment, just changing multiValued from false to true and then updating a couple of docs. When I tried grouping and faceting, I got "org.apache.solr.common.SolrException: can not use FieldCache on multivalued field: eoe". I don't know whether it was the grouping or the faceting that caused it. Maybe we could fix this up, but I don't think it would be simple.

          I just used the default techproducts schema on master, string type, with DocValues false. But DocValues MV fields are SORTED_SET, so they'd have their own issues, I'd guess.

          ------------------
          The friction here, I think, is that the zero-touch startup requires us to make some decisions that we know aren't valid for production systems. At least not at scale. Throwing everything into a _text_ field won't scale. Searching all text fields all the time won't scale, at least not at the scale I often see. At that scale users must hand-tune the schema, or at least understand the tradeoffs. But not making one of those choices means new users have to struggle with schema definitions before doing anything.

          Maybe we can resolve this tension by using one of the not-for-production solutions but raising some flags to the user that they're, well, not for production use at scale? Take the _all query type suggestion for instance. If we go that route, then provide a request handler called "gettingstarted" or "demo" or "DONOTUSETHISINPRODUCTION". Well, maybe not the latter. Then direct users there in the getting started guides and the like, perhaps with notifications that once they're comfortable they need to dive into the schema definitions when they set up "for real".

          Varun Thacker The JIRA has already been created I think: SOLR-5917

          varunthacker Varun Thacker added a comment -

          I'm all for having the field in the schema, but not auto-copying all other fields to it

          Maybe I didn't understand this part. What would this look like? Will we have the field defined, and then a user should add copyFields for the fields they care about searching?

          If people want to search across all fields, I think the right way is something like edismax and wildcard support in the "qf" (query fields) parameter.

          That's a great idea! I'll create a separate Jira for that

          So to summarize this part of the discussion would this be accurate?

          • We should have the _text_ field defined
          • We will not copy everything into it
          • As a follow-up item, it would be nice if edismax's qf param supported wildcards
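Until edismax's qf supports wildcards, a client could expand the patterns itself. Below is a hypothetical Python sketch of that follow-up idea. It assumes the field names have already been fetched (for example from the Schema API's `GET /solr/<collection>/schema/fields`) and uses glob matching from `fnmatch`. It does not describe how Solr would implement this.

```python
from fnmatch import fnmatchcase
from typing import List

WILDCARD_CHARS = set("*?[")


def expand_qf(qf: str, schema_fields: List[str]) -> str:
    """Expand glob-style patterns in a qf string against known field names.

    Hypothetical client-side stand-in for wildcard support in edismax's qf.
    A boost such as ^2 is copied onto every field the pattern matches.
    """
    expanded: List[str] = []
    for token in qf.split():
        pattern, caret, boost = token.partition("^")
        if WILDCARD_CHARS & set(pattern):
            matches = [f for f in schema_fields if fnmatchcase(f, pattern)]
        else:
            matches = [pattern]  # plain field name: pass it through unchanged
        for f in matches:
            term = f + caret + boost
            if term not in expanded:
                expanded.append(term)
    return " ".join(expanded)


fields = ["id", "title_t", "body_t", "price", "_text_"]
qf = expand_qf("id^3 *_t", fields)
```

Doing the expansion at query time means newly guessed `*_t` fields are picked up without copying anything into `_text_`. The tradeoff is that each added field widens the query.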
          janhoy Jan Høydahl added a comment -

          I'm all for having the field in the schema, but not auto-copying all other fields to it

          Agree. Also, ES ended up disabling the _all field and instead introduced a new all query type that auto-expands the query to all textual fields, with a constant score. I like that approach.

          We auto-detect multi-valued so we don't break if we come across multiple values later, but then there are numerous things that only work with single-valued fields

          Could data-driven create a new field as single-valued at first, and somehow tag it in IndexSchema as auto-detected so that if it sees multiple values for the same field later it can update the field to multi valued before passing on the update request. That would give the least surprises for a novice user, at the same time as it would work the same for manually created single-value fields. If each field in IndexSchema had a boolean autoCreated then it would also be easier to QA a managed-schema since we'd know if a field is explicitly defined or just added by the update chain.
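Here is a rough Python sketch of Jan's proposal, using an in-memory stand-in for IndexSchema. New fields start single-valued and carry a hypothetical `auto_created` flag. An auto-created field is upgraded to multi-valued the first time a document brings several values. All class names and the placeholder type names are illustrative only and do not come from Solr's code.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FieldDef:
    name: str
    type: str
    multi_valued: bool = False
    auto_created: bool = False  # the hypothetical "autoCreated" flag suggested above


def guess_type(value) -> str:
    """Very rough type guess; type names are placeholders, not Solr's real mapping."""
    sample = value[0] if isinstance(value, list) and value else value
    if isinstance(sample, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(sample, int):
        return "plong"
    if isinstance(sample, float):
        return "pdouble"
    return "text_general"


@dataclass
class Schema:
    fields: Dict[str, FieldDef] = field(default_factory=dict)

    def process(self, doc: dict) -> None:
        """Apply schema changes for one document before it is passed on for indexing."""
        for name, value in doc.items():
            is_multi = isinstance(value, list) and len(value) > 1
            existing = self.fields.get(name)
            if existing is None:
                # New field: single-valued unless this very first value is already multiple.
                self.fields[name] = FieldDef(name, guess_type(value), is_multi, auto_created=True)
            elif is_multi and not existing.multi_valued:
                if not existing.auto_created:
                    # Explicitly defined single-valued fields keep failing as they do today.
                    raise ValueError(f"field {name!r} is single-valued")
                existing.multi_valued = True  # upgrade the guessed field in place


schema = Schema()
schema.process({"title": "Solr", "tags": "search"})
was_multi = schema.fields["tags"].multi_valued
schema.process({"title": "Lucene", "tags": ["search", "java"]})
now_multi = schema.fields["tags"].multi_valued
```

The in-place upgrade is the hard part in practice. As Erick's experiment in this thread shows, flipping multiValued on a field that already has indexed data breaks FieldCache-based grouping and faceting. The schema-side bookkeeping alone is therefore not enough.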

          yseeley@gmail.com Yonik Seeley added a comment -

          I think we should have the catch-all field enabled all the time.

          I'm all for having the field in the schema, but not auto-copying all other fields to it, doubling the indexing workload.
          If people want to search across all fields, I think the right way is something like edismax and wildcard support in the "qf" (query fields) parameter.

          As far as data-driven-by-default... I'd be more in favor if it worked better. The biggest issue I hit in the past was the single/multi-valued problem. We auto-detect multi-valued so we don't break if we come across multiple values later, but then there are numerous things that only work with single-valued fields. We need fields that can be multi-valued, but act as single-valued when they aren't.

          managed-schema.xml

          +0 for the .xml extension. I prefer it to no extension, but there is back compat to consider. I'd also prefer that if it gets changed, we first look for "managed-schema.xml", then "managed-schema", and then "schema.xml" to preserve back compat.
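The back-compat lookup order Yonik describes can be sketched in a few lines of Python. `resolve_schema_file` is a hypothetical helper that only illustrates the proposed precedence. It is not how Solr's schema factory actually locates the file.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Proposed precedence: new name first, then today's name, then classic schema.xml.
SCHEMA_CANDIDATES = ("managed-schema.xml", "managed-schema", "schema.xml")


def resolve_schema_file(conf_dir) -> Optional[Path]:
    """Return the first schema file that exists in conf_dir, or None."""
    for name in SCHEMA_CANDIDATES:
        candidate = Path(conf_dir) / name
        if candidate.is_file():
            return candidate
    return None


# An existing configset from before the rename: no managed-schema.xml yet.
with tempfile.TemporaryDirectory() as conf:
    (Path(conf) / "managed-schema").write_text("<schema/>")
    (Path(conf) / "schema.xml").write_text("<schema/>")
    picked = resolve_schema_file(conf).name
```

Because existing configsets never contain `managed-schema.xml`, they keep resolving to the file they use today. Only newly created configsets would pick up the `.xml` name.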

          noble.paul Noble Paul added a comment - - edited

          My 2 cents. I've always been against enabling data driven schema by default. I'm still against it. The reason is, the auto created field types are usually suboptimal.

          However, when a user is creating a collection, warning them with a message on how to enable that feature should be good enough. Anyway, in the cluster start wizard, we can use that as the default.

          Make it possible to decide data-driven or not while creating collection? bin/solr create -c foo -data-driven false

          Yeah, this flag is useful

          erickerickson Erick Erickson added a comment -

          I've pretty much come round to enabling data-driven by default. I like the idea of printing warnings about "not recommended for production", possibly adding same to the documentation?

          janhoy Jan Høydahl added a comment - - edited

          It is important to remember that even in a data driven mode, nothing stops you from creating a collection, adding a few fields with Schema API that you want to force fieldType for, and then start indexing docs. I've used this approach in POC settings many times, and it makes sure the fields you define up front behave well while still being able to explore, search and facet on new, unknown fields. Then before going to production, you clean up the schema and flip the data-driven switch.
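The POC workflow Jan describes has two parts. First, pin a few field types with the Schema API. Then, before production, flip the data-driven switch with the Config API user property from this issue's description. The Python sketch below builds those requests without sending them (pass each one to `urllib.request.urlopen` to do so). The host, the collection name `poc`, and the use of the `_default` field types `string` and `pdate` are assumptions.

```python
import json
from urllib.request import Request

SOLR = "http://localhost:8983/solr"  # assumption: a local Solr node


def _post_json(url: str, body: dict) -> Request:
    return Request(url, data=json.dumps(body).encode("utf-8"),
                   headers={"Content-Type": "application/json"}, method="POST")


def add_field(collection: str, name: str, field_type: str, **props) -> Request:
    """Schema API: pin a field's type up front, before any documents are indexed."""
    return _post_json(f"{SOLR}/{collection}/schema",
                      {"add-field": {"name": name, "type": field_type, **props}})


def set_data_driven(collection: str, enabled: bool) -> Request:
    """Config API: toggle field guessing via the update.autoCreateFields user property."""
    return _post_json(f"{SOLR}/{collection}/config",
                      {"set-user-property": {"update.autoCreateFields": str(enabled).lower()}})


# POC flow: pin the fields that matter, index and explore, then turn guessing off.
steps = [
    add_field("poc", "category", "string", docValues=True),
    add_field("poc", "published", "pdate"),
    # ... index and explore documents here ...
    set_data_driven("poc", False),
]
```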

          So for 7.x I tend to agree with Ishan and make it possible to add a collection from Admin UI as the first thing you try after install, and have it behave exactly like bin/solr create -c foo.

          Then let us make data-driven more mature, here are some rough thoughts

          • Implement SOLR-9526, indexing text as both tokenized and string
          • If update/json, perhaps be better at guessing primitive types from JSON type (not possible from XML, CSV)
          • Add an add-unknown-fields-to-the-schema-dryrun update chain which buffers N docs before guessing, and does no indexing
          • Add a <data-driven>true|false</data-driven> tag/API to the schema, and let DirectUpdateHandler? enable/disable the update chain based on this
          • Make it possible to decide data-driven or not while creating collection? bin/solr create -c foo -data-driven false
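JSON, unlike XML or CSV, carries native numbers and booleans. A dry run over a buffer of N documents can therefore guess types from the JSON types and widen a guess when documents disagree. The Python sketch below illustrates that combination. `dry_run_guess`, the widening rule, and the type names (loosely modelled on the kinds of types `_default` guesses) are assumptions, not Solr's field-guessing processors.

```python
import json
from itertools import islice
from typing import Dict, Iterable


def json_field_type(value) -> str:
    """Assumed mapping from a parsed JSON value to a field type name."""
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return "booleans"
    if isinstance(value, int):
        return "plongs"
    if isinstance(value, float):
        return "pdoubles"
    return "text_general"


def widen(a: str, b: str) -> str:
    """Combine two guesses for the same field seen in different documents."""
    if a == b:
        return a
    if {a, b} == {"plongs", "pdoubles"}:
        return "pdoubles"  # e.g. 10 and 9.5 both fit a floating-point field
    return "text_general"  # any other mix falls back to text


def dry_run_guess(docs: Iterable[dict], n: int = 100) -> Dict[str, str]:
    """Inspect up to n docs and return one guessed type per field; nothing is indexed."""
    guesses: Dict[str, str] = {}
    for doc in islice(docs, n):
        for name, value in doc.items():
            for v in value if isinstance(value, list) else [value]:
                t = json_field_type(v)
                guesses[name] = widen(guesses[name], t) if name in guesses else t
    return guesses


raw = '[{"price": 10, "inStock": true}, {"price": 9.5, "tags": ["a", "b"]}]'
guesses = dry_run_guess(json.loads(raw))
```

With one-document-at-a-time guessing, `price` would be locked in as a long by the first document and the second document would fail. Buffering lets the later 9.5 widen the guess before anything is committed to the schema.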

          Wrt schema.xml vs managed-schema, I'm +0 on renaming to managed-schema.xml; the "managed" part in the name and the comments in the file give warning enough. What if we add an API POST /collection/schema/xml which takes the complete XML file as body, as a safe way to continue hand-editing the xml schema? It would also be easy to add an Admin UI edit textbox if we had this...

          arafalov Alexandre Rafalovitch added a comment -

          On xml extension for managed-schema. Not having an XML extension means that file is a special case everywhere. Admin UI had several JIRAs because that file would not display properly, file system sorting is confusing, and presentations that try to explain things are confusing. Even just viewing the example schemas on the filesystem is confusing, as there is no data-type mapping and editors do not open it up without explicit intervention.
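The "no data-type mapping" point can be shown with Python's standard `mimetypes` module, which guesses a file's type from its extension alone. It is used here only as a stand-in for the extension-based detection that editors and file browsers commonly rely on.

```python
import mimetypes


def looks_like_xml(filename: str) -> bool:
    """Does extension-based type detection recognise this file name as XML?"""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime is not None and mime.endswith("xml")


with_ext = looks_like_xml("managed-schema.xml")
without_ext = looks_like_xml("managed-schema")
```

Without an extension there is nothing to map, so the guess comes back empty.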

          dsmiley David Smiley added a comment -

          ... and further if the log message contained the curl command to disable it. If it's too long then maybe a shortened URL to relevant Solr docs about this.

          dsmiley David Smiley added a comment -

          That's an excellent argument Ishan.

          I'd feel a bit better if we logged a WARN when data-driven is being used that it's not recommended for production use. This could be logged at a level that could be explicitly ignored by logging levels if someone is convinced it makes sense for their production scenario.
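David's two suggestions can be combined in one check: log at WARN when field guessing is on, and include the curl command from this issue's description in the message. The Python sketch below uses the standard `logging` module as a stand-in for Solr's logging configuration. The logger name and the function are hypothetical. An unset property is treated as enabled, matching the enabled-by-default behaviour proposed in this issue.

```python
import logging

log = logging.getLogger("solr.datadriven.sketch")  # hypothetical logger name


def warn_if_data_driven(collection: str, user_props: dict) -> bool:
    """Log a WARN when field guessing is on for a collection; return whether it is on."""
    enabled = str(user_props.get("update.autoCreateFields", "true")).lower() == "true"
    if enabled:
        hint = ("curl http://host:8983/solr/%s/config -d "
                "'{\"set-user-property\": {\"update.autoCreateFields\":\"false\"}}'" % collection)
        log.warning("Data-driven schema (field guessing) is enabled for %s; "
                    "not recommended for production. To disable: %s", collection, hint)
    return enabled


on = warn_if_data_driven("coll1", {})                                     # logs one WARN
off = warn_if_data_driven("coll1", {"update.autoCreateFields": "false"})  # logs nothing
log.setLevel(logging.ERROR)  # an operator who accepts the risk silences the warning
still_on = warn_if_data_driven("coll1", {})                               # True, not logged
```

Because the warning goes through a named logger, it can be silenced with an ordinary level setting, as David suggests. Users who knowingly run data-driven in production need no code change to do this.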

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Seems like David and Shawn both want data driven nature to be off by default (from the configset point of view).

          I'd like to pitch in that we do otherwise (i.e. let's make data driven the default), and here are my thoughts:
          I agree that data driven nature, as it stands, has problems. However, for a new user who only knows how to create a collection and index documents, the data driven nature gets him much further today than when he has data driven disabled. In the latter scenario, he needs to know how to create schema fields or how to use dynamic fields (or worse, how to work with hand-edited configsets). So, as a default, a new user is better served with data driven nature being available without any extra steps. However, for a user who is willing to add schema fields anyway (e.g. using schema API), this extra step to disable data-driven doesn't seem out of place. Now, in 7x, IMO we should try to improve the data driven nature in terms of handling all the scenarios it currently struggles with, and try making it great again. Just my 723 satoshis (=$0.02).

          Having said that, I am willing to go either way based on consensus here.

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          The data driven example has a "_text_" catch-all field. Should toggling the schemaless mode add it?

          I think we should have the catch-all field enabled all the time. That's what I was referring to (for things I want to do next) when I said: Also, I haven't consolidated the managed-schema differences between basic_configs and data_driven_schema_configs into this _default configset yet.

          I've never liked the fact that it no longer has a .xml extension and your text editor doesn't highlight it by default.

          I very much liked it. It puts us on the path towards not encouraging users to update the schema by hand. I think we should soon plug all gaps in our schema and config APIs, and then discourage (deprecate?) hand editing of schema altogether.

          varunthacker Varun Thacker added a comment -

          Should the filename for the managed schema regain its .xml extension?

          I've never liked the fact that it no longer has a .xml extension and your text editor doesn't highlight it by default. Let's create a separate Jira and discuss more there?

          The data driven example has a "_text_" catch-all field. Should toggling the schemaless mode add it?

          elyograg Shawn Heisey added a comment -

          Yes, I wanted it to be true by default (to maintain previous behaviour). But I don't have any strong opinion either way. What is the consensus here?

          Repeating what I said earlier: I think field guessing should be off by default in basic_configs (or whatever we end up calling it), because (IMHO) it has a tendency to cause more problems than it solves. While I understand a desire to maintain default settings from version to version, I believe that strict adherence to this only applies to new minor releases.

          With 7.0, we have an opportunity to review past decisions and decide whether they are still relevant. I think that every assumption and every default should be subject to review on every new major version. I expect almost all of them to be retained after such a review ... massive changes would be a sign that we were doing a terrible job.

          Related tangents: 1) Should we have a classic schema configset, or do we want to eventually remove the classic factory? 2) Should the filename for the managed schema regain its .xml extension?

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Let's give it a few more days for folks to speak up on this choice.

          I've run myself into a corner, since we likely don't have many days left before Anshum enforces the feature freeze for 7.0. I'll write an email on the dev list soliciting reviews.

          dsmiley David Smiley added a comment -

          I should have clarified that my discussion about editable configs versus using APIs wasn't a request for your guidance on how to do either. I just wanted to articulate a trade-off.

          Yes, I wanted it to be true by default (to maintain previous behaviour). But I don't have any strong opinion either way. What is the consensus here?

          Well, unless others have more input with stronger opinions or I'm outnumbered, I think data driven should be disabled by default. Let's give it a few more days for folks to speak up on this choice.

          Thanks for the new patch – looks good.

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          I think we have to start including this in all our quick start guides.

          Cassandra Targett made another interesting suggestion: that we look at a way to set this parameter to true or false at collection creation time, both so the script can support it during collection creation and to make it easier to build a UI that lets someone create a collection with data driven mode either on or off. I was thinking we could attempt that as a follow-up JIRA later.
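          Without such an option, the same result takes two calls: create the collection, then set the property. A minimal sketch, assuming SolrCloud mode, a local node, numShards=1 and the proposed _default configset name; `create_url` and `create_without_autocreate` are hypothetical helpers, not features of bin/solr.

```shell
#!/usr/bin/env bash
# Hypothetical two-step helper: create a collection from the _default
# configset, then switch off field guessing for it.

SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"   # assumption: local node

# Build the Collections API CREATE URL (separate so it can be inspected).
create_url() {
  printf '%s/admin/collections?action=CREATE&name=%s&numShards=1&collection.configName=_default' \
    "$SOLR_URL" "$1"
}

# Usage: create_without_autocreate <collection>
create_without_autocreate() {
  local collection="$1"
  # -f makes curl fail on HTTP errors, so step 2 is skipped if creation fails.
  curl -fsS "$(create_url "$collection")" || return 1
  curl -fsS "$SOLR_URL/$collection/config" \
       -H 'Content-Type: application/json' \
       -d '{"set-user-property": {"update.autoCreateFields": "false"}}'
}
```

          For example, `create_without_autocreate coll1` leaves coll1 with data driven mode off from the start.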

          ichattopadhyaya Ishan Chattopadhyaya added a comment (edited) -

          I looked at this patch and I see that update.autoCreateFields is set to false in the config - i.e. data driven is disabled by default. I'm very pleased with that default! But it's not consistent with what you said you did.

          Yes, I wanted it to be true by default (to maintain previous behaviour). But I don't have any strong opinion either way. What is the consensus here?

          Might the toggle mechanism be made easier somehow?

          I couldn't find anything easier that is also clean enough not to expose internal, implementation-specific details (e.g. concepts like the update chain, the name of the chain, etc.).

          I confess I remain a fan of "classic" (non-managed) configuration because I can simply go in and edit a config file to what I want it to be (and I can read the config) all without reading documentation. I can even search the config.

          By hand, this line needs to be changed:
          From:

            <updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:true}"
                     processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
          

          To:

            <updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="false"
                     processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
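          For the "classic" route, the change above is a single substitution, so it can also be scripted against a local copy of solrconfig.xml (for a configset stored in ZooKeeper, the file would first have to be downloaded and re-uploaded). A minimal sketch using plain sed; `disable_autocreate_in_file` is a hypothetical name. Because the attribute becomes a literal "false", the update.autoCreateFields user property no longer has any effect on that chain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: in a local solrconfig.xml, replace the property-driven
# default of the add-unknown-fields-to-the-schema chain with a literal "false".
disable_autocreate_in_file() {
  local file="$1" tmp status=0
  tmp="$(mktemp)" || return 1
  # Only the ${update.autoCreateFields:true} placeholder is rewritten.
  sed 's/default="\${update\.autoCreateFields:true}"/default="false"/' "$file" > "$tmp" \
    && cat "$tmp" > "$file" || status=$?   # write back in place, keeping permissions
  rm -f "$tmp"
  return "$status"
}
```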
          

          The APIs require that I go look up documentation somewhere and hope there's a one-liner ready for me to paste into curl. Even with the super cool v2 APIs, it's not going to help me know that there's a special custom user property "update.autoCreateFields" that can be toggled.

          This is the one liner:

          curl http://host:8983/solr/coll1/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'
          

          I think we have to start including this in all our quick start guides.

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Updated the patch to express the changes as modifications to basic_configs (for easier reviewing).

          dsmiley David Smiley added a comment -

          I think a patch modifying basic_configs would be easier to interpret for code-review purposes as it would be a diff against existing files instead of adding whole new files; and of course I didn't truly check what the differences were. It might even be good to commit this in stages so that git change tracking can better recognize what is going on (our future selves will thank us!). For example, commit changes to basic configs such that it is what we want the default to be, and then in another commit, both rename basic configs, and remove the data driven configs.

          I looked at this patch and I see that update.autoCreateFields is set to false in the config – i.e. data driven is disabled by default. I'm very pleased with that default! But it's not consistent with what you said you did.

          Might the toggle mechanism be made easier somehow? I confess I remain a fan of "classic" (non-managed) configuration because I can simply go in and edit a config file to what I want it to be (and I can read the config) all without reading documentation. I can even search the config. The APIs require that I go look up documentation somewhere and hope there's a one-liner ready for me to paste into curl. Even with the super cool v2 APIs, it's not going to help me know that there's a special custom user property "update.autoCreateFields" that can be toggled.

          ichattopadhyaya Ishan Chattopadhyaya added a comment (edited) -

          Apologies for my radio silence; here's a bit of an update. I had offline discussions with Noble Paul, Hoss Man and Shalin Shekhar Mangar.

          There were various approaches that I was considering:

          1. The initParams based enabling/disabling mechanism for data driven nature. Discarded this, considering Noble's concern that initParams with globbing/wildcard support is a risky tool with which users can shoot themselves in the foot (if they get the wildcards wrong), and hence we may want to remove initParams support going forward.
          2. Trying to create the chain programmatically was not easy, since the AddSchemaFieldsUpdateProcessorFactory needs field type names as defined in managed-schema/schema.xml. Hence, if the chain were created programmatically, the user would not be able to change the mapped field types, e.g. to point fields instead of trie fields, or vice versa.
          3. Letting the user enable/disable the data driven nature by adding "update.chain=add-unknown-fields-to-the-schema" to every paramset in ImplicitPlugins.json, and then letting the user use the config API to update the "update.chain" parameter's value. This approach exposed too much of the internals, like the concept of an "update chain" and the name of the chain, in the command to enable/disable data driven nature, and was hence potentially confusing.

          A very important consideration in setting up this enable/disable feature was that if we used the "add-unknown-fields-to-the-schema" update chain exactly as it is defined in data_driven_schema_configs today, it would be impossible for the user to modify the update chain (or parts of it) using the config API, since the config API cannot edit URPs that are inside an update chain, nor does it support creating/editing update chains.

          So, the solution (as in the patch) was to break out the individual URPs in the add-unknown-fields-to-the-schema chain into top-level named URPs (so that they are editable using the config API) and to create a functionally similar chain from those named URPs. There is a nice, not well documented default=true|false attribute for update chains, which is now used (and should have been all along) to enable/disable the data driven nature based on a variable.
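          Since each URP is now a top-level named component, one of them can be redefined through the Config API's update-updateprocessor command without touching the chain. A hedged sketch: the processor name "uuid" comes from the chain's processor list, but the class and fieldName parameter in the usage example are assumptions about how the patch defines that processor, and `urp_payload`/`update_urp` are hypothetical helpers. Values are not JSON-escaped, so keep them simple.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: redefine one named update processor via the Config API.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"   # assumption: local node

# Usage: urp_payload <name> <class> [key=value ...]
# Values are inserted verbatim (no JSON escaping).
urp_payload() {
  local name="$1" class="$2" kv body
  shift 2
  body=$(printf '"name": "%s", "class": "%s"' "$name" "$class")
  for kv in "$@"; do
    # Split each key=value pair at the first "=".
    body+=$(printf ', "%s": "%s"' "${kv%%=*}" "${kv#*=}")
  done
  printf '{"update-updateprocessor": {%s}}' "$body"
}

# Usage: update_urp <collection> <name> <class> [key=value ...]
update_urp() {
  local collection="$1"
  shift
  curl -fsS "$SOLR_URL/$collection/config" \
       -H 'Content-Type: application/json' \
       -d "$(urp_payload "$@")"
}
```

          For example, `update_urp coll1 uuid solr.UUIDUpdateProcessorFactory fieldName=id` would post `{"update-updateprocessor": {"name": "uuid", "class": "solr.UUIDUpdateProcessorFactory", "fieldName": "id"}}`, leaving the chain definition itself unchanged.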

          TL;DR: check out the new _default configset in the patch. It has data driven nature enabled by default, and it can be enabled/disabled using the following:

          Disable schemaless/data driven nature:
          curl http://host:8983/solr/coll1/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'
          Enable schemaless/data driven nature:
          curl http://host:8983/solr/coll1/config -d '{"set-user-property": {"update.autoCreateFields":"true"}}'
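          The two commands above differ only in the value, so a small shell function can wrap them. A minimal sketch, assuming a Solr node at http://localhost:8983 (overridable via a SOLR_URL variable); `autocreate_payload` and `set_autocreate_fields` are hypothetical helper names, not part of bin/solr. It rejects anything other than true/false before calling the Config API.

```shell
#!/usr/bin/env bash
# Hypothetical helper: toggle schemaless/data driven mode on one collection by
# setting the "update.autoCreateFields" user property via the Config API.

SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"   # assumption: local node

# Print the JSON body for the set-user-property command.
autocreate_payload() {
  printf '{"set-user-property": {"update.autoCreateFields": "%s"}}' "$1"
}

# Usage: set_autocreate_fields <collection> <true|false>
set_autocreate_fields() {
  local collection="$1" value="$2"
  case "$value" in
    true|false) ;;
    *) echo "value must be 'true' or 'false', got '$value'" >&2; return 1 ;;
  esac
  curl -fsS "$SOLR_URL/$collection/config" \
       -H 'Content-Type: application/json' \
       -d "$(autocreate_payload "$value")"
}
```

          For example, `set_autocreate_fields coll1 false` is the equivalent of the "disable" command above, with an explicit JSON content type added.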
          

          Would appreciate a review.

          Note: the patch contains only the new default configset. However, we also need to remove the existing data_driven_schema_configs and basic_configs and update the script. Also, I haven't consolidated the managed-schema differences between basic_configs and data_driven_schema_configs into this _default configset yet.

          dsmiley David Smiley added a comment -

          Sounds very good Ishan; I'm encouraged by your progress!

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Initially, I was looking at the toggleable flag to be set as follows:

          Start with basic_configs configset, create a collection "test3" with that configset.
          
          Enable data driven nature:
          
          curl http://localhost:8983/solr/test3/config -d '{"add-initparams": {"name": "data-driven-nature", "path": "/update/**", "defaults": {"update.chain": "add-unknown-fields-to-the-schema"}}}'
          
          Disable data driven nature:
          
          curl http://localhost:8983/solr/test3/config -d '{"delete-initparams" : "data-driven-nature" }'
          

          This currently works as of 6.x and this would've required minimal changes to achieve what we wanted (maybe just wrap that lengthy command into a shorter wrapper).
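          The "shorter wrapper" mentioned above could be as small as this sketch, which only packages the two initParams commands quoted earlier in this comment. It assumes a local Solr; `data_driven`, `enable_payload` and `disable_payload` are hypothetical names, and this initParams route is the one that was later set aside.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the add-initparams / delete-initparams commands.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"   # assumption: local node

enable_payload() {
  printf '{"add-initparams": {"name": "data-driven-nature", "path": "/update/**", "defaults": {"update.chain": "add-unknown-fields-to-the-schema"}}}'
}

disable_payload() {
  printf '{"delete-initparams": "data-driven-nature"}'
}

# Usage: data_driven <collection> <on|off>
data_driven() {
  local collection="$1" body
  case "$2" in
    on)  body="$(enable_payload)" ;;
    off) body="$(disable_payload)" ;;
    *)   echo "usage: data_driven <collection> on|off" >&2; return 1 ;;
  esac
  curl -fsS "$SOLR_URL/$collection/config" \
       -H 'Content-Type: application/json' \
       -d "$body"
}
```

          For example, `data_driven test3 on` and `data_driven test3 off`.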

          However, Noble informed me that, going forward, editing initparams is not the best choice. Upon his suggestion, and also Alexandre's, I am now looking at using paramsets and constructing the update chain (currently called "add-unknown-fields-to-the-schema") programmatically, so that it is used when the appropriate parameter(s) for enabling/disabling data-driven nature are passed in. I shall post a patch soon.

          arafalov Alexandre Rafalovitch added a comment -

          I had an idea (in some other JIRA) that we could have a kitchen-sink schema with an Admin-UI extra that allows pushing any of its own definitions to a different schema via the managed API. So, when you want to add an Armenian field type, you load up/create a collection with that schema, open it up in the Admin UI, and then push the definition to your primary collection. Then, the config collection can be deleted again.

          I have not tested if it is possible. The main issue is generating API configuration format as well as Admin-UI glue (which could be similar to /browse).

          elyograg Shawn Heisey added a comment -

          +1 to shorter configs, with references in the top comment to documentation and other resources. Those references might include kitchen-sink versions similar to what we currently have, in an alternate directory.

          janhoy Jan Høydahl added a comment -

          I also hope we could strip the config down to the minimum explicit configuration that still does something useful. I mean, the managed-schema for data-driven only defines three fields but is 1044 lines long! I would hope it could be something like 10 lines... Perhaps the whole set of default field types and copy fields could be included with one new tag, e.g. <standardFieldTypes/>, and the txt files in the lang folder could be loaded as resources from the classpath. Similarly, we don't need to ship currency.xml, elevate.xml, protwords.txt, stopwords.txt and synonyms.txt. Users can add those, or better, use their managed counterparts.
          Similarly, solrconfig.xml for data_driven is 1408 lines long, which must be very confusing for new users. Could we do more default and implicit stuff, like we did with implicit AdminHandlers? And perhaps bake the add-unknown-fields-to-the-schema chain into an implicit feature that could be toggled per update request with &fieldGuessing=on|off|training, where in training mode it would only learn. If someone needed to override the implicit chain, they could add a tag <fieldGuessingChain>my-better-chain</fieldGuessingChain>. We could probably have a "default" solrconfig of 50 lines!

          elyograg Shawn Heisey added a comment -

          I agree with David Smiley regarding the default for schemaless mode. I think it's a mistake that our default (basic_configs) configuration adds unknown fields.

          The idea of high level toggles for functionality is really cool, particularly if they can be easily toggled within the admin UI.

          I understand the appeal of schemaless mode, and I do not have any objection to Solr having the capability, but I firmly believe that it should be disabled by default. I have found that when the update processor guesses what type to use for a field containing text, what it chooses is frequently incorrect for what a user needs, which means that the user must re-index after they fix the incorrect type. Older behavior (fail to index until unknown fields are added) means there is less re-indexing.

          arafalov Alexandre Rafalovitch added a comment -

          Can we use the paramset API to add/remove the schemaless chain? Like we do with the files example for /browse facets.

          So, start with it configured but not used, and have very clear comments/readme about the request parameter to use to have it activated and to have it set as default.

          And - ideally - also have it work with dry-run endpoint too, somehow.
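          One reading of this suggestion with the existing Request Parameters API (/config/params): store the chain name in a named paramset, then opt in per request with useParams. A minimal sketch, assuming the add-unknown-fields-to-the-schema chain is defined in solrconfig.xml but not marked as default, and that /update honours useParams; the paramset name "schemaless" and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: opt in to the schemaless chain via a paramset.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"   # assumption: local node

# Body that stores update.chain in a paramset named "schemaless".
paramset_payload() {
  printf '{"set": {"schemaless": {"update.chain": "add-unknown-fields-to-the-schema"}}}'
}

# Create or overwrite the paramset on a collection.
create_schemaless_paramset() {
  curl -fsS "$SOLR_URL/$1/config/params" \
       -H 'Content-Type: application/json' \
       -d "$(paramset_payload)"
}

# Index a JSON file, using the schemaless chain for this request only.
index_schemaless() {
  local collection="$1" file="$2"
  curl -fsS "$SOLR_URL/$collection/update?useParams=schemaless&commit=true" \
       -H 'Content-Type: application/json' \
       --data-binary @"$file"
}
```

          For example, run `create_schemaless_paramset coll1` once, then `index_schemaless coll1 docs.json` for each batch that should go through field guessing; plain /update requests would not use the chain.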

          dsmiley David Smiley added a comment -

          +1 to what you said except your parenthetical:

          -0 to making "add-unknown-fields-to-the-schema" the default (at /update). I don't like it as the default but I'm not standing in your way of making it so as long as users can easily change the setting after.

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Indeed. How about the following?

          1. Let's deprecate what we know as data_driven_schema_configs
          2. Build a "toggleable" data driven functionality into the basic_configs configset (and make it the default)
          3. Optionally, build a training mode, which spits out a schema based on a dry run of documents received.
          dsmiley David Smiley added a comment -

          Instead of thinking of this as one configset versus another, perhaps we can think of this as a configset that has some options that are easily toggle-able. At least I'm thinking of the "add-unknown-fields-to-the-schema" update chain in particular.

          arafalov Alexandre Rafalovitch added a comment -

          I feel that none of the current examples are really awesome anymore, given the growth and change in Solr. Ideally, we would rebuild the schemas (and possibly associated tutorials) as per SOLR-10329. But it would be a really big project, so it is a bit of a dream so far.

          More immediate, I think the question is how to do the needed workflow tasks with whatever schema is chosen. The basic set of features I can think of:

          • Create collection (duh)
          • Add new fields automatically? (schemaless mode?)
          • Add new fields manually (API)
          • Remove fields (e.g. when schemaless detection was wrong and needs to be redone) (API?)
          • Add type definitions (API)
          • Remove type definitions (API?)
          • Modify solrconfig.xml (API, overrides and external paramsets)
          • Lock the schema from future modifications

          Theoretically, any managed schema can do that, but the question is how easy it is for the user to understand what is there already and how to modify it for their own needs.
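          Several items in this list map onto Schema API commands (add-field, delete-field, add-field-type, delete-field-type). A minimal sketch for the "schemaless detection was wrong" case: delete the guessed field and re-add it with an explicit type in one request. It assumes a local Solr and that the target type (e.g. text_general) exists in the schema; `fix_field_payload` and `fix_guessed_field` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: replace a wrongly guessed field via the Schema API.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"   # assumption: local node

# Body that deletes a field and re-adds it with an explicit type;
# the Schema API applies the commands in the order given.
fix_field_payload() {
  local field="$1" type="$2"
  printf '{"delete-field": {"name": "%s"}, "add-field": {"name": "%s", "type": "%s", "stored": true}}' \
    "$field" "$field" "$type"
}

# Usage: fix_guessed_field <collection> <field> <type>
fix_guessed_field() {
  local collection="$1"
  shift
  curl -fsS "$SOLR_URL/$collection/schema" \
       -H 'Content-Type: application/json' \
       -d "$(fix_field_payload "$@")"
}
```

          For example, `fix_guessed_field coll1 title text_general`. Existing documents still need to be re-indexed for the new type to take effect.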


            People

            • Assignee:
              ichattopadhyaya Ishan Chattopadhyaya
              Reporter:
              ichattopadhyaya Ishan Chattopadhyaya
            • Votes:
              0
              Watchers:
              12
