Solr
  1. Solr
  2. SOLR-5228

Deprecate <fields> and <types> tags in schema.xml

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.8, 6.0
    • Component/s: Schema and Analysis
    • Labels:
      None

      Description

      On the solr-user mailing list, Nutan recently mentioned spending days trying to track down a problem that turned out to be because he had attempted to add a <dynamicField .. /> that was outside of the <fields> block in his schema.xml – Solr was just silently ignoring it.

      We have made improvements in other areas of config validation by generating statup errors when tags/attributes are found that are not expected – but in this case i think we should just stop expecting/requiring that the <fields> and <types> tags will be used to group these sorts of things. I think schema.xml parsing should just start ignoring them and only care about finding the <field>, <dynamicField>, and <fieldType> tags wherever they may be.

      If people want to keep using them, fine. If people want to mix fieldTypes and fields side by side (perhaps specify a fieldType, then list all the fields using it) fine. I don't see any value in forcing people to use them, but we definitely shouldn't leave things the way they are with otherwise perfectly valid field/type declarations being silently ignored.

      I'll take this on unless i see any objections.

      1. SOLR-5228.patch
        21 kB
        Erick Erickson
      2. SOLR-5228.patch
        11 kB
        Erick Erickson

        Activity

        Hide
        Robert Muir added a comment -

        I think its annoying the fields and fieldTypes have to be in separate sections too. This makes it hard for you to logically arrange things in such a way that its readable without lots of scrolling up and down and getting lost.

        Can we just go the simple route of deprecating 'fields' and 'types' in 4.x (throw error in 5.x), and in 4.x also allow field/fieldtypes to be "top-level" in the schema.

        I think this is ultimately simpler than just willy-nilly allowing shit to be nested underneath anywhere: thats hard to maintain: and it still allows people who want to group types/fields together to do that, and those that want to put them side-by-side to do that too.

        Show
        Robert Muir added a comment - I think its annoying the fields and fieldTypes have to be in separate sections too. This makes it hard for you to logically arrange things in such a way that its readable without lots of scrolling up and down and getting lost. Can we just go the simple route of deprecating 'fields' and 'types' in 4.x (throw error in 5.x), and in 4.x also allow field/fieldtypes to be "top-level" in the schema. I think this is ultimately simpler than just willy-nilly allowing shit to be nested underneath anywhere: thats hard to maintain: and it still allows people who want to group types/fields together to do that, and those that want to put them side-by-side to do that too.
        Hide
        Erick Erickson added a comment -

        not very far below where fieldType and fields are parsed out with a path that includes <types> or <fields> there's this bit for copyField:

        expression = "//" + COPY_FIELD;
        nodes = (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);

        for (int i=0; i<nodes.getLength(); i++) {

        I happened to run into this because a customer put <copyField> tags inside the <fields> tag and it worked which surprised me at the time....

        Seems like the model we could use, we wouldn't even need to formally deprecate the <types> or <fields> tags, just comment that they were no longer necessary.

        FWIW

        Show
        Erick Erickson added a comment - not very far below where fieldType and fields are parsed out with a path that includes <types> or <fields> there's this bit for copyField: expression = "//" + COPY_FIELD; nodes = (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET); for (int i=0; i<nodes.getLength(); i++) { I happened to run into this because a customer put <copyField> tags inside the <fields> tag and it worked which surprised me at the time.... Seems like the model we could use, we wouldn't even need to formally deprecate the <types> or <fields> tags, just comment that they were no longer necessary. FWIW
        Hide
        Robert Muir added a comment -

        thats the willy-nilly approach I mentioned: I dont like it.

        if we are gonna do that, no point in using xml at all, we get no value from it, only horrors.

        The problem here is not field/dynamicField elements and "where they can be", the problem is the fieldType/types elements: they are useless and bring no value. Lets get rid of them.

        Show
        Robert Muir added a comment - thats the willy-nilly approach I mentioned: I dont like it. if we are gonna do that, no point in using xml at all, we get no value from it, only horrors. The problem here is not field/dynamicField elements and "where they can be", the problem is the fieldType/types elements: they are useless and bring no value. Lets get rid of them.
        Hide
        Tomás Fernández Löbbe added a comment -

        What about increasing the schema version? It is currently 1.5. Solr could continue supporting 1.5 as it is now with <types> and <fieldTypes>, create the version 1.6 that does not support those (and throws exception if present). 5.x would support 1.6+ versions, 4.x should support both but use 1.6 in the example. Anyone who needs to upgrade between 4.x versions can just keep their schema using 1.5. Anyone creating a new schema would start with the 1.6.

        Show
        Tomás Fernández Löbbe added a comment - What about increasing the schema version? It is currently 1.5. Solr could continue supporting 1.5 as it is now with <types> and <fieldTypes>, create the version 1.6 that does not support those (and throws exception if present). 5.x would support 1.6+ versions, 4.x should support both but use 1.6 in the example. Anyone who needs to upgrade between 4.x versions can just keep their schema using 1.5. Anyone creating a new schema would start with the 1.6.
        Hide
        Shawn Heisey added a comment -

        The schema version changes how Solr interprets default settings. I'm fairly sure that it has nothing to do with the XML structure. I don't think we need a new schema version for this.

        +1 to Robert's idea in the first comment. I will restate it below to make sure I understand it properly:

        • Allow <field> and <fieldType> at the top level under <schema>.
        • Deprecate <fields> and <types> in 4x. Remove them in trunk. The unknown tags will fail parsing.
        • Don't worry about supporting all options in the deprecated sections.
        Show
        Shawn Heisey added a comment - The schema version changes how Solr interprets default settings. I'm fairly sure that it has nothing to do with the XML structure. I don't think we need a new schema version for this. +1 to Robert's idea in the first comment. I will restate it below to make sure I understand it properly: Allow <field> and <fieldType> at the top level under <schema>. Deprecate <fields> and <types> in 4x. Remove them in trunk. The unknown tags will fail parsing. Don't worry about supporting all options in the deprecated sections.
        Hide
        Yonik Seeley added a comment - - edited

        I like Erick's idea. No need to make people's life hard who are upgrading (or who have existing tools to deal with our schema's schema
        We should support field and fieldType definitions at the top level and stop using the old way.

        edit: re-reading the issue, this is the idea that Hoss originally had too. People can still use the old style if they want.

        Show
        Yonik Seeley added a comment - - edited I like Erick's idea. No need to make people's life hard who are upgrading (or who have existing tools to deal with our schema's schema We should support field and fieldType definitions at the top level and stop using the old way. edit: re-reading the issue, this is the idea that Hoss originally had too. People can still use the old style if they want.
        Hide
        Tomás Fernández Löbbe added a comment -

        

        The schema version changes how Solr interprets default settings. I'm fairly sure that it has nothing to do with the XML structure. I don't think we need a new schema version for this.


        Schema version should be used for whatever is necessary. It should be telling Solr how the schema.xml should be read, and I think this is a good case. I think it would make more clear which type of schema you want to use. Also, in case of an external tool reading Solr schema, it could tell how it should be read. That said, I’m OK with increasing it or not, it's just an idea.

        People can still use the old style if they want.

        I think this could lead to confusion. Someone that’s new with Solr will probably read a couple of sample schema files (the example schema for sure, but also probably also reading from blogs, forums, etc), that will be different but do the same thing, trying to track a bug may be more complicated until they understand this change. I’d prefer to get a clear error that says that I’m configuring the fields/types in the wrong place, that it has changed, or something like that.
        Migrating this should be easier, because by that time you probably have more understanding of how the schema is structured and why/how it has change. After that, it's a trivial change.

        Show
        Tomás Fernández Löbbe added a comment -  The schema version changes how Solr interprets default settings. I'm fairly sure that it has nothing to do with the XML structure. I don't think we need a new schema version for this. Schema version should be used for whatever is necessary. It should be telling Solr how the schema.xml should be read, and I think this is a good case. I think it would make more clear which type of schema you want to use. Also, in case of an external tool reading Solr schema, it could tell how it should be read. That said, I’m OK with increasing it or not, it's just an idea. People can still use the old style if they want. I think this could lead to confusion. Someone that’s new with Solr will probably read a couple of sample schema files (the example schema for sure, but also probably also reading from blogs, forums, etc), that will be different but do the same thing, trying to track a bug may be more complicated until they understand this change. I’d prefer to get a clear error that says that I’m configuring the fields/types in the wrong place, that it has changed, or something like that. Migrating this should be easier, because by that time you probably have more understanding of how the schema is structured and why/how it has change. After that, it's a trivial change.
        Hide
        Yonik Seeley added a comment -

        Someone that’s new with Solr will probably read a couple of sample schema files (the example schema for sure, but also probably also reading from blogs, forums, etc), that will be different but do the same thing,

        The change will lead to confusion since there will be tons of existing blogs, examples, etc, that use the old style. I don't see being lenient with what we accept leading to more confusion, esp if we take care to change all of our examples.

        Show
        Yonik Seeley added a comment - Someone that’s new with Solr will probably read a couple of sample schema files (the example schema for sure, but also probably also reading from blogs, forums, etc), that will be different but do the same thing, The change will lead to confusion since there will be tons of existing blogs, examples, etc, that use the old style. I don't see being lenient with what we accept leading to more confusion, esp if we take care to change all of our examples.
        Hide
        Tomás Fernández Löbbe added a comment -

        Well, I don't think it's confusing if you get an error message that's clear about the change. A similar strategy was taken for example when indexDefault/indexMain were removed from solrconfig.xml and and now that solr.xml has changed.
        Its also better in the sense that you don't need to maintain all configuration formats variations forever, just for the current version.

        Show
        Tomás Fernández Löbbe added a comment - Well, I don't think it's confusing if you get an error message that's clear about the change. A similar strategy was taken for example when indexDefault/indexMain were removed from solrconfig.xml and and now that solr.xml has changed. Its also better in the sense that you don't need to maintain all configuration formats variations forever, just for the current version.
        Hide
        Yonik Seeley added a comment -

        Well, I don't think it's confusing if you get an error message that's clear about the change.

        What's confusing about having no error message and having it "just work"? A user trying an older schema and having it work is less confusing than them getting an error.

        The confusion is initially seeing both formats and wondering what the difference is. That confusion will exist no matter what we do (assuming we make this change).

        A similar strategy was taken for example when indexDefault/indexMain were removed from solrconfig.xml and and now that solr.xml has changed.

        It's a different situation. The danger in that case was that someone would change indexDefault and expect the changes to take effect, when in fact they were ignored. That situation does not exist here if we do as Erick suggests and use a copyField like pattern: expression = "//" + COPY_FIELD;

        Its also better in the sense that you don't need to maintain all configuration formats variations forever

        Due to the power of xpath, it's a one line change.

        Show
        Yonik Seeley added a comment - Well, I don't think it's confusing if you get an error message that's clear about the change. What's confusing about having no error message and having it "just work"? A user trying an older schema and having it work is less confusing than them getting an error. The confusion is initially seeing both formats and wondering what the difference is. That confusion will exist no matter what we do (assuming we make this change). A similar strategy was taken for example when indexDefault/indexMain were removed from solrconfig.xml and and now that solr.xml has changed. It's a different situation. The danger in that case was that someone would change indexDefault and expect the changes to take effect, when in fact they were ignored. That situation does not exist here if we do as Erick suggests and use a copyField like pattern: expression = "//" + COPY_FIELD; Its also better in the sense that you don't need to maintain all configuration formats variations forever Due to the power of xpath, it's a one line change.
        Hide
        Erick Erickson added a comment -

        Yonik hit it I think. There is no maintenance here after the change to support
        both, this isn't at all like the back-compat issues faced in index support
        between versions for instance.

        I spend more time than I probably should on the user's list, I don't want to spend
        more of it explaining to people that we made a decision to change this and
        caused their existing tools/environment/whatever to blow up.

        Show
        Erick Erickson added a comment - Yonik hit it I think. There is no maintenance here after the change to support both, this isn't at all like the back-compat issues faced in index support between versions for instance. I spend more time than I probably should on the user's list, I don't want to spend more of it explaining to people that we made a decision to change this and caused their existing tools/environment/whatever to blow up.
        Hide
        Erick Erickson added a comment -

        Decided to work up a patch to see what it would take. Was a little surprised by some of the work with the schema api, had to modify a few tests:

        Steve Rowe - if you have a chance, please see if I've done violence to the managed schema code I changed. There were a few changes both to code and tests.

        All:
        The "FIELDS" static in IndexSchema.java turns out to be overloaded a bit in case you're wondering why I took "types" out but left "fields" in. It's used to put together output packets as well as parse the schema.xml file.

        Finally, I just took some of the <fields> and <types> tags out of some of the stock test files on the theory that would test code more randomly than I would.

        For the final patch, I'll take the <fields> and <types> tags out of the stock distro schema.xml file(s).

        Let me know if there are objections, I'll commit next week if not. I hear what Robert and Tomas are saying, but in this case I don't think the hassle to the users of enforcing these tag's absence is worth it. And it would be more work for the devs I think.

        Show
        Erick Erickson added a comment - Decided to work up a patch to see what it would take. Was a little surprised by some of the work with the schema api, had to modify a few tests: Steve Rowe - if you have a chance, please see if I've done violence to the managed schema code I changed. There were a few changes both to code and tests. All: The "FIELDS" static in IndexSchema.java turns out to be overloaded a bit in case you're wondering why I took "types" out but left "fields" in. It's used to put together output packets as well as parse the schema.xml file. Finally, I just took some of the <fields> and <types> tags out of some of the stock test files on the theory that would test code more randomly than I would. For the final patch, I'll take the <fields> and <types> tags out of the stock distro schema.xml file(s). Let me know if there are objections, I'll commit next week if not. I hear what Robert and Tomas are saying, but in this case I don't think the hassle to the users of enforcing these tag's absence is worth it. And it would be more work for the devs I think.
        Hide
        Steve Rowe added a comment - - edited

        Hi Erick,

        Patch looks fine, except, as Robert said earlier:

        Can we just go the simple route of deprecating 'fields' and 'types' in 4.x (throw error in 5.x), and in 4.x also allow field/fieldtypes to be "top-level" in the schema.
        I think this is ultimately simpler than just willy-nilly allowing shit to be nested underneath anywhere: thats hard to maintain:

        Your XPath changes in IndexSchema willy-nilly allow (dynamic)field / fieldType to be nested underneath anywhere:

        465: // ../fieldType or fieldtype
        466: expression = "//" + FIELD_TYPE + XPATH_OR + "//" + FIELD_TYPE.toLowerCase(Locale.ROOT);
        [...]
        638: //  ../field | /schema/dynamicField
        639: String expression = "//" + FIELD + XPATH_OR + "//" + DYNAMIC_FIELD;
        640: NodeList nodes = (NodeList)xpath.evaluate(expression, document, XPathConstants.NODESET);
        

        I think we should keep the old XPath and add ones allowing top-level versions, something like (untested) the following (after putting back the TYPES constant):

        edit reordered the XPath alternatives to have the old-school versions last

        466: //               /schema/fieldtype | /schema/fieldType | /schema/types/fieldtype | /schema/types/fieldType 
        467: expression =     stepsToPath(SCHEMA, FIELD_TYPE.toLowerCase(Locale.ROOT)) // backcompat(?) 
        468:     + XPATH_OR + stepsToPath(SCHEMA, FIELD_TYPE)
        469:     + XPATH_OR + stepsToPath(SCHEMA, TYPES, FIELD_TYPE.toLowerCase(Locale.ROOT))
        470:     + XPATH_OR + stepsToPath(SCHEMA, TYPES, FIELD_TYPE);
        [...]
        642: //                  /schema/field | /schema/dynamicField | /schema/fields/field | /schema/fields/dynamicField
        643: String expression = stepsToPath(SCHEMA, FIELD)
        644:        + XPATH_OR + stepsToPath(SCHEMA, DYNAMIC_FIELD)
        645:        + XPATH_OR + stepsToPath(SCHEMA, FIELDS, FIELD) 
        646:        + XPATH_OR + stepsToPath(SCHEMA, FIELDS, DYNAMIC_FIELD);
        
        Show
        Steve Rowe added a comment - - edited Hi Erick, Patch looks fine, except, as Robert said earlier: Can we just go the simple route of deprecating 'fields' and 'types' in 4.x (throw error in 5.x), and in 4.x also allow field/fieldtypes to be "top-level" in the schema. I think this is ultimately simpler than just willy-nilly allowing shit to be nested underneath anywhere: thats hard to maintain: Your XPath changes in IndexSchema willy-nilly allow (dynamic)field / fieldType to be nested underneath anywhere: 465: // ../fieldType or fieldtype 466: expression = "//" + FIELD_TYPE + XPATH_OR + "//" + FIELD_TYPE.toLowerCase(Locale.ROOT); [...] 638: // ../field | /schema/dynamicField 639: String expression = "//" + FIELD + XPATH_OR + "//" + DYNAMIC_FIELD; 640: NodeList nodes = (NodeList)xpath.evaluate(expression, document, XPathConstants.NODESET); I think we should keep the old XPath and add ones allowing top-level versions, something like (untested) the following (after putting back the TYPES constant): edit reordered the XPath alternatives to have the old-school versions last 466: // /schema/fieldtype | /schema/fieldType | /schema/types/fieldtype | /schema/types/fieldType 467: expression = stepsToPath(SCHEMA, FIELD_TYPE.toLowerCase(Locale.ROOT)) // backcompat(?) 468: + XPATH_OR + stepsToPath(SCHEMA, FIELD_TYPE) 469: + XPATH_OR + stepsToPath(SCHEMA, TYPES, FIELD_TYPE.toLowerCase(Locale.ROOT)) 470: + XPATH_OR + stepsToPath(SCHEMA, TYPES, FIELD_TYPE); [...] 642: // /schema/field | /schema/dynamicField | /schema/fields/field | /schema/fields/dynamicField 643: String expression = stepsToPath(SCHEMA, FIELD) 644: + XPATH_OR + stepsToPath(SCHEMA, DYNAMIC_FIELD) 645: + XPATH_OR + stepsToPath(SCHEMA, FIELDS, FIELD) 646: + XPATH_OR + stepsToPath(SCHEMA, FIELDS, DYNAMIC_FIELD);
        Hide
        Erick Erickson added a comment - - edited

        Hmm, I guess the discussion has been about whether it makes sense to ever pull out the old-style /schema/fields or /schema/types sections. Deprecating them is certainly reasonable, but I'd balk at dis-allowing them. Your suggestion doesn't dis-allow them though.

        Thinking about it some more, my patch allows definitions like

        <fieldType>
           <field>...</field>
        </fieldType>
        

        or

        <field>
           <fieldType>...</fieldType>
        </field>
        

        or, for that matter, something like

        <copyField>
          <fieldType>
            <field>...</field>
          </fieldType>
        </copyField>
        

        to be defined in schema.xml. Blech. I haven't a clue how Solr would behave in these cases...... and there's always case N+1.

        Looks like I'm talking myself into adopting your suggestion.

        Show
        Erick Erickson added a comment - - edited Hmm, I guess the discussion has been about whether it makes sense to ever pull out the old-style /schema/fields or /schema/types sections. Deprecating them is certainly reasonable, but I'd balk at dis-allowing them. Your suggestion doesn't dis-allow them though. Thinking about it some more, my patch allows definitions like <fieldType> <field>...</field> </fieldType> or <field> <fieldType>...</fieldType> </field> or, for that matter, something like <copyField> <fieldType> <field>...</field> </fieldType> </copyField> to be defined in schema.xml. Blech. I haven't a clue how Solr would behave in these cases...... and there's always case N+1. Looks like I'm talking myself into adopting your suggestion.
        Hide
        Erick Erickson added a comment -

        Patch implementing Steve's suggestion and removing <types> and <fields> tags from a bunch of our example files.

        I'll commit this after testing today/tomorrow.

        Show
        Erick Erickson added a comment - Patch implementing Steve's suggestion and removing <types> and <fields> tags from a bunch of our example files. I'll commit this after testing today/tomorrow.
        Hide
        ASF subversion and git services added a comment -

        Commit 1580515 from Erick Erickson in branch 'dev/trunk'
        [ https://svn.apache.org/r1580515 ]

        SOLR-5228: Deprecate <fields> and <types> tags in schema.xml

        Show
        ASF subversion and git services added a comment - Commit 1580515 from Erick Erickson in branch 'dev/trunk' [ https://svn.apache.org/r1580515 ] SOLR-5228 : Deprecate <fields> and <types> tags in schema.xml
        Hide
        Steve Rowe added a comment -

        +1, thanks Erick.

        One (maybe intentional?) thing I noticed: You removed <types> and </types> from src/test-files/solr/collection1/conf/schema-add-schema-fields-update-processor.xml, but not <fields> and </fields> - in all other changed files, you removed all of them.

        Show
        Steve Rowe added a comment - +1, thanks Erick. One (maybe intentional?) thing I noticed: You removed <types> and </types> from src/test-files/solr/collection1/conf/schema-add-schema-fields-update-processor.xml , but not <fields> and </fields> - in all other changed files, you removed all of them.
        Hide
        Steve Rowe added a comment -

        Erick, I think the CHANGES.txt entry should go under "Other Changes" instead of "New Features" (I don't think config changes for existing features qualifies as new?), and I also think there should be some form of extra notice, probably in the "Upgrading from 4.7" section at the top, calling out this change in the examples and in accepted user config.

        Show
        Steve Rowe added a comment - Erick, I think the CHANGES.txt entry should go under "Other Changes" instead of "New Features" (I don't think config changes for existing features qualifies as new?), and I also think there should be some form of extra notice, probably in the "Upgrading from 4.7" section at the top, calling out this change in the examples and in accepted user config.
        Hide
        Tomás Fernández Löbbe added a comment -

        Taking this path I think the changes look good. Thanks Erick.
        Are you planning on committing to 4x too?

        Show
        Tomás Fernández Löbbe added a comment - Taking this path I think the changes look good. Thanks Erick. Are you planning on committing to 4x too?
        Hide
        Ryan Ernst added a comment -

        Why was this committed so quickly? It was less than 24 hours from the initial patch, on a weekend (Sat night to Sunday morning), on a semi contentious issue (there are differing opinions on how this issue should be handled).

        From reading through previous comments, it seemed there was some basic agreement on deprecation in 4x and then error in 5.0. The committed change has no deprecation in 4x, and no error in 5.0. It simply adds support for fieldType/field at the root level. There are also no tests to check the "old" way still works in 4x?

        Show
        Ryan Ernst added a comment - Why was this committed so quickly? It was less than 24 hours from the initial patch, on a weekend (Sat night to Sunday morning), on a semi contentious issue (there are differing opinions on how this issue should be handled). From reading through previous comments, it seemed there was some basic agreement on deprecation in 4x and then error in 5.0. The committed change has no deprecation in 4x, and no error in 5.0. It simply adds support for fieldType/field at the root level. There are also no tests to check the "old" way still works in 4x?
        Hide
        Yonik Seeley added a comment -

        From reading through previous comments, it seemed there was some basic agreement on deprecation in 4x and then error in 5.0.

        There seemed to be mostly consensus for allowing all field/fieldType declarations at the top level, but I didn't see a consensus for removing support for the old/existing style.

        Show
        Yonik Seeley added a comment - From reading through previous comments, it seemed there was some basic agreement on deprecation in 4x and then error in 5.0. There seemed to be mostly consensus for allowing all field/fieldType declarations at the top level, but I didn't see a consensus for removing support for the old/existing style.
        Hide
        Mark Miller added a comment -

        Why was this committed so quickly?

        +1 - given the comments from so many different people on this issue, seems like there should have been an attempt to work with everyone here. A warning to commit and a chance for people to object. We are a commit then review project, but on an issue that has pulled in attention from so many, we should favor a process that includes everyone.

        From reading through previous comments, it seemed there was some basic agreement on deprecation in 4x and then error in 5.0. The committed change has no deprecation in 4x, and no error in 5.0. It simply adds support for fieldType/field at the root level. There are also no tests to check the "old" way still works in 4x?
        

        +1

        Show
        Mark Miller added a comment - Why was this committed so quickly? +1 - given the comments from so many different people on this issue, seems like there should have been an attempt to work with everyone here. A warning to commit and a chance for people to object. We are a commit then review project, but on an issue that has pulled in attention from so many, we should favor a process that includes everyone. From reading through previous comments, it seemed there was some basic agreement on deprecation in 4x and then error in 5.0. The committed change has no deprecation in 4x, and no error in 5.0. It simply adds support for fieldType/field at the root level. There are also no tests to check the "old" way still works in 4x? +1
        Hide
        Mark Miller added a comment -

        Why was this committed so quickly?

        Ryan Ernst, from the other side, nothing is set in stone yet. This issue can certainly still evolve if people still have objections.

        Show
        Mark Miller added a comment - Why was this committed so quickly? Ryan Ernst , from the other side, nothing is set in stone yet. This issue can certainly still evolve if people still have objections.
        Hide
        Yonik Seeley added a comment -

        There appeared to be two camps:
        1) allow field declarations at the top level
        2) allow field declarations at the top level, disallow (in 5x) current style

        It doesn't seem to be a stretch to do the part that people agree with (which is what I think Erick did).
        Unless one considers a third option (do nothing) and considers that a better option than #1? I didn't see any one expressing that opinion.

        Show
        Yonik Seeley added a comment - There appeared to be two camps: 1) allow field declarations at the top level 2) allow field declarations at the top level, disallow (in 5x) current style It doesn't seem to be a stretch to do the part that people agree with (which is what I think Erick did). Unless one considers a third option (do nothing) and considers that a better option than #1? I didn't see any one expressing that opinion.
        Hide
        Mark Miller added a comment -

        What Erick did is not crazy. I don't think he thought he was doing anything controversial.

        But that has little relevance to Ryan's very valid comments. I support them 100%.

        Show
        Mark Miller added a comment - What Erick did is not crazy. I don't think he thought he was doing anything controversial. But that has little relevance to Ryan's very valid comments. I support them 100%.
        Hide
        Shawn Heisey added a comment -

        I was one in support of removing the old enclosing tags in trunk. I believe it's a good idea to fail early and loudly when old syntax gets used – once you cross the major version boundary, anyway.

        I am OK with what I know about Erick's commit - Yonik has a good point regarding the longevity of existing information on the Internet. I've not actually looked at the commits, so these statements might be premature: Perhaps we can simply have 5.0 log a warning. When 5.0 begins the march towards release and trunk becomes 6.0, we can discuss whether we think it's a good idea to make it an unrecoverable error.

        Show
        Shawn Heisey added a comment - I was one in support of removing the old enclosing tags in trunk. I believe it's a good idea to fail early and loudly when old syntax gets used – once you cross the major version boundary, anyway. I am OK with what I know about Erick's commit - Yonik has a good point regarding the longevity of existing information on the Internet. I've not actually looked at the commits, so these statements might be premature: Perhaps we can simply have 5.0 log a warning. When 5.0 begins the march towards release and trunk becomes 6.0, we can discuss whether we think it's a good idea to make it an unrecoverable error.
        Show
        Yonik Seeley added a comment - http://en.wikipedia.org/wiki/Parkinson%27s_Law_of_Triviality
        Hide
        Benson Margulies added a comment -

        I apologize for showing up so late with an opinion. I can't get over the feeling that this might be solving the wrong problem.

        In XML, the structure of

          <SOME_ITEMs>
            <SOME_ITEM>
            </SOME_ITEM>
            ...
        </SOME_ITEMs>
        

        is ancient and honorable. Yea, some schemas dispense with the container for the group, but plenty do not. The source of this was someone who misplaced an item and didn't get a diagnosis. Why don't we concentrate on diagnosis? Why not create a schema and, by default, check it? It's not like we're in a giant hurry at start-up compared to the extra time of enabling a validating parse.

        Grouping these guys together is harmless at worst and slight helpful at best.

        If we are going to change the schema, I would beg that anyone changing it put forth an actual, well, schema that is an accurate representation of what is allowed.

        So I'm belatedly -1 on this change, for why tiny little bit its worth.

        Show
        Benson Margulies added a comment - I apologize for showing up so late with an opinion. I can't get over the feeling that this might be solving the wrong problem. In XML, the structure of <SOME_ITEMs> <SOME_ITEM> </SOME_ITEM> ... </SOME_ITEMs> is ancient and honorable. Yea, some schemas dispense with the container for the group, but plenty do not. The source of this was someone who misplaced an item and didn't get a diagnosis. Why don't we concentrate on diagnosis? Why not create a schema and, by default, check it? It's not like we're in a giant hurry at start-up compared to the extra time of enabling a validating parse. Grouping these guys together is harmless at worst and slight helpful at best. If we are going to change the schema, I would beg that anyone changing it put forth an actual, well, schema that is an accurate representation of what is allowed. So I'm belatedly -1 on this change, for why tiny little bit its worth.
        Hide
        Walter Underwood added a comment - - edited

        I certainly agree that validating XML configs against a schema is far more useful than making the parser more permissive. We should fix the problem, not the symptom.

        Just this week, I found that the "required" attribute is not documented, not even the default value. That sort of stuff is easier to spot with validation, because you can check the docs against that.

        And how many questions have we answered about "multivalued" vs. "multiValued"?

        As we start supporting modification for config files, this gets more important. They should be validated before write.

        Show
        Walter Underwood added a comment - - edited I certainly agree that validating XML configs against a schema is far more useful than making the parser more permissive. We should fix the problem, not the symptom. Just this week, I found that the "required" attribute is not documented, not even the default value. That sort of stuff is easier to spot with validation, because you can check the docs against that. And how many questions have we answered about "multivalued" vs. "multiValued"? As we start supporting modification for config files, this gets more important. They should be validated before write.
        Hide
        Yonik Seeley added a comment -

        The source of this was someone who misplaced an item and didn't get a diagnosis. Why don't we concentrate on diagnosis?

        That's a good point. We shouldn't conflate that with the other schema-changing issues here.

        Show
        Yonik Seeley added a comment - The source of this was someone who misplaced an item and didn't get a diagnosis. Why don't we concentrate on diagnosis? That's a good point. We shouldn't conflate that with the other schema-changing issues here.
        Hide
        ASF subversion and git services added a comment -

        Commit 1580725 from Erick Erickson in branch 'dev/trunk'
        [ https://svn.apache.org/r1580725 ]

        SOLR-5228: Deprecate <fields> and <types> tags in schema.xml. Moved entries in CHANGES.txt as per Steve Rowe's comments

        Show
        ASF subversion and git services added a comment - Commit 1580725 from Erick Erickson in branch 'dev/trunk' [ https://svn.apache.org/r1580725 ] SOLR-5228 : Deprecate <fields> and <types> tags in schema.xml. Moved entries in CHANGES.txt as per Steve Rowe's comments
        Hide
        Erick Erickson added a comment -

        Why was it committed so quickly? Because I was about to catch a trans-continental flight.
        Not much of a reason, true. We can always un-commit if I've screwed the pooch, it
        hasn't been released yet after all.

        Why wasn't it committed to 4x? Because I didn't have time to merge, test, precommit and
        commit before I had to leave and the plane didn't have WiFi. Taking care of that now. Again,
        not much of an excuse although it is a reason.

        Why wasn't a deprecated message created? Because I had to leave before being
        able to commit the CHANGES.txt following Steve's suggestions, as far as the
        CHANGES.txt file is concerned. We can always raise another JIRA about logging
        deprecation messages if people feel it is important. Just did that for trunk.

        Benson:
        I go back and forth on the formal schema idea, but I think that's a separate issue?
        How does that alter this patch?

        I'm pretty sure it's been suggested multiple times, but somehow it's never gotten
        any traction. It'd have to deal with any custom components and their parameters,
        which might be a challenge.

        From my perspective, I made an attempt to synthesize the comments of people
        who had expressed their opinions. I think this is an OK patch, we can always alter
        it before it gets released.

        I'm rushed for the next two weeks. I hereby authorize anyone with enough interest
        to roll back, alter, or submit new patches as they see fit. It seems like 6 months is
        enough time for people to have commented if they were interested though.

        Show
        Erick Erickson added a comment - Why was it committed so quickly? Because I was about to catch a trans-continental flight. Not much of a reason, true. We can always un-commit if I've screwed the pooch, it hasn't been released yet after all. Why wasn't it committed to 4x? Because I didn't have time to merge, test, precommit and commit before I had to leave and the plane didn't have WiFi. Taking care of that now. Again, not much of an excuse although it is a reason. Why wasn't a deprecated message created? Because I had to leave before being able to commit the CHANGES.txt following Steve's suggestions, as far as the CHANGES.txt file is concerned. We can always raise another JIRA about logging deprecation messages if people feel it is important. Just did that for trunk. Benson: I go back and forth on the formal schema idea, but I think that's a separate issue? How does that alter this patch? I'm pretty sure it's been suggested multiple times, but somehow it's never gotten any traction. It'd have to deal with any custom components and their parameters, which might be a challenge. From my perspective, I made an attempt to synthesize the comments of people who had expressed their opinions. I think this is an OK patch, we can always alter it before it gets released. I'm rushed for the next two weeks. I hereby authorize anyone with enough interest to roll back, alter, or submit new patches as they see fit. It seems like 6 months is enough time for people to have commented if they were interested though.
        Hide
        ASF subversion and git services added a comment -

        Commit 1580736 from Erick Erickson in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1580736 ]

        SOLR-5228: Deprecate <fields> and <types> tags in schema.xml, plus incorporated Steve's comments about CHANGES.txt

        Show
        ASF subversion and git services added a comment - Commit 1580736 from Erick Erickson in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1580736 ] SOLR-5228 : Deprecate <fields> and <types> tags in schema.xml, plus incorporated Steve's comments about CHANGES.txt
        Hide
        Erick Erickson added a comment -

        P.S. Steve:
        I intentionally left the <fields> </fields> in
        src/test-files/solr/collection1/conf/schema-add-schema-fields-update-processor.xml
        on the theory that there might be something weird with only one of them defined.

        Good eyes!

        A bit ugly I'll admit...

        Show
        Erick Erickson added a comment - P.S. Steve: I intentionally left the <fields> </fields> in src/test-files/solr/collection1/conf/schema-add-schema-fields-update-processor.xml on the theory that there might be something weird with only one of them defined. Good eyes! A bit ugly I'll admit...
        Hide
        David Smiley added a comment -

        I like the idea of having a bona-fide schema, but it's not clear how to deal with custom extensions – which I've done in custom coding. If the schema defines the core but lets any other XML element be anywhere else for extensibility, it doesn't help the user that misnames or misplaces something.

        Show
        David Smiley added a comment - I like the idea of having a bona-fide schema, but it's not clear how to deal with custom extensions – which I've done in custom coding. If the schema defines the core but lets any other XML element be anywhere else for extensibility, it doesn't help the user that misnames or misplaces something.
        Hide
        Benson Margulies added a comment -

        Allow the person extending the schema to provide a, well, extended schema.

        Show
        Benson Margulies added a comment - Allow the person extending the schema to provide a, well, extended schema.
        Hide
        Erick Erickson added a comment -

        bq: Allow the person extending the schema to provide a, well, extended schema.

        Now, in order to run, we either have to
        1> incorporate each and every extension anyone in the world wants to write into our releases
        or
        2> require everyone who doesn't want to/can't give their extension back to Apache to maintain their own DTD forevermore, updating it at each release of Solr they want to upgrade.

        Don't get me wrong, I have complete and total sympathy with the end goal here. If we can solve this in a way that extends we could save users countless hours, your point about multivalued .vs. multiValued is well taken indeed. I suppose trading that off against the added pain caused by invalidating current tools is a judgement call.

        And I don't know that much about extending DTDs. Is there a way to do something similar to xinclude? Or shutting off validation. Hmmm, I rather like that one at first glance, although I'm not sure where to specify it. Perhaps a default of on, expert users could turn validation off if they added custom stuff or decide to keep their own copy of the DTD up to date.

        How much you want to bet the first time we tried to run tests we'd find some of our test schemas have errors?

        Show
        Erick Erickson added a comment - bq: Allow the person extending the schema to provide a, well, extended schema. Now, in order to run, we either have to 1> incorporate each and every extension anyone in the world wants to write into our releases or 2> require everyone who doesn't want to/can't give their extension back to Apache to maintain their own DTD forevermore, updating it at each release of Solr they want to upgrade. Don't get me wrong, I have complete and total sympathy with the end goal here. If we can solve this in a way that extends we could save users countless hours, your point about multivalued .vs. multiValued is well taken indeed. I suppose trading that off against the added pain caused by invalidating current tools is a judgement call. And I don't know that much about extending DTDs. Is there a way to do something similar to xinclude? Or shutting off validation. Hmmm, I rather like that one at first glance, although I'm not sure where to specify it. Perhaps a default of on, expert users could turn validation off if they added custom stuff or decide to keep their own copy of the DTD up to date. How much you want to bet the first time we tried to run tests we'd find some of our test schemas have errors?
        Hide
        Benson Margulies added a comment -

        DTD's are useless. We need to pick one of W3C XML Schema or RNG. RNG is a lot easier to work with. Schematron is another possibility, but I have no experience. See http://docs.oracle.com/javase/7/docs/api/javax/xml/validation/package-summary.html.

        Choices are:

        • validation is easy to disable; people who customize disable it
        • customizers take the entire schema, add to it, and provide their added one. Not so good for multiples.
        • customizers are constrained to use namespaces – you customize, you add an XML namespace, and you provide a schema for your namespace.

        Of course the first time we try this we'll find problems in the test schemas.

        Has anyone done anything in this area that I could start from if I was inclined to try to work on this?

        Show
        Benson Margulies added a comment - DTD's are useless. We need to pick one of W3C XML Schema or RNG. RNG is a lot easier to work with. Schematron is another possibility, but I have no experience. See http://docs.oracle.com/javase/7/docs/api/javax/xml/validation/package-summary.html . Choices are: validation is easy to disable; people who customize disable it customizers take the entire schema, add to it, and provide their added one. Not so good for multiples. customizers are constrained to use namespaces – you customize, you add an XML namespace, and you provide a schema for your namespace. Of course the first time we try this we'll find problems in the test schemas. Has anyone done anything in this area that I could start from if I was inclined to try to work on this?
        Hide
        David Smiley added a comment -

        +1 to namespaces. External code could add the namespace to the custom stuff, and that part could be non-validating and excluded from the "core" schema. It also makes it clear where there's custom stuff.

        Show
        David Smiley added a comment - +1 to namespaces. External code could add the namespace to the custom stuff, and that part could be non-validating and excluded from the "core" schema. It also makes it clear where there's custom stuff.
        Hide
        Erick Erickson added a comment -

        Works for me I guess. Let's move this to another JIRA though, it'll get lost here.

        Show
        Erick Erickson added a comment - Works for me I guess. Let's move this to another JIRA though, it'll get lost here.
        Hide
        Mark Miller added a comment -

        Why was it committed so quickly? Because I was about to catch a trans-continental flight. Not much of a reason, true.

        Heh - Dude, that's a fantastic reason to not quick commit. I've heard the reason come up before - a much more Apache like rule of thumb is to just hold off if you are going off the grid rather than hurry up.

        Show
        Mark Miller added a comment - Why was it committed so quickly? Because I was about to catch a trans-continental flight. Not much of a reason, true. Heh - Dude, that's a fantastic reason to not quick commit. I've heard the reason come up before - a much more Apache like rule of thumb is to just hold off if you are going off the grid rather than hurry up.
        Hide
        Jan Høydahl added a comment -

        If anyone plans to remove support for the old syntax in 5.x, remember that it should be possible for people to upgrade the WAR to 5.x while keeping their existing config and schema without things crashing. I.e. when parsing chema, deprecate/fail depending on <luceneMatchVersion> in solrconfig, not depending on actual Solr version.

        A customer of mine has extended schema with their custom namespace and that works very well. +1 for validating non-namespaced tags only. This brings to memory the Currency field type which takes providerClass as one of its xml attributes. Then, depending on the provider class, it may expect other attributes to exist, such as OpenExchangeRatesOrgProvider expecting attributes ratesFileLocation and refreshInterval without a prefix. It will be hard to strictly validate xml attributes that can be user-defined like this, and in fact our core FieldTypes are really just plugins too...

        All in all I think we'd be in better shape through some simple custom validation in Java code which validates the basics such as allowed tags and sub tags, and complains if there are unknown tags without a namespace. Perhaps also check for existence of misspelled stuff such as multivalued, precisionstep etc. Then we can save full XSD or whatever for another day

        Show
        Jan Høydahl added a comment - If anyone plans to remove support for the old syntax in 5.x, remember that it should be possible for people to upgrade the WAR to 5.x while keeping their existing config and schema without things crashing. I.e. when parsing chema, deprecate/fail depending on <luceneMatchVersion> in solrconfig, not depending on actual Solr version. A customer of mine has extended schema with their custom namespace and that works very well. +1 for validating non-namespaced tags only. This brings to memory the Currency field type which takes providerClass as one of its xml attributes. Then, depending on the provider class, it may expect other attributes to exist, such as OpenExchangeRatesOrgProvider expecting attributes ratesFileLocation and refreshInterval without a prefix. It will be hard to strictly validate xml attributes that can be user-defined like this, and in fact our core FieldTypes are really just plugins too... All in all I think we'd be in better shape through some simple custom validation in Java code which validates the basics such as allowed tags and sub tags, and complains if there are unknown tags without a namespace. Perhaps also check for existence of misspelled stuff such as multivalued, precisionstep etc. Then we can save full XSD or whatever for another day
        Hide
        ASF subversion and git services added a comment -

        Commit 1584414 from sarowe@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1584414 ]

        Remove SOLR-5228 from the "New Features" section - it's already in the "Other Changes" section

        Show
        ASF subversion and git services added a comment - Commit 1584414 from sarowe@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1584414 ] Remove SOLR-5228 from the "New Features" section - it's already in the "Other Changes" section
        Hide
        ASF subversion and git services added a comment -

        Commit 1584416 from sarowe@apache.org in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1584416 ]

        Remove SOLR-5228 from the "New Features" section - its already in the "Other Changes" section (merged trunk r1584414)

        Show
        ASF subversion and git services added a comment - Commit 1584416 from sarowe@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1584416 ] Remove SOLR-5228 from the "New Features" section - its already in the "Other Changes" section (merged trunk r1584414)
        Hide
        Uwe Schindler added a comment -

        Close issue after release of 4.8.0

        Show
        Uwe Schindler added a comment - Close issue after release of 4.8.0
        Hide
        Parvesh Garg added a comment -

        One use case for keeping sections is to have the capability of <xi:include functionality. We have a lot of cores and use this for common fields and field types that we don't want anyone to modify per core.

        <xi:includes take only well formed xml (meaning they need a root element) and make the included xml a child element at the place where the directive is put. We already struggled with not being able to put <copyField and <similarity in a common file as they are directly under the root element <schema.

        My understanding of <xi:include could be wrong, happy to get corrected. If not and we still want to remove these sections, can there be another way to provide an include directive. Something like commons-configuration does.

        Show
        Parvesh Garg added a comment - One use case for keeping sections is to have the capability of <xi:include functionality. We have a lot of cores and use this for common fields and field types that we don't want anyone to modify per core. <xi:includes take only well formed xml (meaning they need a root element) and make the included xml a child element at the place where the directive is put. We already struggled with not being able to put <copyField and <similarity in a common file as they are directly under the root element <schema. My understanding of <xi:include could be wrong, happy to get corrected. If not and we still want to remove these sections, can there be another way to provide an include directive. Something like commons-configuration does.
        Hide
        David Smiley added a comment -

        Good point Parvesh! I wonder if Solr should handle this in a generic sense... like if hypothetically there was a special tag like <INCLUDE> that Solr auto-removes first things after it reads the XML. That would enable new uses of XInclude where today there is no natural enclosing tag to do so.

        Show
        David Smiley added a comment - Good point Parvesh! I wonder if Solr should handle this in a generic sense... like if hypothetically there was a special tag like <INCLUDE> that Solr auto-removes first things after it reads the XML. That would enable new uses of XInclude where today there is no natural enclosing tag to do so.
        Hide
        Shawn Heisey added a comment -

        I wonder if Solr should handle this in a generic sense... like if hypothetically there was a special tag like <INCLUDE> that Solr auto-removes first things after it reads the XML. That would enable new uses of XInclude where today there is no natural enclosing tag to do so.

        I really like this idea. If the issue doesn't show up in the next few hours (and hasn't been filed already), I'll go ahead and file it. My solrconfig.xml is almost entirely xinclude tags. I would have used all xinclude tags, but the admin, analysis, and jmx handlers didn't make much sense to put into individual xml files, and I couldn't put them in an xml file together. Also, I have entirely too many includes, I'd like to be able to combine some of them together.

        I've never touched xml/xpath parsing ... does the parser support an easy way of moving tags up a level in the structure? If we can convert XML to a NamedList, I would expect that to be pretty easy.

        Show
        Shawn Heisey added a comment - I wonder if Solr should handle this in a generic sense... like if hypothetically there was a special tag like <INCLUDE> that Solr auto-removes first things after it reads the XML. That would enable new uses of XInclude where today there is no natural enclosing tag to do so. I really like this idea. If the issue doesn't show up in the next few hours (and hasn't been filed already), I'll go ahead and file it. My solrconfig.xml is almost entirely xinclude tags. I would have used all xinclude tags, but the admin, analysis, and jmx handlers didn't make much sense to put into individual xml files, and I couldn't put them in an xml file together. Also, I have entirely too many includes, I'd like to be able to combine some of them together. I've never touched xml/xpath parsing ... does the parser support an easy way of moving tags up a level in the structure? If we can convert XML to a NamedList, I would expect that to be pretty easy.

          People

          • Assignee:
            Erick Erickson
            Reporter:
            Hoss Man
          • Votes:
            0 Vote for this issue
            Watchers:
            14 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development