Solr
  1. Solr
  2. SOLR-2671

SchemaField.indexOptions() should return removed to decrease visibility of expert & experimental IndexOptions class

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.4, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      LUCENE-2048 introduced a SchemaField.indexOptions method which returns an IndexOptions object based on the omitTermFreqAndPosition and omitPosition boolean properties – but after some consideration it was decided that it was not a good idea to expose that expert level class at this level of the Solr internal APIs, and that SchemaField should instead only expose direct accessors for the specific properties.

        Issue Links

          Activity

          Hide
          Robert Muir added a comment -

          bulk close for 3.4

          Show
          Robert Muir added a comment - bulk close for 3.4
          Hide
          Hoss Man added a comment -

          Committed revision 1150840. - trunk
          Committed revision 1150848. - 3x

          ...i didn't mention this in CHANGES.txt since it's really just an implementation detail of LUCENE-2048 which is already mentioned

          Show
          Hoss Man added a comment - Committed revision 1150840. - trunk Committed revision 1150848. - 3x ...i didn't mention this in CHANGES.txt since it's really just an implementation detail of LUCENE-2048 which is already mentioned
          Hide
          Hoss Man added a comment -

          linking to issue where indexOptions() was introduced

          Show
          Hoss Man added a comment - linking to issue where indexOptions() was introduced
          Hide
          Robert Muir added a comment -

          yeah, i like it: this way the surface area is less exposed.

          Show
          Robert Muir added a comment - yeah, i like it: this way the surface area is less exposed.
          Hide
          Hoss Man added a comment -

          Not really: currently the xml files have two properties, but the Solr APIs expose indexOptions(), which it then sets on the lucene Field when it makes it.

          Alternatively, the Solr APIs could have two booleans, or a real class or something like that so that nobody ever "sees" indexOptions unless they interact with a lucene Field.

          Good point ... i think getting IndexOptions out of SchemaField and down into FieldType (where the only way you see it is if you are directly dealing with a lucene Field Object) makes sense.

          Take a look at the attached patch – it replaces SchemaField.indexOptions() with two boolean methods corresponding to the properties (and adds back a deprecated omitTf() as an alias - that was removed in LUCENE-2048 but isn't back-compat on the 3x branch) and moves the IndexOptions logic into FieldType.getIndexOptions (which is consistent with the getFieldStore and getFieldIndex that were already in that class)

          Opinions?

          (obviously we can always add better javadocs, but i wanted to start with the important part)

          Show
          Hoss Man added a comment - Not really: currently the xml files have two properties, but the Solr APIs expose indexOptions(), which it then sets on the lucene Field when it makes it. Alternatively, the Solr APIs could have two booleans, or a real class or something like that so that nobody ever "sees" indexOptions unless they interact with a lucene Field. Good point ... i think getting IndexOptions out of SchemaField and down into FieldType (where the only way you see it is if you are directly dealing with a lucene Field Object) makes sense. Take a look at the attached patch – it replaces SchemaField.indexOptions() with two boolean methods corresponding to the properties (and adds back a deprecated omitTf() as an alias - that was removed in LUCENE-2048 but isn't back-compat on the 3x branch) and moves the IndexOptions logic into FieldType.getIndexOptions (which is consistent with the getFieldStore and getFieldIndex that were already in that class) Opinions? (obviously we can always add better javadocs, but i wanted to start with the important part)
          Hide
          Robert Muir added a comment -

          isn't that what we do already? at the user friendly level we use the "omitTermFreqAndPositions" and "termPositions" prroperties – indexOptions() is only used by FieldType at the last possible moment.

          Not really: currently the xml files have two properties, but the Solr APIs expose indexOptions(), which it then sets on the lucene Field when it makes it.

          Alternatively, the Solr APIs could have two booleans, or a real class or something like that so that nobody ever "sees" indexOptions unless they interact with a lucene Field.

          The reason i didn't do this initially is because with two booleans and the crazy implication between them, its easy to introduce bugs (as seen by the buggy verification check in SOLR-2669 !)... Plus it forced me to review all usages of the booleans in the current code that uses the SchemaField, versus adding an additional one (which might create bugs where something should be considering the new case but didnt)

          Show
          Robert Muir added a comment - isn't that what we do already? at the user friendly level we use the "omitTermFreqAndPositions" and "termPositions" prroperties – indexOptions() is only used by FieldType at the last possible moment. Not really: currently the xml files have two properties, but the Solr APIs expose indexOptions(), which it then sets on the lucene Field when it makes it. Alternatively, the Solr APIs could have two booleans, or a real class or something like that so that nobody ever "sees" indexOptions unless they interact with a lucene Field. The reason i didn't do this initially is because with two booleans and the crazy implication between them, its easy to introduce bugs (as seen by the buggy verification check in SOLR-2669 !)... Plus it forced me to review all usages of the booleans in the current code that uses the SchemaField, versus adding an additional one (which might create bugs where something should be considering the new case but didnt)
          Hide
          Hoss Man added a comment -

          better docs on IndexOptions / AbstractField about null not being a valid value for IndexOptions (unless i've convinced you that it should be)

          I don't think we should do any of this.
          This thing is totally expert, and labeled as such.

          Fair enough – it's expert and used at a low level so you don't want to spend cycles on an explicit null check ... but i don't really understand why you are opposed to adding docs to the (public) AbstractField.setIndexOptions that say null is not allowed and that the value of the enum is meaningless/unused unless the field is indexed.

          If we want the Solr SchemaField stuff to use something more friendly, then it shouldn't use this IndexOptions enum at all, and instead use its own system internally and convert to it at the last minute.

          isn't that what we do already? at the user friendly level we use the "omitTermFreqAndPositions" and "termPositions" prroperties – indexOptions() is only used by FieldType at the last possible moment.

          That's my point about adding direct accessors for those properties (so code has a "friendly" way of getting the values of those props w/o deciphering the IndexOptions object)

          Show
          Hoss Man added a comment - better docs on IndexOptions / AbstractField about null not being a valid value for IndexOptions (unless i've convinced you that it should be) I don't think we should do any of this. This thing is totally expert, and labeled as such. Fair enough – it's expert and used at a low level so you don't want to spend cycles on an explicit null check ... but i don't really understand why you are opposed to adding docs to the (public) AbstractField.setIndexOptions that say null is not allowed and that the value of the enum is meaningless/unused unless the field is indexed. If we want the Solr SchemaField stuff to use something more friendly, then it shouldn't use this IndexOptions enum at all, and instead use its own system internally and convert to it at the last minute. isn't that what we do already? at the user friendly level we use the "omitTermFreqAndPositions" and "termPositions" prroperties – indexOptions() is only used by FieldType at the last possible moment. That's my point about adding direct accessors for those properties (so code has a "friendly" way of getting the values of those props w/o deciphering the IndexOptions object)
          Hide
          Robert Muir added a comment -

          It's not a valid value on the enum, but it is seems like it should be a valid value indicating that no IndexOptions should be used (AbstractField.setIndexOptions doesn't seem to reject null as valid input)

          Either that or there should be an explicit IndexOptions.NONE value (which kind of makes me wonder why we have both FieldInfos.IndexOptions and Field.Index ... seems like maybe they should be one Class with setters for the optional parts)

          better docs on IndexOptions / AbstractField about null not being a valid value for IndexOptions (unless i've convinced you that it should be)

          I don't think we should do any of this.
          This thing is totally expert, and labeled as such.
          Its used in the lowest level postings readers and other performance critical places.
          I don't think it should be made friendly, and I don't think we need null checks anywhere in lucene code that takes it.

          If we want the Solr SchemaField stuff to use something more friendly, then it shouldn't use this IndexOptions enum at all, and instead use its own system internally and convert to it at the last minute.

          Show
          Robert Muir added a comment - It's not a valid value on the enum, but it is seems like it should be a valid value indicating that no IndexOptions should be used (AbstractField.setIndexOptions doesn't seem to reject null as valid input) Either that or there should be an explicit IndexOptions.NONE value (which kind of makes me wonder why we have both FieldInfos.IndexOptions and Field.Index ... seems like maybe they should be one Class with setters for the optional parts) better docs on IndexOptions / AbstractField about null not being a valid value for IndexOptions (unless i've convinced you that it should be) I don't think we should do any of this. This thing is totally expert, and labeled as such. Its used in the lowest level postings readers and other performance critical places. I don't think it should be made friendly, and I don't think we need null checks anywhere in lucene code that takes it. If we want the Solr SchemaField stuff to use something more friendly, then it shouldn't use this IndexOptions enum at all, and instead use its own system internally and convert to it at the last minute.
          Hide
          Hoss Man added a comment -

          I don't think we should pass null to the lucene Field (its not a valid value for the enum).

          It's not a valid value on the enum, but it is seems like it should be a valid value indicating that no IndexOptions should be used (AbstractField.setIndexOptions doesn't seem to reject null as valid input)

          Either that or there should be an explicit IndexOptions.NONE value (which kind of makes me wonder why we have both FieldInfos.IndexOptions and Field.Index ... seems like maybe they should be one Class with setters for the optional parts)

          Ugh... now my head hurts.

          Bottom line: if null isn't a valid value to pass to setIndexOptions, then setIndexOptions should be doced as such and throw IllegalArguementException ... right? because otherwise it seems totally normal to say "I'm not indexing this field, therefor i'm specifying null IndexOptions"

          Speaking of omitNorms and Boost, I don't think we should set indexOptions to null, unless we do the same with these consistently.

          Hmmm .... i see what you're saying, and i agree they createField shouldn't set these on the Field object if !indexed, ala..

          protected Fieldable createField(String name, String val, Field.Store storage, Field.Index index,
                                            Field.TermVector vec, boolean omitNorms, IndexOptions options, float boost){
            Field f = new Field(name,
                                val,
                                storage,
                                index,
                                vec);
            if (index.isIndexed()) {
              f.setOmitNorms(omitNorms);
              f.setIndexOptions(options);
              f.setBoost(boost);
            }
            return f;
          }
          

          ...but i'm not sure it makes sense to make them nullable...

          In the boost case, it's not a property of the SchemaField, it comes at runtime when the Field instances are being created, and the client code (ie: The Request Handler adding docs) doesn't know (and isn't expected to check) whether a the SchemaField it's using specifies index=true/false.

          So even if we change boost to be a Boolean, your reasoning still holds that the if the field is !indexed, that boost shouldn't be used (null or otherwise).

          I suppose from a backcompat standpoint there's not much harm in changing the FieldType.createField args to make it Boolean instead of boolean (all the calling code should still work because of autoboxing) i'm just not sure i see much advantage.

          In the case of SchemaField.omitNorms – and all the term vector methods on SchemaField as well – those are simple accessors for the properties. It doesn't seem like a good idea to make them "smart" about indexed, they just report wether the property was or was not set when the SchemaField was created.

          The diff between SchemaField.omitNorms() and SchemaField.indexOptions() is that indexOptions isn't a direct accessor for any one property, it returns the effective IndexOptions object used based on two properties (which don't have their own accessors). So the question of whether indexOptions() should return null seems orthogonal to whether omitNorms() should return null. But it also makes me realize that for consistency we should really add explicit termFreqAndPosition() and termPosition() accessors to be consistent with omitNorms() ... which leaves us back deciding on how indexOptions() (as a "smart" accessor) should behave.

          Now my head really hurts.

          It seems like maybe the crux of the issue is really:

          • add termFreqAndPosition() and termPosition() methods to SchemaField to be consistent with the other explicit properties
          • add docs in AbstractField, SchemaField, and FieldType.createField on which things are meaningless if indexed=false – just putting "(value|behavior) is (meaningless|undefined) if #isIndexed() is false" in the javadocs on most of those get/set methods would be a big win for now)
          • better docs on IndexOptions / AbstractField about null not being a valid value for IndexOptions (unless i've convinced you that it should be)
          • an IllegalArguementException on any method that takes an IndexOptions arg if that object is null
          • change FieldType.createField as described above to only call those Field setters "if indexed()"

          ...if we do that, then we can leave SchemaField.indexOptions() alone and it will be clear from the docs that the value returned is meaningless for non indexed fields.

          what do you think?

          Show
          Hoss Man added a comment - I don't think we should pass null to the lucene Field (its not a valid value for the enum). It's not a valid value on the enum, but it is seems like it should be a valid value indicating that no IndexOptions should be used (AbstractField.setIndexOptions doesn't seem to reject null as valid input) Either that or there should be an explicit IndexOptions.NONE value (which kind of makes me wonder why we have both FieldInfos.IndexOptions and Field.Index ... seems like maybe they should be one Class with setters for the optional parts) Ugh... now my head hurts. Bottom line: if null isn't a valid value to pass to setIndexOptions, then setIndexOptions should be doced as such and throw IllegalArguementException ... right? because otherwise it seems totally normal to say "I'm not indexing this field, therefor i'm specifying null IndexOptions" Speaking of omitNorms and Boost, I don't think we should set indexOptions to null, unless we do the same with these consistently. Hmmm .... i see what you're saying, and i agree they createField shouldn't set these on the Field object if !indexed, ala.. protected Fieldable createField( String name, String val, Field.Store storage, Field.Index index, Field.TermVector vec, boolean omitNorms, IndexOptions options, float boost){ Field f = new Field(name, val, storage, index, vec); if (index.isIndexed()) { f.setOmitNorms(omitNorms); f.setIndexOptions(options); f.setBoost(boost); } return f; } ...but i'm not sure it makes sense to make them nullable... In the boost case, it's not a property of the SchemaField, it comes at runtime when the Field instances are being created, and the client code (ie: The Request Handler adding docs) doesn't know (and isn't expected to check) whether a the SchemaField it's using specifies index=true/false. So even if we change boost to be a Boolean, your reasoning still holds that the if the field is !indexed, that boost shouldn't be used (null or otherwise). I suppose from a backcompat standpoint there's not much harm in changing the FieldType.createField args to make it Boolean instead of boolean (all the calling code should still work because of autoboxing) i'm just not sure i see much advantage. In the case of SchemaField.omitNorms – and all the term vector methods on SchemaField as well – those are simple accessors for the properties. It doesn't seem like a good idea to make them "smart" about indexed, they just report wether the property was or was not set when the SchemaField was created. The diff between SchemaField.omitNorms() and SchemaField.indexOptions() is that indexOptions isn't a direct accessor for any one property, it returns the effective IndexOptions object used based on two properties (which don't have their own accessors). So the question of whether indexOptions() should return null seems orthogonal to whether omitNorms() should return null. But it also makes me realize that for consistency we should really add explicit termFreqAndPosition() and termPosition() accessors to be consistent with omitNorms() ... which leaves us back deciding on how indexOptions() (as a "smart" accessor) should behave. Now my head really hurts. It seems like maybe the crux of the issue is really: add termFreqAndPosition() and termPosition() methods to SchemaField to be consistent with the other explicit properties add docs in AbstractField, SchemaField, and FieldType.createField on which things are meaningless if indexed=false – just putting "(value|behavior) is (meaningless|undefined) if #isIndexed() is false" in the javadocs on most of those get/set methods would be a big win for now) better docs on IndexOptions / AbstractField about null not being a valid value for IndexOptions (unless i've convinced you that it should be) an IllegalArguementException on any method that takes an IndexOptions arg if that object is null change FieldType.createField as described above to only call those Field setters "if indexed()" ...if we do that, then we can leave SchemaField.indexOptions() alone and it will be clear from the docs that the value returned is meaningless for non indexed fields. what do you think?
          Hide
          Robert Muir added a comment -

          I think this is ok, but:

          • I don't think we should pass null to the lucene Field (its not a valid value for the enum).
            e.g. the part that creates the field shouldn't do:
            f.setOmitNorms(omitNorms);
            f.setIndexOptions(options);
            f.setBoost(boost);
            but instead if (options != null) { f.setIndexOptions(...) }
          • Speaking of omitNorms and Boost, I don't think we should set indexOptions to null, unless we do the same with these consistently.
            So if indexOptions will be null in the non-indexed case, then boost should become Float and omitNorms Boolean, and these should be null too.
          Show
          Robert Muir added a comment - I think this is ok, but: I don't think we should pass null to the lucene Field (its not a valid value for the enum). e.g. the part that creates the field shouldn't do: f.setOmitNorms(omitNorms); f.setIndexOptions(options); f.setBoost(boost); but instead if (options != null) { f.setIndexOptions(...) } Speaking of omitNorms and Boost, I don't think we should set indexOptions to null, unless we do the same with these consistently. So if indexOptions will be null in the non-indexed case, then boost should become Float and omitNorms Boolean, and these should be null too.

            People

            • Assignee:
              Hoss Man
              Reporter:
              Hoss Man
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development