Solr
  1. Solr
  2. SOLR-2348

No error reported when using a FieldCached backed ValueSource for a field Solr knows won't work

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      For the same reasons outlined in SOLR-2339, Solr FieldTypes that return FieldCached backed ValueSources should explicitly check for situations where knows the FieldCache is meaningless.

      1. SOLR-2348.patch
        25 kB
        Hoss Man
      2. SOLR-2348.patch
        11 kB
        Hoss Man

        Issue Links

          Activity

          Hide
          Grant Ingersoll added a comment -

          Bulk close for 3.1.0 release

          Show
          Grant Ingersoll added a comment - Bulk close for 3.1.0 release
          Hide
          Hoss Man added a comment -

          Committed revision 1071480. - 3x

          Show
          Hoss Man added a comment - Committed revision 1071480. - 3x
          Hide
          Hoss Man added a comment -

          Committed revision 1071459. - trunk

          working on 3x backporting now

          Show
          Hoss Man added a comment - Committed revision 1071459. - trunk working on 3x backporting now
          Hide
          Hoss Man added a comment -

          Updated patch that fixes the test failures.

          For the most part, this is fairly straight forward: tests that were abusing multiValued fields as if they were single valued.

          The one situation where i made a genuine code change was in AbstractSubTypeFieldType and the way it deals with the "subFieldType" attribute. When it's used, the registerPolyFieldDynamicPrototype function registers a new dynamic field based on the specified fieldType instance. I updated the properties used to generate these dynamicFields so that it explicitly specifies multiValued=false (it was already specifying indexed=true and stored=false)

          I could have just updated the test schemas so that the fieldType specified was already multiValued, but i think this makes more sense from a functional standpoint. the existing code already enabled a use case like this...

          <fieldType name="double" class="solr.TrieDoubleField" indexed="false" multiValued="false" ... />
          <fieldType name="xy" class="solr.PointType" dimension="2" subFieldType="double"/>
          

          ...so it makes sense that this should work equally well automatically...

          <fieldType name="double" class="solr.TrieDoubleField" indexed="true" multiValued="true" ... />
          <fieldType name="xy" class="solr.PointType" dimension="2" subFieldType="double"/>
          
          Show
          Hoss Man added a comment - Updated patch that fixes the test failures. For the most part, this is fairly straight forward: tests that were abusing multiValued fields as if they were single valued. The one situation where i made a genuine code change was in AbstractSubTypeFieldType and the way it deals with the "subFieldType" attribute. When it's used, the registerPolyFieldDynamicPrototype function registers a new dynamic field based on the specified fieldType instance. I updated the properties used to generate these dynamicFields so that it explicitly specifies multiValued=false (it was already specifying indexed=true and stored=false) I could have just updated the test schemas so that the fieldType specified was already multiValued, but i think this makes more sense from a functional standpoint. the existing code already enabled a use case like this... <fieldType name="double" class="solr.TrieDoubleField" indexed="false" multiValued="false" ... /> <fieldType name="xy" class="solr.PointType" dimension="2" subFieldType="double"/> ...so it makes sense that this should work equally well automatically... <fieldType name="double" class="solr.TrieDoubleField" indexed="true" multiValued="true" ... /> <fieldType name="xy" class="solr.PointType" dimension="2" subFieldType="double"/>
          Hide
          Robert Muir added a comment -

          Hoss, sorry to move it out, glad to see you are working on it!

          Show
          Robert Muir added a comment - Hoss, sorry to move it out, glad to see you are working on it!
          Hide
          Hoss Man added a comment -

          i'm actively working on this today .. moving back in line for 3.1

          Show
          Hoss Man added a comment - i'm actively working on this today .. moving back in line for 3.1
          Hide
          Hoss Man added a comment -

          patch with needed functionality, breaks some tests (most likely tests abusing multiValued field types)...

          hossman@bester:~/lucene/dev/solr$ grep -L "Failures: 0, Errors: 0" build/test-results/TEST-org.apache.solr.*
          build/test-results/TEST-org.apache.solr.schema.PolyFieldTest.txt
          build/test-results/TEST-org.apache.solr.search.function.distance.DistanceFunctionTest.txt
          build/test-results/TEST-org.apache.solr.search.function.SortByFunctionTest.txt
          build/test-results/TEST-org.apache.solr.search.QueryParsingTest.txt
          build/test-results/TEST-org.apache.solr.search.SpatialFilterTest.txt
          build/test-results/TEST-org.apache.solr.search.TestIndexSearcher.txt
          build/test-results/TEST-org.apache.solr.search.TestQueryTypes.txt
          
          Show
          Hoss Man added a comment - patch with needed functionality, breaks some tests (most likely tests abusing multiValued field types)... hossman@bester:~/lucene/dev/solr$ grep -L "Failures: 0, Errors: 0" build/test-results/TEST-org.apache.solr.* build/test-results/TEST-org.apache.solr.schema.PolyFieldTest.txt build/test-results/TEST-org.apache.solr.search.function.distance.DistanceFunctionTest.txt build/test-results/TEST-org.apache.solr.search.function.SortByFunctionTest.txt build/test-results/TEST-org.apache.solr.search.QueryParsingTest.txt build/test-results/TEST-org.apache.solr.search.SpatialFilterTest.txt build/test-results/TEST-org.apache.solr.search.TestIndexSearcher.txt build/test-results/TEST-org.apache.solr.search.TestQueryTypes.txt
          Hide
          Robert Muir added a comment -

          moving to 3.2

          Show
          Robert Muir added a comment - moving to 3.2
          Hide
          Robert Muir added a comment -

          There isn't any patch here yet, can we move out to 3.2?

          Show
          Robert Muir added a comment - There isn't any patch here yet, can we move out to 3.2?
          Hide
          Hoss Man added a comment -

          My hope had been that this would be really straightforward, and a simple inspection of the SchemaField (or FieldType) could be done inside the FieldCacheSource to cover all cases – except that FieldCacheSource (and it's subclasses) is only ever given a field name, and never gets a copy of the FieldType, SchemaField of even the IndexSchema to inspect to ensure that using the FieldCache is viable.

          This means that we have to take the same basic approach as SOLR-2339 - every FieldType impl that utilizes a FieldCacheSource in it's getValueSource method needs to check if FieldCache is viable for that field (ie: indexed, not multivalued)

          We could rename the checkSortability method I just added in SOLR-2339 into a "checkFieldCachibility" type method and use it for both purposes, but:

          • it currently throws exceptions with specific messages like "can not sort on unindexed field: "
          • I seem to recall some folks talking about the idea of expanding FieldCache to support more things like UninvertedField does, in which case being multivalued won't prevent you from using the FieldCache on a field which would ultimately mean the pre-conditions for using a FieldCacheSource would change. We could imagine the user specifing a function that takes in vector args to use to collapse the multiple values per doc on a per usage basis (ie: in this function query case, use the max value of the multiple values for each doc; in this function query, use the average value of the multiple values for each doc; etc...)

          with that in mind, i think for now the most straight forward thing to do is to add a "checkFieldCacheSource(QParser)" method to SchemaField that would be cut/paste of checkSortability (with the error message wording changed) and make all of the (applicable) FieldType.getValueSource methods call it. In the future, it could evolve differently then checkSortability – either removing the "!multivalued" assertion completley, or introspecting the Qparser context in some way to determine that neccessary info has been provided to know how to use the (hypothetical) multivalued FieldCache (hard t ospeculate at this point)

          Show
          Hoss Man added a comment - My hope had been that this would be really straightforward, and a simple inspection of the SchemaField (or FieldType) could be done inside the FieldCacheSource to cover all cases – except that FieldCacheSource (and it's subclasses) is only ever given a field name, and never gets a copy of the FieldType, SchemaField of even the IndexSchema to inspect to ensure that using the FieldCache is viable. This means that we have to take the same basic approach as SOLR-2339 - every FieldType impl that utilizes a FieldCacheSource in it's getValueSource method needs to check if FieldCache is viable for that field (ie: indexed, not multivalued) We could rename the checkSortability method I just added in SOLR-2339 into a "checkFieldCachibility" type method and use it for both purposes, but: it currently throws exceptions with specific messages like "can not sort on unindexed field: " I seem to recall some folks talking about the idea of expanding FieldCache to support more things like UninvertedField does, in which case being multivalued won't prevent you from using the FieldCache on a field which would ultimately mean the pre-conditions for using a FieldCacheSource would change. We could imagine the user specifing a function that takes in vector args to use to collapse the multiple values per doc on a per usage basis (ie: in this function query case, use the max value of the multiple values for each doc; in this function query, use the average value of the multiple values for each doc; etc...) with that in mind, i think for now the most straight forward thing to do is to add a "checkFieldCacheSource(QParser)" method to SchemaField that would be cut/paste of checkSortability (with the error message wording changed) and make all of the (applicable) FieldType.getValueSource methods call it. In the future, it could evolve differently then checkSortability – either removing the "!multivalued" assertion completley, or introspecting the Qparser context in some way to determine that neccessary info has been provided to know how to use the (hypothetical) multivalued FieldCache (hard t ospeculate at this point)

            People

            • Assignee:
              Hoss Man
              Reporter:
              Hoss Man
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development