Solr
  1. Solr
  2. SOLR-2339

No error reported when sorting on a multiValued field

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: search
    • Labels:
      None

      Description

      In the past, Solr has relied on the underlying FieldCache to throw an error in situations where sorting on a field was not possible. however LUCENE-2142 has changed this, so that FieldCache never throws an error.

      In order to maintain the functionality of past Solr releases (ie: error when users attempt to sort on a field that we known will produce meaningless results) we should add some sort of check at the Solr level.

      1. SOLR-2339.patch
        56 kB
        Hoss Man
      2. SOLR-2339.patch
        51 kB
        Hoss Man

        Issue Links

          Activity

          Hide
          Hoss Man added a comment -

          I'm not sure the best way to go about this ... we could reimplement the basic termcount vs doccount check that use to be done at a higher level before using the FieldCache – but that may be overkill and i'm not convinced we need to behave in exactly the same way as before.

          At a minimum we should probably start logging a warning if someone does a sort on a field that is configured to be multiValued=true.

          Alternately: we could consider adding a property to the StringIndex and DocTermsIndex classes (on their respective branches) so that FieldCache could indicate when it suspects there may be a problem (using simple doccount < termcount, or something more complex if it's easy), and Solr could error/log if that property is set.

          Show
          Hoss Man added a comment - I'm not sure the best way to go about this ... we could reimplement the basic termcount vs doccount check that use to be done at a higher level before using the FieldCache – but that may be overkill and i'm not convinced we need to behave in exactly the same way as before. At a minimum we should probably start logging a warning if someone does a sort on a field that is configured to be multiValued=true. Alternately: we could consider adding a property to the StringIndex and DocTermsIndex classes (on their respective branches) so that FieldCache could indicate when it suspects there may be a problem (using simple doccount < termcount, or something more complex if it's easy), and Solr could error/log if that property is set.
          Hide
          Hoss Man added a comment -

          yonik pointed out on IRC that the main motivation behind LUCENE-2142 could easily apply to solr users as well...

          So this means... you could happily think all is good, until some day, well into production, once you've updated enough docs, suddenly the check will catch you and throw an unhandled exception, stopping all searches [that need to sort by this string field] in their tracks. It's not gracefully degrading.

          ...arguing that for solr users as well: the loss of this particular check is a feature and not a bug.

          i feel like the user bases and use cases are different enough that the "default" behavior should be different – but not enough to fight strongly for it.

          if anyone else has an opinion, please comment – for now i'm going to just tackle the case where there is not doubt you have a problem: sorting on a multiValued field

          Show
          Hoss Man added a comment - yonik pointed out on IRC that the main motivation behind LUCENE-2142 could easily apply to solr users as well... So this means... you could happily think all is good, until some day, well into production, once you've updated enough docs, suddenly the check will catch you and throw an unhandled exception, stopping all searches [that need to sort by this string field] in their tracks. It's not gracefully degrading. ...arguing that for solr users as well: the loss of this particular check is a feature and not a bug. i feel like the user bases and use cases are different enough that the "default" behavior should be different – but not enough to fight strongly for it. if anyone else has an opinion, please comment – for now i'm going to just tackle the case where there is not doubt you have a problem: sorting on a multiValued field
          Hide
          Hoss Man added a comment -

          patch that forces failure for multivalued fields, along with an example of how i think we should call this behavior change out in CHANGES.txt

          this uncovered a lot of tests that were missusing fields declared as multivalued – BasicDistributedZkTest and TestDistributedSearch are still failing in this patch, largely because i'm so confused about how they work it's not even clear to me what schema.xml file they are using to try and fix it.

          Show
          Hoss Man added a comment - patch that forces failure for multivalued fields, along with an example of how i think we should call this behavior change out in CHANGES.txt this uncovered a lot of tests that were missusing fields declared as multivalued – BasicDistributedZkTest and TestDistributedSearch are still failing in this patch, largely because i'm so confused about how they work it's not even clear to me what schema.xml file they are using to try and fix it.
          Hide
          Hoss Man added a comment -

          I should have mentioned when i attached the patch: the approach i took here was to add a "void checkSortability()" method to SchemaField that tests isMultivalued – i also moved the "is this field indexed?" check here (it use to be part of QueryParsing.parseSortSpec) and changed all the out of hte box FieldTypes to call this method in their getSortField implementation – this way we have acommon (documented) hook point where we can add future helper tests for things that don't make sense of in sorting, but it's still up to the FieldType to decide what to do if someone wants to make a really exotic one (RandomSortField for example doesn't even care if it's indexed).

          If anyone has any better suggestions please chime in.

          Show
          Hoss Man added a comment - I should have mentioned when i attached the patch: the approach i took here was to add a "void checkSortability()" method to SchemaField that tests isMultivalued – i also moved the "is this field indexed?" check here (it use to be part of QueryParsing.parseSortSpec) and changed all the out of hte box FieldTypes to call this method in their getSortField implementation – this way we have acommon (documented) hook point where we can add future helper tests for things that don't make sense of in sorting, but it's still up to the FieldType to decide what to do if someone wants to make a really exotic one (RandomSortField for example doesn't even care if it's indexed). If anyone has any better suggestions please chime in.
          Hide
          Hoss Man added a comment -

          It also occurs to me that we should probably put this same check in for dealing with ValueSources that are FieldCache based ... not in ValueSource.getSortField, but in FieldType.getValueSource for the affected FieldTypes.

          Show
          Hoss Man added a comment - It also occurs to me that we should probably put this same check in for dealing with ValueSources that are FieldCache based ... not in ValueSource.getSortField, but in FieldType.getValueSource for the affected FieldTypes.
          Hide
          Hoss Man added a comment -

          Updated patch that fixes the remaining sort related test failures.

          TestDistributedSearch and BasicDistributedZkTest were fundementally flawed in that they expected to get deterministic sort orderings on fields that they were deliberately putting multiple values into (ie: not just sorting on a multivalued field; sorting on a multivalued field that was actually used to store multiple values for each doc)

          I'm amazed they ever passed.

          Show
          Hoss Man added a comment - Updated patch that fixes the remaining sort related test failures. TestDistributedSearch and BasicDistributedZkTest were fundementally flawed in that they expected to get deterministic sort orderings on fields that they were deliberately putting multiple values into (ie: not just sorting on a multivalued field; sorting on a multivalued field that was actually used to store multiple values for each doc) I'm amazed they ever passed.
          Hide
          Hoss Man added a comment -

          i spun of the ValueSource issue into SOLR-2348

          Show
          Hoss Man added a comment - i spun of the ValueSource issue into SOLR-2348
          Hide
          Hoss Man added a comment -

          narrowing issue summary to be specific about multiValued fields since that's all we've decided to tackle for now.

          Show
          Hoss Man added a comment - narrowing issue summary to be specific about multiValued fields since that's all we've decided to tackle for now.
          Hide
          Hoss Man added a comment -

          Committed revision 1067030. - trunk

          Still working on the backport to 3x (this is hte first time svn has ever given me a "Tree conflict" for a file)

          Show
          Hoss Man added a comment - Committed revision 1067030. - trunk Still working on the backport to 3x (this is hte first time svn has ever given me a "Tree conflict" for a file)
          Hide
          Hoss Man added a comment -

          Committed revision 1067042.

          backported to 3x

          Show
          Hoss Man added a comment - Committed revision 1067042. backported to 3x
          Hide
          Grant Ingersoll added a comment -

          Bulk close for 3.1.0 release

          Show
          Grant Ingersoll added a comment - Bulk close for 3.1.0 release
          Hide
          Bill Bell added a comment -

          Guys,

          How are we going to support sorting on multiValued fields?

          Would a function work for this?

          Show
          Bill Bell added a comment - Guys, How are we going to support sorting on multiValued fields? Would a function work for this?

            People

            • Assignee:
              Hoss Man
              Reporter:
              Hoss Man
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development