Solr
  1. Solr
  2. SOLR-6168

enhance collapse QParser so that "group head" documents can be selected by more complex sort options

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.7.1, 4.8.1
    • Fix Version/s: 5.4, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      The fundemental goal of this issue is add additional support to the CollapseQParser so that as an alternative to the existing min/max localparam options, more robust sort syntax can be used to sort on multiple criteria when selecting the "group head" documents used to represent each collapsed group.

      Since support for arbitrary, multi-clause, sorting is almost certainly going to require more RAM then the existing min/max functionaly, this new functionality should be in addition to the existing min/max localparam implementation, not a replacement of it.

      (NOTE: early comments made in this jira may be confusing in historical context due to the way this issue was originally filed as a bug report)

      1. CollapsingQParserPlugin-6168.patch.1-1stcut
        84 kB
        Umesh Prasad
      2. SOLR-6168.patch
        94 kB
        Hoss Man
      3. SOLR-6168.patch
        91 kB
        Hoss Man
      4. SOLR-6168.patch
        45 kB
        Hoss Man
      5. SOLR-6168.patch
        33 kB
        Hoss Man
      6. SOLR-6168.patch
        21 kB
        Hoss Man
      7. SOLR-6168-group-head-inconsistent-with-sort.patch
        2 kB
        Umesh Prasad

        Issue Links

          Activity

          Hide
          Umesh Prasad added a comment - - edited

          Test Case / Patch attached for broken sort.

          Show
          Umesh Prasad added a comment - - edited Test Case / Patch attached for broken sort.
          Hide
          Joel Bernstein added a comment - - edited

          Hi Umesh,

          Thanks for reporting the bug and providing the test case. It looks like the test is failing because the collapsing is choosing document 5 rather then document 6 as the group head. So it's not a sorting issue, it's a collapsing issue. Because the collapsing criteria is the test_tl field, which is a tie for documents 5 and 6, you'll need to use a function query as the collapse criteria to break the tie.

          Something like this should do the trick:

           params.add("cf", "sum(field(test_tl),field(id))") 
           params.add("fq", "{!collapse field=group_s max=$cf)}");
          
          Show
          Joel Bernstein added a comment - - edited Hi Umesh, Thanks for reporting the bug and providing the test case. It looks like the test is failing because the collapsing is choosing document 5 rather then document 6 as the group head. So it's not a sorting issue, it's a collapsing issue. Because the collapsing criteria is the test_tl field, which is a tie for documents 5 and 6, you'll need to use a function query as the collapse criteria to break the tie. Something like this should do the trick: params.add( "cf" , "sum(field(test_tl),field(id))" ) params.add( "fq" , "{!collapse field=group_s max=$cf)}" );
          Hide
          Umesh Prasad added a comment -

          Hi Joel,
          Actually the issue is with Collapsing picking up group head. which is inconsistent with original grouping implementation. (manually verified). We have been using solr's grouping specification for collapsing our search results and it picks up the group head which would have come first as per sorting.
          The trick to combine fields with sum/exponential etc will not work if field1 and field2's range differ drastically. Which is true for my case.

          PS :
          1.The original test case is incorrect. I am replacing it with another test case patch.
          2. I have made progress by creating array of FieldValueCollapse from sort/min/max spec. Trying to get all test cases to work. Will attach it once ready.

          Show
          Umesh Prasad added a comment - Hi Joel, Actually the issue is with Collapsing picking up group head. which is inconsistent with original grouping implementation. (manually verified). We have been using solr's grouping specification for collapsing our search results and it picks up the group head which would have come first as per sorting. The trick to combine fields with sum/exponential etc will not work if field1 and field2's range differ drastically. Which is true for my case. PS : 1.The original test case is incorrect. I am replacing it with another test case patch. 2. I have made progress by creating array of FieldValueCollapse from sort/min/max spec. Trying to get all test cases to work. Will attach it once ready.
          Hide
          Umesh Prasad added a comment -

          Test case that demonstrates that group head picked by collapsing qparser is inconsistent with sort spec

          Show
          Umesh Prasad added a comment - Test case that demonstrates that group head picked by collapsing qparser is inconsistent with sort spec
          Hide
          Joel Bernstein added a comment -

          Umesh,

          The group head selection is independent of the sort criteria with the CollapsingQParserPlugin. So it's working as designed. Let's close this ticket out and move the discussion to the users list. I'd like to look closely at your use case and see if we can come up with a way to make it work with how the CollapsingQParserPlugin currently operates.

          Show
          Joel Bernstein added a comment - Umesh, The group head selection is independent of the sort criteria with the CollapsingQParserPlugin. So it's working as designed. Let's close this ticket out and move the discussion to the users list. I'd like to look closely at your use case and see if we can come up with a way to make it work with how the CollapsingQParserPlugin currently operates.
          Hide
          Joel Bernstein added a comment -

          Let's also consider how the recip() function query might be useful in a situation where the ranges differ drastically as you describe.

          for example:

          fq={!collapse field=x max='sub(prod(field(y),10), recip(field(x),1,1000,1000))'}
          
          Show
          Joel Bernstein added a comment - Let's also consider how the recip() function query might be useful in a situation where the ranges differ drastically as you describe. for example: fq={!collapse field=x max='sub(prod(field(y),10), recip(field(x),1,1000,1000))'}
          Hide
          Umesh Prasad added a comment - - edited

          Hi Joel,
          Sure. we can take this forward on mailing list.

          To be explicit :
          1. min/max param is equivalent to group.sort (both pick up the group heads)
          2. group.sort defaults to sort if not specified. (standard collapsing)
          3. min/max defaults to score if not specified (CollapsingQParser)
          4. sort defaults to score if not specified and grouping is on ( as per https://cwiki.apache.org/confluence/display/solr/Result+Grouping)

          As noted in 2 vs 3, the defaults are different. This is a very important difference.

          So something like sort = seller_rank asc, ratings desc, function_query desc, popularity desc, score desc might end up giving completely different results.

          Show
          Umesh Prasad added a comment - - edited Hi Joel, Sure. we can take this forward on mailing list. To be explicit : 1. min/max param is equivalent to group.sort (both pick up the group heads) 2. group.sort defaults to sort if not specified. (standard collapsing) 3. min/max defaults to score if not specified (CollapsingQParser) 4. sort defaults to score if not specified and grouping is on ( as per https://cwiki.apache.org/confluence/display/solr/Result+Grouping ) As noted in 2 vs 3, the defaults are different. This is a very important difference. So something like sort = seller_rank asc, ratings desc, function_query desc, popularity desc, score desc might end up giving completely different results.
          Hide
          Umesh Prasad added a comment - - edited

          Also, under the hood, FieldValueCollapse actually uses a comparator for collapsing, which reads its value from a single source. (score/field/function). If collapsing comparator is composed out of multiple sub comparators (taken from sort spec), then min/max can default to sort field itself.
          This will avoid translating sort, which is essentially a series of simple field/function based comparator, to a single valued consistent function. (Not an easy task). Plus comparator based sort is more intuitive.

          Show
          Umesh Prasad added a comment - - edited Also, under the hood, FieldValueCollapse actually uses a comparator for collapsing, which reads its value from a single source. (score/field/function). If collapsing comparator is composed out of multiple sub comparators (taken from sort spec), then min/max can default to sort field itself. This will avoid translating sort, which is essentially a series of simple field/function based comparator, to a single valued consistent function. (Not an easy task). Plus comparator based sort is more intuitive.
          Hide
          Umesh Prasad added a comment -

          Okay .. Here is my first attempt at this .. I am attaching this for comments if I am going in right direction

          Changes
          1. Added two more implementations of FieldValueCollapse
          ScoreValueCollapse for collapsing on score
          StringValueCollapse for collapsing on global string ords.
          2. Implemented CompositeValueCollapse,
          It gets sorts params and creates an array of FieldValueCollapse , which it calls in sequence.
          3. collapse, returns NEXT_ACTION which can take ALIGN, BREAK or CONTINUE and is used by CompositiveValueCollapse to enable stable sort.
          4. Added updateOrd(int ord, int contextDoc, int globalDoc) ,so that composite collapse can use to update its constituent ords.

          The test cases pass mostly. However code is quite hacked as of now and there are no tests for testing sorting on string. Sharing of ords[]/scores[] between the different instances of FieldValueCollapse breaks encapsulation. I think FieldValueCollapse can be better replaced with CollapasingComparator in line of FieldComparator ..

          Show
          Umesh Prasad added a comment - Okay .. Here is my first attempt at this .. I am attaching this for comments if I am going in right direction Changes 1. Added two more implementations of FieldValueCollapse ScoreValueCollapse for collapsing on score StringValueCollapse for collapsing on global string ords. 2. Implemented CompositeValueCollapse, It gets sorts params and creates an array of FieldValueCollapse , which it calls in sequence. 3. collapse, returns NEXT_ACTION which can take ALIGN, BREAK or CONTINUE and is used by CompositiveValueCollapse to enable stable sort. 4. Added updateOrd(int ord, int contextDoc, int globalDoc) ,so that composite collapse can use to update its constituent ords. The test cases pass mostly. However code is quite hacked as of now and there are no tests for testing sorting on string. Sharing of ords[]/scores[] between the different instances of FieldValueCollapse breaks encapsulation. I think FieldValueCollapse can be better replaced with CollapasingComparator in line of FieldComparator ..
          Hide
          Hoss Man added a comment -

          I was recently asked about this issue, and when i initially started digging into it got more and more confused.

          It seems that fundementally, what happened here is that Umesh initially filled a bug regarding the way the collapse QParser selects the "group head" – but this bug report was based on a missunderstanding about what default behavior of CollapseQParser is when dealing with a sort param (as compared to the older GroupingCOmponent).

          There was some key discussiong about this issue on the solr-user mailing list, which did not result in updating the summary/description of this issue, followed by Umesh attaching a patch ettempting to implement some changes in behavior.

          I have some thoughts on Umesh's approach, and my own suggestions, but before I get into that i want to make sure the situation is accurately represented in this Jira


          First off, some key discussion from the solr-user mailing list circa June 2014 that should really be captured directly in this issue.

          In particular these comments from Joel...

          So, the question is what is the cost (performance and memory) of having the
          CollapsingQParserPlugin choose the group head by using the Solr sort
          criteria?

          Keep in mind that the CollapsingQParserPlugin's main design goal is to
          provide fast performance when collapsing on a high cardinality field. How
          you choose the group head can have a big impact here, both on memory
          consumption performance.

          The function query collapse criteria was added to allow you to come up with
          custom formulas for selecting the group head, with little or no impact on
          performance and memory. Using Solr's recip() function query it seems like
          you could come up with some nice scenarios where two variables could be
          used to select the group head. For example:

          ...

          And this respons from Umesh...

          ...

          I agree 200 MB per request just for collapsing the search results is huge
          but at least it increases linearly with number of sort fields.. For my use
          case, I am willing to pay the linear cost specially when I can't combine
          the sort fields intelligently into a sort function. Plus it allows me to
          sort by String/Text fields also which is a big win.

          ...


          Based on the total comments regarding this issue, including the email discussion, i've revised the summary & description to make it clear:

          • this is a feature request
          • that the goal is to expand the options available to users of the collapse QParser by allowing "group head" documents to be selected by more complex sort options
          Show
          Hoss Man added a comment - I was recently asked about this issue, and when i initially started digging into it got more and more confused. It seems that fundementally, what happened here is that Umesh initially filled a bug regarding the way the collapse QParser selects the "group head" – but this bug report was based on a missunderstanding about what default behavior of CollapseQParser is when dealing with a sort param (as compared to the older GroupingCOmponent). There was some key discussiong about this issue on the solr-user mailing list, which did not result in updating the summary/description of this issue, followed by Umesh attaching a patch ettempting to implement some changes in behavior. I have some thoughts on Umesh's approach, and my own suggestions, but before I get into that i want to make sure the situation is accurately represented in this Jira First off, some key discussion from the solr-user mailing list circa June 2014 that should really be captured directly in this issue. http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201406.mbox/%3CCAJc64EXgnPn-RiqgUYn=S_Wn5wPZsvtirEHP_nctZ-AFa=AxEw@mail.gmail.com%3E http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201406.mbox/%3CCAE4tqLP-jqBjrWB0Yr2vNs8J15qW8BwVK61hZOG=__EjFpJJgQ@mail.gmail.com%3E http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201406.mbox/%3CCAJc64EVQP=aSa6OfDSvPUdcOEA+-mO1USmLNfAFJgP4OeVbdSQ@mail.gmail.com%3E In particular these comments from Joel... So, the question is what is the cost (performance and memory) of having the CollapsingQParserPlugin choose the group head by using the Solr sort criteria? Keep in mind that the CollapsingQParserPlugin's main design goal is to provide fast performance when collapsing on a high cardinality field. How you choose the group head can have a big impact here, both on memory consumption performance. The function query collapse criteria was added to allow you to come up with custom formulas for selecting the group head, with little or no impact on performance and memory. Using Solr's recip() function query it seems like you could come up with some nice scenarios where two variables could be used to select the group head. For example: ... And this respons from Umesh... ... I agree 200 MB per request just for collapsing the search results is huge but at least it increases linearly with number of sort fields.. For my use case, I am willing to pay the linear cost specially when I can't combine the sort fields intelligently into a sort function. Plus it allows me to sort by String/Text fields also which is a big win. ... Based on the total comments regarding this issue, including the email discussion, i've revised the summary & description to make it clear: this is a feature request that the goal is to expand the options available to users of the collapse QParser by allowing "group head" documents to be selected by more complex sort options
          Hide
          Hoss Man added a comment -

          Now, in the context of this being a feature request, some comments based on my notes from the last few days...


          I have some concerns about hte appraoch taken in Umesh's patch (NOTE: patch appears to have been generated in reverse)...

          • Instead of introducing a new option for letting users pick arbitrary sort criteria to select the group heads, it only allows using the top level 'sort' param for this (if the min/max local params are not specified)
            • This means that it breaks backcompat for existing users who use the collapse QParsers current default behavior of "pick groupHead having highest score" if they are alreayd using some other sort.
              • This is evident in some existing tests that the patch changes
            • This also means that it doesn't give users any (new) method of picking the group heads using a complex sort different from the sort that is applied after the collapse – so if you want the collapsed docs sorted by X, your only options for picking the groupHeads is to use X, or to use the min/max of a single field like today
          • Rather then re-using any of the existing document sort param parsing logic, or internal Sorting code from Solr/Lucene, Umesh's patch adds it's own (limited) parsing of the 'sort' param, to build up a data structure wrapped arround the existing simplistic min/max value groupHead selector logic
            • this means a lot of new hairy code
            • this also means Sorts with non-trivial clauses won't work at all, examples:
              • function clauses
              • fields that use sortMissingFirst or sortMissingLast
              • custom FieldType implementations that define their own sort logic.

          After reviewing Umesh's patch, and thinking about it a bit, I spent some time the past few days working up a straw man for an alternate approach that adds a new sort "localparam" and leverages the existing Lucene/Solr Sorting code to parse it and process it. The idea being you could do something like this to select the group head docs based on a complex sort which is independent of the final sort used for the collapsed docs...

          {!collapse field=manu_id_s sort='price asc, popularity desc%'}
          

          ...but you can still trivially use the same sort for both (like in Umesh's patch) by refrencing the top level sort as a variable...

          {!collapse field=manu_id_s sort=$sort}
          

          The attached patch has the bare bones beginings of this approach, along with a simple test (addapted from Umesh's original patch). It only supports collapsing on String fields at the moment, but the sort you choose to collapse on can be (almost) anything – So both of the above examples will/should work as expected using the "techproducts" example if you apply this patch to trunk...

          From what i can tell, this general approach should still work fairly consisntly when collapsing on non-string field types, it's just going to require some (seemingly) straight forward new code & refactoring. (and of course lots of tests). There's also several nocommits in the patch related to some refactoring that i think will be neccessary to address some edge cases issues (notably arround using "cscore()" inside of an arbitrary sort expression), but i'm pretty confident that the general appraoch is sound.

          I plan to continue working on this approach over the next few weeks, focusing first on more test cases – but I wanted to get this initial patch out there for discussion in case anyone had strong opinions about it, or sees any fundemental flaws with the design.

          Show
          Hoss Man added a comment - Now, in the context of this being a feature request, some comments based on my notes from the last few days... I have some concerns about hte appraoch taken in Umesh's patch (NOTE: patch appears to have been generated in reverse)... Instead of introducing a new option for letting users pick arbitrary sort criteria to select the group heads, it only allows using the top level 'sort' param for this (if the min/max local params are not specified) This means that it breaks backcompat for existing users who use the collapse QParsers current default behavior of "pick groupHead having highest score" if they are alreayd using some other sort. This is evident in some existing tests that the patch changes This also means that it doesn't give users any (new) method of picking the group heads using a complex sort different from the sort that is applied after the collapse – so if you want the collapsed docs sorted by X, your only options for picking the groupHeads is to use X, or to use the min/max of a single field like today Rather then re-using any of the existing document sort param parsing logic, or internal Sorting code from Solr/Lucene, Umesh's patch adds it's own (limited) parsing of the 'sort' param, to build up a data structure wrapped arround the existing simplistic min/max value groupHead selector logic this means a lot of new hairy code this also means Sorts with non-trivial clauses won't work at all, examples: function clauses fields that use sortMissingFirst or sortMissingLast custom FieldType implementations that define their own sort logic. After reviewing Umesh's patch, and thinking about it a bit, I spent some time the past few days working up a straw man for an alternate approach that adds a new sort "localparam" and leverages the existing Lucene/Solr Sorting code to parse it and process it. The idea being you could do something like this to select the group head docs based on a complex sort which is independent of the final sort used for the collapsed docs... {!collapse field=manu_id_s sort='price asc, popularity desc%'} ...but you can still trivially use the same sort for both (like in Umesh's patch) by refrencing the top level sort as a variable... {!collapse field=manu_id_s sort=$sort} The attached patch has the bare bones beginings of this approach, along with a simple test (addapted from Umesh's original patch). It only supports collapsing on String fields at the moment, but the sort you choose to collapse on can be (almost) anything – So both of the above examples will/should work as expected using the "techproducts" example if you apply this patch to trunk... http://localhost:8983/solr/techproducts/select?fl=id,manu_id_s,price,weight,popularity&q=*:*&sort=weight+asc&fq=%7b!collapse%20field=manu_id_s%20sort=%27price%20asc,%20popularity%20desc%27%7d http://localhost:8983/solr/techproducts/select?fl=id,manu_id_s,weight&q=*:*&sort=weight+asc&fq=%7b!collapse%20field=manu_id_s%20sort=$sort%7d From what i can tell, this general approach should still work fairly consisntly when collapsing on non-string field types, it's just going to require some (seemingly) straight forward new code & refactoring. (and of course lots of tests). There's also several nocommits in the patch related to some refactoring that i think will be neccessary to address some edge cases issues (notably arround using "cscore()" inside of an arbitrary sort expression), but i'm pretty confident that the general appraoch is sound. I plan to continue working on this approach over the next few weeks, focusing first on more test cases – but I wanted to get this initial patch out there for discussion in case anyone had strong opinions about it, or sees any fundemental flaws with the design.
          Hide
          Hoss Man added a comment -

          Updated patch.

          No major functionality additions/improvements, just new tests – notably some new randomized testing that uncovered LUCENE-6808, so this patch actaully includes the patch I posted there in it's entirety for easy testing.

          Show
          Hoss Man added a comment - Updated patch. No major functionality additions/improvements, just new tests – notably some new randomized testing that uncovered LUCENE-6808 , so this patch actaully includes the patch I posted there in it's entirety for easy testing.
          Hide
          Hoss Man added a comment - - edited

          EDIT: Deleted comment that was ment for LUCENE-6808 (and ultimately posted there correctly)

          Show
          Hoss Man added a comment - - edited EDIT: Deleted comment that was ment for LUCENE-6808 (and ultimately posted there correctly)
          Hide
          Hoss Man added a comment -

          Updated patch...

          • collapsing on int & float fields now works
            • NOTE: randomized testing of collapsing on docValues float fields is disabled with nocommit
            • SOLR-8082 prevents the verification queries from matching any docs when quering the float field
          • more testing of various nullPolicies and initial "size" localparams
          • more test coverage in TestCollapseQParserPlugin
            • some of this new testing is explicitly of the new sort local param
            • some general tweaking of testNumericCollapse & testStringCollapse to test all of the specified field names every time, not just 1 at random (only fractions of a second slower then before
          Show
          Hoss Man added a comment - Updated patch... collapsing on int & float fields now works NOTE: randomized testing of collapsing on docValues float fields is disabled with nocommit SOLR-8082 prevents the verification queries from matching any docs when quering the float field more testing of various nullPolicies and initial "size" localparams more test coverage in TestCollapseQParserPlugin some of this new testing is explicitly of the new sort local param some general tweaking of testNumericCollapse & testStringCollapse to test all of the specified field names every time, not just 1 at random (only fractions of a second slower then before
          Hide
          David Boychuck added a comment -

          Does it make sense to include SOLR-6345 in this work?

          Show
          David Boychuck added a comment - Does it make sense to include SOLR-6345 in this work?
          Hide
          Hoss Man added a comment -

          Does it make sense to include SOLR-6345 in this work?

          I don't think so?

          1. Nothing about this improvement relates to faceting or tagging/excluding, nor should any of the changes impact any existing code related to the faceting/tagging/excluding (unless i'm missing something, that's all external to the implemenation of the CollapseFilter)
          2. SOLR-6345 doesn't even seem to even have an identified root cause, so trying to fold in a fix for that with this issue seems like it would just unneccessarily stall this improvement.

          Updated patch...

          • added randomized test coverage of nullPolicy + sort localparam
          • updated randomized test with a (verification) work around for SOLR-8082
          • added testing of sort param with elevation componanet
          • added error checking + tests for tying to use more then one of min/max/sort localparams
          • made all of CollapsingQParserPlugin's nested classes static
            • not specific to this issue, but a worthwhile code cleanup – there was no reason for them not to be static before and it was really bugging me while reading/debugging
          • created a new GroupHeadSelector class to encapsulate min vs max vs sort options
            • refactored existing code to pass down GroupHeadSelector instances instead of "String min|max" and "boolean max" args
          • cleanedup & refactored the needsScores init logic to use existing helper methods in ReturnFields & SortSpec and fixed it to ensure the IndexSearcher flag would always be set
            • solves the case of a 'sort' local parmam needing scores even nothing about top level request does (ie: no scores used in top sort or fl)
          • refactored existing "cscore" init logic into a helper method that inspects GroupHeadSelector

          ...that last bit of refactoring was done with the intention of re-using the cscore method/logic to suport (group head selecting) Sorts that included a function which include/wrap "cscore()" ... but the basic tests i'd tried to write to cover this case didn't work, and the more i looked into it and thought about it i realized this is actualy very tricky. The way "Sort" objects are rewritten relative to a Searcher delegates the rewriting to each individual SortField. If a SortField is rewritable, then it internally does it's rewriting and constructs a new clean (Map) context in the individual SortField clauses. Even if we wanted to keep track of all the maps from every SortField and put the same CollapseScore object in them, there's no uniform/generic way to access those context Maps from an arbitrary SortField. I don't have any clean sollutions in mind, so for now i've punted on this and made the code throw an explicit error if you try to use "cscore()" with the sort local param.

          I think this patch is pretty much ready to go, but i want to sleep on it and give it at least one more full read through before committing.

          Show
          Hoss Man added a comment - Does it make sense to include SOLR-6345 in this work? I don't think so? Nothing about this improvement relates to faceting or tagging/excluding, nor should any of the changes impact any existing code related to the faceting/tagging/excluding (unless i'm missing something, that's all external to the implemenation of the CollapseFilter) SOLR-6345 doesn't even seem to even have an identified root cause, so trying to fold in a fix for that with this issue seems like it would just unneccessarily stall this improvement. Updated patch... added randomized test coverage of nullPolicy + sort localparam updated randomized test with a (verification) work around for SOLR-8082 added testing of sort param with elevation componanet added error checking + tests for tying to use more then one of min/max/sort localparams made all of CollapsingQParserPlugin's nested classes static not specific to this issue, but a worthwhile code cleanup – there was no reason for them not to be static before and it was really bugging me while reading/debugging created a new GroupHeadSelector class to encapsulate min vs max vs sort options refactored existing code to pass down GroupHeadSelector instances instead of "String min|max" and "boolean max" args cleanedup & refactored the needsScores init logic to use existing helper methods in ReturnFields & SortSpec and fixed it to ensure the IndexSearcher flag would always be set solves the case of a 'sort' local parmam needing scores even nothing about top level request does (ie: no scores used in top sort or fl) refactored existing "cscore" init logic into a helper method that inspects GroupHeadSelector ...that last bit of refactoring was done with the intention of re-using the cscore method/logic to suport (group head selecting) Sorts that included a function which include/wrap "cscore()" ... but the basic tests i'd tried to write to cover this case didn't work, and the more i looked into it and thought about it i realized this is actualy very tricky. The way "Sort" objects are rewritten relative to a Searcher delegates the rewriting to each individual SortField. If a SortField is rewritable, then it internally does it's rewriting and constructs a new clean (Map) context in the individual SortField clauses. Even if we wanted to keep track of all the maps from every SortField and put the same CollapseScore object in them, there's no uniform/generic way to access those context Maps from an arbitrary SortField. I don't have any clean sollutions in mind, so for now i've punted on this and made the code throw an explicit error if you try to use "cscore()" with the sort local param. I think this patch is pretty much ready to go, but i want to sleep on it and give it at least one more full read through before committing.
          Hide
          David Smiley added a comment -

          BTW I really appreciate your thorough patch notes. Nice job!

          Show
          David Smiley added a comment - BTW I really appreciate your thorough patch notes. Nice job!
          Hide
          Joel Bernstein added a comment -

          Yes, I second that. I've been following pretty closely but unfortunately haven't had a chance to review. I know this is a pretty big piece of work.

          Show
          Joel Bernstein added a comment - Yes, I second that. I've been following pretty closely but unfortunately haven't had a chance to review. I know this is a pretty big piece of work.
          Hide
          Hoss Man added a comment -

          a few updates...

          • Updated patch to trunk
          • Fixed an NPE in the edge case that sort='score desc' is used (can't believe i missed that before ... bug existed because of existing SortSpec optimizations that recognize that as a default sort)
          • Made some additions to QueryEqualityTest to sanity check equivalence with new sort local param

          ...still testing.

          Show
          Hoss Man added a comment - a few updates... Updated patch to trunk Fixed an NPE in the edge case that sort='score desc' is used (can't believe i missed that before ... bug existed because of existing SortSpec optimizations that recognize that as a default sort) Made some additions to QueryEqualityTest to sanity check equivalence with new sort local param ...still testing.
          Hide
          ASF subversion and git services added a comment -

          Commit 1714133 from hossman@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1714133 ]

          SOLR-6168: Add a 'sort' local param to the collapse QParser to support using complex sort options to select the representitive doc for each collapsed group

          Show
          ASF subversion and git services added a comment - Commit 1714133 from hossman@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1714133 ] SOLR-6168 : Add a 'sort' local param to the collapse QParser to support using complex sort options to select the representitive doc for each collapsed group
          Hide
          Hoss Man added a comment -

          Finally getting back to this ...

          Did a bunch of manual testing offline w/o spotting any problems, so i went ahead and commited to trunk.

          backport to 5x was fairly clean, testing/precommiting now.

          Assuming nothing weird jumps out, I'll let jenkins hammer on trunk overnight and then commit the backport tomorrow.

          Show
          Hoss Man added a comment - Finally getting back to this ... Did a bunch of manual testing offline w/o spotting any problems, so i went ahead and commited to trunk. backport to 5x was fairly clean, testing/precommiting now. Assuming nothing weird jumps out, I'll let jenkins hammer on trunk overnight and then commit the backport tomorrow.
          Hide
          ASF subversion and git services added a comment -

          Commit 1714234 from hossman@apache.org in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1714234 ]

          SOLR-6168: Add a 'sort' local param to the collapse QParser to support using complex sort options to select the representitive doc for each collapsed group (merge 1714133)

          Show
          ASF subversion and git services added a comment - Commit 1714234 from hossman@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1714234 ] SOLR-6168 : Add a 'sort' local param to the collapse QParser to support using complex sort options to select the representitive doc for each collapsed group (merge 1714133)
          Show
          Hoss Man added a comment - Ref guide updated... https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
          Hide
          Steve Rowe added a comment -

          ASF Jenkins found a reproducible NPE in a new test committed under this issue - see SOLR-8295

          Show
          Steve Rowe added a comment - ASF Jenkins found a reproducible NPE in a new test committed under this issue - see SOLR-8295

            People

            • Assignee:
              Hoss Man
              Reporter:
              Umesh Prasad
            • Votes:
              4 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development