Solr
  1. Solr
  2. SOLR-1297

Enable sorting by Function Query

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      It would be nice if one could sort by FunctionQuery. See also SOLR-773, where this was first mentioned by Yonik as part of the generic solution to geo-search

      1. SOLR-1297_weightSorts.patch
        17 kB
        Yonik Seeley
      2. SOLR-1297.better.field.support.patch
        13 kB
        Hoss Man
      3. SOLR-1297.patch
        32 kB
        Yonik Seeley
      4. SOLR-1297.patch
        4 kB
        Grant Ingersoll
      5. SOLR-1297-2.patch
        2 kB
        Koji Sekiguchi
      6. SOLR-1297.patch
        24 kB
        Grant Ingersoll

        Issue Links

          Activity

          Grant Ingersoll created issue -
          Grant Ingersoll made changes -
          Field Original Value New Value
          Link This issue is related to SOLR-773 [ SOLR-773 ]
          Hide
          Grant Ingersoll added a comment -

          Note, there is a temporary workaround for this: (main query)^0 func(...)

          Show
          Grant Ingersoll added a comment - Note, there is a temporary workaround for this: (main query)^0 func(...)
          Grant Ingersoll made changes -
          Assignee Grant Ingersoll [ gsingers ]
          Grant Ingersoll made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          Grant Ingersoll committed 889825 (1 file)
          Reviews: none

          SOLR-1297: establish some standalone, baseline tests for parsing the sort strings

          Grant Ingersoll committed 889827 (1 file)
          Hide
          Grant Ingersoll added a comment -

          For this, I think we want to be able to do things like:

          Just functions

          sort=dist(2,x,y, point(0,0)) desc
          

          Multiple sort params, some functions, some fields

          sort=weight asc,dist(2,x,y, point(0,0)) asc
          

          If and when a function result cache exists, we should be able to take advantage of that too, but that is an implementation detail.

          Show
          Grant Ingersoll added a comment - For this, I think we want to be able to do things like: Just functions sort=dist(2,x,y, point(0,0)) desc Multiple sort params, some functions, some fields sort=weight asc,dist(2,x,y, point(0,0)) asc If and when a function result cache exists, we should be able to take advantage of that too, but that is an implementation detail.
          Hide
          Grant Ingersoll added a comment -

          First crack at a patch. I think it's in pretty good shape, but I had to rewrite QueryParsing.parseSort(), so it bears some extra scrutiny (although I put tests in place first).

          Added a bunch of new tests for both parsing and for search (see QueryParsingTest and SortByFunctionTest). All existing and new tests pass.

          Show
          Grant Ingersoll added a comment - First crack at a patch. I think it's in pretty good shape, but I had to rewrite QueryParsing.parseSort(), so it bears some extra scrutiny (although I put tests in place first). Added a bunch of new tests for both parsing and for search (see QueryParsingTest and SortByFunctionTest). All existing and new tests pass.
          Grant Ingersoll made changes -
          Attachment SOLR-1297.patch [ 12427821 ]
          Grant Ingersoll made changes -
          Link This issue relates to SOLR-1650 [ SOLR-1650 ]
          Hide
          Grant Ingersoll added a comment -

          Committed revision 889997.

          Show
          Grant Ingersoll added a comment - Committed revision 889997.
          Grant Ingersoll made changes -
          Status In Progress [ 3 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hide
          Yonik Seeley added a comment -

          Just curious - what were the reasons behind adding a getSortField() method to ValueSource?

          Show
          Yonik Seeley added a comment - Just curious - what were the reasons behind adding a getSortField() method to ValueSource?
          Hide
          Grant Ingersoll added a comment -

          See the processSort() method in QueryParsing. Seemed like the logical way for the ValueSource to be able to define how sorting should work for the ValueSource, essentially mirroring getSortField from the FieldType.

          Show
          Grant Ingersoll added a comment - See the processSort() method in QueryParsing. Seemed like the logical way for the ValueSource to be able to define how sorting should work for the ValueSource, essentially mirroring getSortField from the FieldType.
          Hide
          Yonik Seeley added a comment -

          function queries aren't weighted... reopening to track this problem.

          Show
          Yonik Seeley added a comment - function queries aren't weighted... reopening to track this problem.
          Yonik Seeley made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Hide
          Grant Ingersoll added a comment -

          Not following. What do I need to fix?

          Show
          Grant Ingersoll added a comment - Not following. What do I need to fix?
          Hide
          Yonik Seeley added a comment -

          Function queries can contain normal queries, which need weighting to produce correct scores and work properly. This may not be straightforward to get right, hence I've just left this issue open for now.

          Show
          Yonik Seeley added a comment - Function queries can contain normal queries, which need weighting to produce correct scores and work properly. This may not be straightforward to get right, hence I've just left this issue open for now.
          Hide
          Koji Sekiguchi added a comment -

          When I set bit complex function to sort parameter, I got the error:

          Must declare sort field or function
          org.apache.solr.common.SolrException: Must declare sort field or function
          at org.apache.solr.search.QueryParsing.processSort(QueryParsing.java:376)
          at org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:281)
          at org.apache.solr.search.QueryParsingTest.testSort(QueryParsingTest.java:105)

          Attached the fix and the test case.

          Show
          Koji Sekiguchi added a comment - When I set bit complex function to sort parameter, I got the error: Must declare sort field or function org.apache.solr.common.SolrException: Must declare sort field or function at org.apache.solr.search.QueryParsing.processSort(QueryParsing.java:376) at org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:281) at org.apache.solr.search.QueryParsingTest.testSort(QueryParsingTest.java:105) Attached the fix and the test case.
          Koji Sekiguchi made changes -
          Attachment SOLR-1297-2.patch [ 12437462 ]
          Hide
          Yonik Seeley added a comment -

          Finally got a chance to check this out a little more... here are the current issues I think:

          • functions need to be weighted (as previously noted)
          • parseSort needs to actually parse the function query with the real function query parsing code, rather than trying to parse it itself first - two sets of parsing code for the same thing will always lead to subtle bugs. For example, things like query($qq) fail to parse, as do functions with string literals that contain markup like parens.
          • parseFunction is called (that should really be deprecated) which doesn't use the right QParser (it just constructs a local one with the incorrect core - SolrCore.getSolrCore())
          Show
          Yonik Seeley added a comment - Finally got a chance to check this out a little more... here are the current issues I think: functions need to be weighted (as previously noted) parseSort needs to actually parse the function query with the real function query parsing code, rather than trying to parse it itself first - two sets of parsing code for the same thing will always lead to subtle bugs. For example, things like query($qq) fail to parse, as do functions with string literals that contain markup like parens. parseFunction is called (that should really be deprecated) which doesn't use the right QParser (it just constructs a local one with the incorrect core - SolrCore.getSolrCore())
          Hide
          Grant Ingersoll added a comment -

          parseSort needs to actually parse the function query with the real function query parsing code, rather than trying to parse it itself first - two sets of parsing code for the same thing will always lead to subtle bugs. For example, things like query($qq) fail to parse, as do functions with string literals that contain markup like parens.

          Not sure I understand this, I didn't write any new parsing code.

          Show
          Grant Ingersoll added a comment - parseSort needs to actually parse the function query with the real function query parsing code, rather than trying to parse it itself first - two sets of parsing code for the same thing will always lead to subtle bugs. For example, things like query($qq) fail to parse, as do functions with string literals that contain markup like parens. Not sure I understand this, I didn't write any new parsing code.
          Hide
          Yonik Seeley added a comment -

          Not sure I understand this, I didn't write any new parsing code.

          You may not be producing a ValueSource, but you added parsing code to parseSort() to tell if something was a function query and find the end of it. And I verified it doesn't work for a couple of cases (the query($qq) and string literals with markup). We should really just use the actual value source parser itself.

          Show
          Yonik Seeley added a comment - Not sure I understand this, I didn't write any new parsing code. You may not be producing a ValueSource, but you added parsing code to parseSort() to tell if something was a function query and find the end of it. And I verified it doesn't work for a couple of cases (the query($qq) and string literals with markup). We should really just use the actual value source parser itself.
          Hide
          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Show
          Hoss Man added a comment - Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email... http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed. A unique token for finding these 240 issues in the future: hossversioncleanup20100527
          Hoss Man made changes -
          Fix Version/s Next [ 12315093 ]
          Fix Version/s 1.5 [ 12313566 ]
          Hide
          Hoss Man added a comment -

          Correcting Fix Version based on CHANGES.txt, see this thread for more details...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Show
          Hoss Man added a comment - Correcting Fix Version based on CHANGES.txt, see this thread for more details... http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E
          Hoss Man made changes -
          Fix Version/s 1.5 [ 12313566 ]
          Fix Version/s 3.1 [ 12314371 ]
          Fix Version/s 4.0 [ 12314992 ]
          Fix Version/s Next [ 12315093 ]
          Hide
          Grant Ingersoll added a comment -

          Here's some more unit tests.

          Show
          Grant Ingersoll added a comment - Here's some more unit tests.
          Grant Ingersoll made changes -
          Attachment SOLR-1297.patch [ 12448512 ]
          Hide
          Scott Kister added a comment -

          Bug report on this feature, if there is a wildcard entry in schema.xml such as the following.

          <!-- Ignore any fields that don't already match an existing field name -->
          <dynamicField name="*" type="ignored" multiValued="true" />

          then this feature does not work and an error is returned, ie

          GET 'http://localhost:8983/solr/select?q=:&sort=sum(1,2)+asc'
          Error 400 can not sort on unindexed field: sum(1,2)

          Show
          Scott Kister added a comment - Bug report on this feature, if there is a wildcard entry in schema.xml such as the following. <!-- Ignore any fields that don't already match an existing field name --> <dynamicField name="*" type="ignored" multiValued="true" /> then this feature does not work and an error is returned, ie GET 'http://localhost:8983/solr/select?q= : &sort=sum(1,2)+asc' Error 400 can not sort on unindexed field: sum(1,2)
          Hide
          Yonik Seeley added a comment -

          Here's a patch that re-implements the sort parsing such that arbitrarily complex function queries can be parsed.

          This does not address the weighting issue (which isn't a parsing issue) that causes relevancy queries not not work correctly.

          Show
          Yonik Seeley added a comment - Here's a patch that re-implements the sort parsing such that arbitrarily complex function queries can be parsed. This does not address the weighting issue (which isn't a parsing issue) that causes relevancy queries not not work correctly.
          Yonik Seeley made changes -
          Attachment SOLR-1297.patch [ 12456017 ]
          Hide
          Hoss Man added a comment -

          FYI: with Yonik's latest commit (#1003107) the problem that Scott mentioned is no longer an issue – but the reason is that Yonik's changes severly limit the set of field names that can legally be sorted on to those that meat the same requirements as a java identifier (ie: "1_s" is no longer a field name that can be sorted on)

          An idea that has been floating around in my head since i saw Scott's comment is that instead of trying to parse a field name first, and then doing function parsing second, we should reverse the order: try to parse as a function first, then as a field name (because we don't allow wild card style function names, but we do allow dynamic field names)

          After looking at Yonik's commit, i brought this up with him on IRC and we came to a consensus that:

          • we should discourage people from having crazy field names
          • it might be expensive to try and parse every sort param as a function if it then turned out to be a simple field name - particularly since some FieldTypes might support SortField but not ValueSource (so trying to parse as a function first would likely result in an exception)
          • we should optimize for the simple common case of simple field names

          So with that in mind, i came up with the attached patch which leaves in the quick short circuit for field names that are really simple, then tries to parse as a function, then as a last resort uses the same basic parsing that Solr 1.4 did: treats everything up to the first whitespace as a potential field name.

          Attached patch shows some new tests that it fixes (i also included some other test additions related to the general concept of localparams in the sort param to ensure i didn't break them)

          Show
          Hoss Man added a comment - FYI: with Yonik's latest commit (#1003107) the problem that Scott mentioned is no longer an issue – but the reason is that Yonik's changes severly limit the set of field names that can legally be sorted on to those that meat the same requirements as a java identifier (ie: "1_s" is no longer a field name that can be sorted on) An idea that has been floating around in my head since i saw Scott's comment is that instead of trying to parse a field name first, and then doing function parsing second, we should reverse the order: try to parse as a function first, then as a field name (because we don't allow wild card style function names, but we do allow dynamic field names) After looking at Yonik's commit, i brought this up with him on IRC and we came to a consensus that: we should discourage people from having crazy field names it might be expensive to try and parse every sort param as a function if it then turned out to be a simple field name - particularly since some FieldTypes might support SortField but not ValueSource (so trying to parse as a function first would likely result in an exception) we should optimize for the simple common case of simple field names So with that in mind, i came up with the attached patch which leaves in the quick short circuit for field names that are really simple, then tries to parse as a function, then as a last resort uses the same basic parsing that Solr 1.4 did: treats everything up to the first whitespace as a potential field name. Attached patch shows some new tests that it fixes (i also included some other test additions related to the general concept of localparams in the sort param to ensure i didn't break them)
          Hoss Man made changes -
          Attachment SOLR-1297.better.field.support.patch [ 12456076 ]
          Hide
          Hoss Man added a comment -

          In dependent of my previous patch, there is one other nuance i discoverd today that i wanted to bring up...

          The way localparams are currently supported in the sort param, they are independent for each "segment" of the sort. The easiest way to make sense of this is to look at the following contrived example URL (you can try it against the example data on trunk) ...

          ?q=*:*&fl=price,id&foo=0&sort={!foo=2}sum($foo,0)+asc,+mul($foo,price)+desc,+id+asc
          

          ...that URL (currently) results in a sort which is equivalent to "id asc" .. the localParam value of "foo=2" at the begining of the sort param is only in scope for the first segment...

          {!foo=2}sum($foo,0) asc
          

          ..the second segment of the sort param winds up getting the "global" value of "foo=0" ...

          mul($foo,price) desc
          

          ...so it becomes a NOOP (multiplication by zero)

          This limited scoping of localParams in the sort param seems counter intuitive to me. I don't know that i have a better suggestion for how it should work, but i wanted to draw attention to it in case other folks had any ideas

          Show
          Hoss Man added a comment - In dependent of my previous patch, there is one other nuance i discoverd today that i wanted to bring up... The way localparams are currently supported in the sort param, they are independent for each "segment" of the sort. The easiest way to make sense of this is to look at the following contrived example URL (you can try it against the example data on trunk) ... ?q=*:*&fl=price,id&foo=0&sort={!foo=2}sum($foo,0)+asc,+mul($foo,price)+desc,+id+asc ...that URL (currently) results in a sort which is equivalent to "id asc" .. the localParam value of "foo=2" at the begining of the sort param is only in scope for the first segment... {!foo=2}sum($foo,0) asc ..the second segment of the sort param winds up getting the "global" value of "foo=0" ... mul($foo,price) desc ...so it becomes a NOOP (multiplication by zero) This limited scoping of localParams in the sort param seems counter intuitive to me. I don't know that i have a better suggestion for how it should work, but i wanted to draw attention to it in case other folks had any ideas
          Koji Sekiguchi committed 1003745 (32 files)
          Reviews: none

          SOLR-1297: fix sort by function parsing (merge 1003107 and part of 997870,998707(JSONTestUtil.java) to pass TestFunctionQuery)

          Lucene branch_3x
          Hide
          Yonik Seeley added a comment -

          Last patch looks good (aside from the javadoc typos

          Show
          Yonik Seeley added a comment - Last patch looks good (aside from the javadoc typos
          Hide
          Grant Ingersoll added a comment -

          Hoss, you going to commit? Please backport to 3.x, too, as all of the other stuff has been.

          Show
          Grant Ingersoll added a comment - Hoss, you going to commit? Please backport to 3.x, too, as all of the other stuff has been.
          Hoss Man committed 1045253 (2 files)
          Reviews: none

          SOLR-1297: improvements to sort param parsing code so more fields with exentric names (that were supported for sorting in older versions of solr) will be checked for after attemptint to parse as a function

          Hide
          Hoss Man added a comment -

          Sorry, finally circled back on my SOLR-1297.better.field.support.patch ...

          trunk: Committed revision 1045253.
          3x: Committed revision 1045256.

          Show
          Hoss Man added a comment - Sorry, finally circled back on my SOLR-1297 .better.field.support.patch ... trunk: Committed revision 1045253. 3x: Committed revision 1045256.
          Yonik Seeley made changes -
          Assignee Grant Ingersoll [ gsingers ] Yonik Seeley [ yseeley@gmail.com ]
          Hide
          Yonik Seeley added a comment -

          Here's a patch for trunk that fixes the issue with function queries used as sorts not being weighted.

          Show
          Yonik Seeley added a comment - Here's a patch for trunk that fixes the issue with function queries used as sorts not being weighted.
          Yonik Seeley made changes -
          Attachment SOLR-1297_weightSorts.patch [ 12468791 ]
          Hide
          Yonik Seeley added a comment -

          Committed to trunk and backported to 3x.
          I think that was the last thing holding this issue open.

          Show
          Yonik Seeley added a comment - Committed to trunk and backported to 3x. I think that was the last thing holding this issue open.
          Yonik Seeley made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Fix Version/s 1.5 [ 12313566 ]
          Resolution Fixed [ 1 ]
          Hide
          Grant Ingersoll added a comment -

          Bulk close for 3.1.0 release

          Show
          Grant Ingersoll added a comment - Bulk close for 3.1.0 release
          Grant Ingersoll made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Yonik Seeley
              Reporter:
              Grant Ingersoll
            • Votes:
              3 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development