Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.2
    • Fix Version/s: 1.2
    • Component/s: search
    • Labels:
      None

      Description

      The current search throws an "ArrayIndexOutOfBoundsException" if you specify a sort field but do not include an order. This is anoying and difficult to debug (when you are starting)

      Here is a patch to avoid the exception and use the default sort order if you only specify a field. I'm not sure the 'null' case was even possible in the current code:

      Index: QueryParsing.java
      ===================================================================
      — QueryParsing.java (revision 494681)
      +++ QueryParsing.java (working copy)
      @@ -186,13 +186,12 @@
      }

      // get the direction of the sort

      • str=parts[pos];
      • if ("top".equals(str) || "desc".equals(str)) { - top=true; - }

        else if ("bottom".equals(str) || "asc".equals(str))

        { - top=false; - }

        else {

      • return null; // must not be a sort command
        + // by default, top is true, only change it if it is "bottom" or "asc"
        + if( parts.length > pos )
        Unknown macro: {+ str=parts[pos];+ if ("bottom".equals(str) || "asc".equals(str)) { + top=false; + } }
      1. DefaultSortOrder.patch
        8 kB
        Ryan McKinley
      2. DefaultSortOrder.patch
        5 kB
        Ryan McKinley
      3. DefaultSortOrder.patch
        0.7 kB
        Ryan McKinley

        Activity

        Hide
        Ryan McKinley added a comment -

        I apologize... the first version does not pass ConvertedLegacyTest.testABunchOfConvertedStuff()!

        This second version rewrites most of parseSort() and passes the tests.

        I am a bit confused by the old parser because it seems to be handling a legacy paging system. The javadocs show the sort formats as:

        • Examples:
        • <pre>
        • top 10 #take the top 10 by score
        • desc 10 #take the top 10 by score
        • score desc 10 #take the top 10 by score
        • weight bottom 10 #sort by weight ascending and take the first 10
        • weight desc #sort by weight descending
        • height desc,weight desc #sort by height descending, and use weight descending to break any ties
        • height desc,weight asc top 20 #sort by height descending, using weight ascending as a tiebreaker
          *</pre>

        This puts the "count" in the SortSpec.num field.

        SortSpec.getCount() is only called from: SolrPluginUtils.doSimpleQuery(). Is this a mistake? Isn't the count specified from rows=XXX?

        Show
        Ryan McKinley added a comment - I apologize... the first version does not pass ConvertedLegacyTest.testABunchOfConvertedStuff()! This second version rewrites most of parseSort() and passes the tests. I am a bit confused by the old parser because it seems to be handling a legacy paging system. The javadocs show the sort formats as: Examples: <pre> top 10 #take the top 10 by score desc 10 #take the top 10 by score score desc 10 #take the top 10 by score weight bottom 10 #sort by weight ascending and take the first 10 weight desc #sort by weight descending height desc,weight desc #sort by height descending, and use weight descending to break any ties height desc,weight asc top 20 #sort by height descending, using weight ascending as a tiebreaker *</pre> This puts the "count" in the SortSpec.num field. SortSpec.getCount() is only called from: SolrPluginUtils.doSimpleQuery(). Is this a mistake? Isn't the count specified from rows=XXX?
        Hide
        Hoss Man added a comment -

        specifying the number of results to return as part of the sort information is in fact some fairly "legacy" behavior that we probably don't need to worry about supporting (i don't think anyone has ever used it in the history of Solar/Solr ... but in general i don't think the proper way to address this bug is to "assume" a default direction if all the user does is specify a field name ... mainly because i don't know that we can really assume a particular "default" ... failing with an error at query time definitely seems like the right behavior to me ... but we should certainly return a usefull eror message instead of an ArrayIndexOutOfBoundsException.

        In the same way a query which can't be parsed cleanly generates a ParseException, a sort specification which can't be parsed cleanly should generate a ParseException (or soemthing like it)

        Show
        Hoss Man added a comment - specifying the number of results to return as part of the sort information is in fact some fairly "legacy" behavior that we probably don't need to worry about supporting (i don't think anyone has ever used it in the history of Solar/Solr ... but in general i don't think the proper way to address this bug is to "assume" a default direction if all the user does is specify a field name ... mainly because i don't know that we can really assume a particular "default" ... failing with an error at query time definitely seems like the right behavior to me ... but we should certainly return a usefull eror message instead of an ArrayIndexOutOfBoundsException. In the same way a query which can't be parsed cleanly generates a ParseException, a sort specification which can't be parsed cleanly should generate a ParseException (or soemthing like it)
        Hide
        Yonik Seeley added a comment -

        The number was there because I originally thought about allowing a chain of sorts to narrow results, so you could do things like:

        give me TVs with a 50" screen or bigger, then take the bottom 50% by weight, then take the bottom 50% by depth, then give me the top 10 by review rating.

        expressible by the following syntax:

        inches:[50 TO *]; weight bottom 50%; depth bottom 50%; rating top 10

        But I never really got around to implementing that

        Related bugs: https://issues.apache.org/jira/browse/SOLR-9

        Show
        Yonik Seeley added a comment - The number was there because I originally thought about allowing a chain of sorts to narrow results, so you could do things like: give me TVs with a 50" screen or bigger, then take the bottom 50% by weight, then take the bottom 50% by depth, then give me the top 10 by review rating. expressible by the following syntax: inches: [50 TO *] ; weight bottom 50%; depth bottom 50%; rating top 10 But I never really got around to implementing that Related bugs: https://issues.apache.org/jira/browse/SOLR-9
        Hide
        Yonik Seeley added a comment -

        I agree that throwing a ParseException (ideally a SolrException with a code of 400) with a meaningful error message to help out the user is the right way to go.

        Show
        Yonik Seeley added a comment - I agree that throwing a ParseException (ideally a SolrException with a code of 400) with a meaningful error message to help out the user is the right way to go.
        Hide
        Ryan McKinley added a comment -

        Ok, we all agree, that it should
        1. Throw a reasonable error if something goes wrong
        2. Throw an error if you sort on a non-indexed field (SOLR-9)

        I still think it should have a default sort order (SQL does) but as long as it does not throw an ArrayIndexOutOfBoundsException exception I am happy

        What should happen with the 'legacy' format?
        a. leave it as is. (perhaps add a comment for the next person who can't figure out what it does)
        b. get rid of it.

        Show
        Ryan McKinley added a comment - Ok, we all agree, that it should 1. Throw a reasonable error if something goes wrong 2. Throw an error if you sort on a non-indexed field ( SOLR-9 ) I still think it should have a default sort order (SQL does) but as long as it does not throw an ArrayIndexOutOfBoundsException exception I am happy What should happen with the 'legacy' format? a. leave it as is. (perhaps add a comment for the next person who can't figure out what it does) b. get rid of it.
        Hide
        Hoss Man added a comment -

        the problem with a default sort order (which ANSI SQL does wrong in my opinion) is that it can lead to "buggy" behavior without making it clear to the user what happened ... users can test their app, get results, and those results might look like they are in the correct order because of the nature of the data (ie: they sort by popularity and all documents currently have no popularity because their application isn't live yet) and only later do they discover that it's acctually "defaulting" and the default isn't what they want.

        as for the legacy format: if it's not hurting anything, i don't see any reason to remove it. if it's easy to catch the ArrayIndexOutOfBoundsException and rethrow a better aexception without gutting the legacy parsing code, we might as well leave it in.

        if that parsing code is buggy and would require a lot of additional work to get it's parsing/error handling equally robust, then purning it is probably fine.

        Show
        Hoss Man added a comment - the problem with a default sort order (which ANSI SQL does wrong in my opinion) is that it can lead to "buggy" behavior without making it clear to the user what happened ... users can test their app, get results, and those results might look like they are in the correct order because of the nature of the data (ie: they sort by popularity and all documents currently have no popularity because their application isn't live yet) and only later do they discover that it's acctually "defaulting" and the default isn't what they want. as for the legacy format: if it's not hurting anything, i don't see any reason to remove it. if it's easy to catch the ArrayIndexOutOfBoundsException and rethrow a better aexception without gutting the legacy parsing code, we might as well leave it in. if that parsing code is buggy and would require a lot of additional work to get it's parsing/error handling equally robust, then purning it is probably fine.
        Hide
        Ryan McKinley added a comment -

        I just attached a new version. It throws an exception when:
        1) missing sort order
        2) sort on unindexed field (SOLR-6)
        3) sort on a field it can not find

        It does not assume a default sort order.

        I added some tests. I did not see any general way to test stuff that should throw exceptions, so i added assertQEx() to AbstractSolrTestCase.

        I added the tests to ConvertedLegacyTest because that is where the other stuff testing sorting happens.

        Show
        Ryan McKinley added a comment - I just attached a new version. It throws an exception when: 1) missing sort order 2) sort on unindexed field ( SOLR-6 ) 3) sort on a field it can not find It does not assume a default sort order. I added some tests. I did not see any general way to test stuff that should throw exceptions, so i added assertQEx() to AbstractSolrTestCase. I added the tests to ConvertedLegacyTest because that is where the other stuff testing sorting happens.
        Hide
        Ryan McKinley added a comment -

        oops, by SOLR-6, i really mean SOLR-9

        I appologize in advance for my dyslexia. Thank goodness for compilers!

        Show
        Ryan McKinley added a comment - oops, by SOLR-6 , i really mean SOLR-9 I appologize in advance for my dyslexia. Thank goodness for compilers!
        Hide
        Yonik Seeley added a comment -

        Committed. Thanks Ryan!

        Show
        Yonik Seeley added a comment - Committed. Thanks Ryan!

          People

          • Assignee:
            Unassigned
            Reporter:
            Ryan McKinley
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development