Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 2.4
    • Fix Version/s: 2.4.1
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      If the sorted documents have the same sort value, they are returned in the same order even when the reverse flag is true.

        Activity

        Earwin Burrfoot added a comment -

        If you're not using a deterministic tie-breaker, in case of a tie you'll get random results even doing the same search over and over again. That has a chance to happen exactly after the next index segment merge. On merge, documents change their docIds and can even be reordered depending on how the segments are being merged. Seeing as you're not getting stable results for forward sort, I can't fathom how you're expecting reverse sort to give you something predictable.

        I have used two solutions to this problem. First, I used docId as a tie-breaker, reversing it together with the primary field. That gave me some stability between merges and a clean sort reversal. Then I switched to using my application document id: it is constant during the life of a document, I need to retrieve it anyway for all the documents found, and for this exact reason it rests in a cache that makes sorting on that field lightning fast.
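
        A minimal sketch of the second approach, using plain Java collections rather than the Lucene API. The Hit class, its title and appId fields, and the helper names are hypothetical stand-ins for a search hit and its application document id. The point is only that the tie-breaker is reversed together with the primary field, so the reverse order is the exact mirror of the forward order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class TieBreakSketch {
    /** Hypothetical stand-in for a search hit: a sort value plus a stable application id. */
    static final class Hit {
        final String title;
        final long appId;

        Hit(String title, long appId) {
            this.title = title;
            this.appId = appId;
        }
    }

    /** Orders by title, then by appId; the reverse flag flips both keys together. */
    static Comparator<Hit> byTitleThenId(boolean reverse) {
        Comparator<Hit> c = Comparator.comparing((Hit h) -> h.title)
                                      .thenComparingLong(h -> h.appId);
        return reverse ? c.reversed() : c;
    }

    /** Sorts a copy of the hits and returns their appIds in the resulting order. */
    static List<Long> sortedIds(List<Hit> hits, boolean reverse) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(byTitleThenId(reverse));
        List<Long> ids = new ArrayList<>();
        for (Hit h : copy) {
            ids.add(h.appId);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Two hits tie on title "a"; the appId decides their relative order.
        List<Hit> hits = Arrays.asList(new Hit("b", 3), new Hit("a", 2), new Hit("a", 1));
        System.out.println(sortedIds(hits, false)); // [1, 2, 3]
        System.out.println(sortedIds(hits, true));  // [3, 2, 1]
    }
}
```

        Because the id never changes during the life of a document, the result does not depend on how segments were merged.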

        Hoss Man added a comment -

        "If this assumption holds for your application, you can use the approach suggested by Michael and explicitly express it in the query. But to force it onto all users of Lucene doesn't seem to be a good solution."

        Exactly. By leaving the behavior undefined, Lucene doesn't have to explicitly compute a secondary sort when the client doesn't ask for it...

        // sorts on field F ascending, behavior in ties is undefined.
        Sort a1 = new Sort(new SortField[] { new SortField(F, SortField.STRING, false) });
        // sorts on field F descending, behavior in ties is undefined.
        Sort d1 = new Sort(new SortField[] { new SortField(F, SortField.STRING, true) });
        
        // sorts on field F ascending, behavior in ties is ascending sort by internal id 
        Sort a2 = new Sort(new SortField[] { 
           new SortField(F, SortField.STRING, false), new SortField(null, SortField.DOC, false) });
        // sorts on field F descending, behavior in ties is descending sort by internal id
        Sort d2 = new Sort(new SortField[] { 
           new SortField(F, SortField.STRING, true), new SortField(null, SortField.DOC, true) });
        
        Wolf Siberski added a comment -

        The usual semantics for sort is that items with the same score are in random order (not only for Lucene). The semantics you propose essentially model the assumption that users prefer documents with lower id when they have the same score. If this assumption holds for your application, you can use the approach suggested by Michael and explicitly express it in the query. But to force it onto all users of Lucene doesn't seem to be a good solution.

        Jiri Kuhn added a comment -

        Well, consider this. You have

        • a search result displayed as a web page
        • the search has the ability to show it reversed
        • the documents found are all equal (in the sense of the Lucene sort)

        Now you click on the link to see the result reversed. What happens? Nothing. The documents are ordered in the same way as before. But one would expect the first document to become the last, and so on. This is a natural expectation, and it is covered by the test.

        You said the behaviour is undefined. Let's define it!
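
        The effect described above can be reproduced with any stable sort, so here is a small analogy in plain Java. It does not use Lucene and makes no claim about Lucene's internals; the class name, the sortByKey helper, and the sample rows are assumptions made for illustration. Rows are compared only by their first character, so all three rows tie, and reversing the comparator leaves them exactly where they were.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class StableReverseSketch {
    /** Sorts a copy of the rows by their first character only; rows sharing it tie. */
    static List<String> sortByKey(List<String> rows, boolean reverse) {
        Comparator<String> byKey = Comparator.comparing((String s) -> s.substring(0, 1));
        List<String> copy = new ArrayList<>(rows);
        // Collections.sort is documented as stable: tied elements keep their input order.
        Collections.sort(copy, reverse ? byKey.reversed() : byKey);
        return copy;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("x-first", "x-second", "x-third");
        System.out.println(sortByKey(rows, false)); // [x-first, x-second, x-third]
        System.out.println(sortByKey(rows, true));  // [x-first, x-second, x-third]
    }
}
```

        A reversed comparator still reports ties as equal, so only an explicit secondary key can flip the order of tied rows.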

        Hoss Man added a comment -

        my take is similar to Michael...

        The test in the patch doesn't seem valid to me. Specifying a reverse sort on FIELD_1 just means that the documents must be returned in descending order of the value in FIELD_1 – it does not say anything about breaking ties when two docs have equal values for FIELD_1, the behavior in that case is undefined (and just so happens to be consistent)

        Michael McCandless added a comment -

        I can appreciate the motivation to fix this, but I don't see a
        reliable way to do so.

        E.g., say I sort first by Title reversed and then by document size
        non-reversed. In that case, should the tie-breaker (sort by docID) be
        reversed or not? (Or vice versa.)

        I don't think it's well defined, because Lucene doesn't have a
        top-level (in the Sort object) reversed boolean (it's per-SortField).

        That fallback ("compare by docID") is sort of an emergency
        tie-breaker, to make sure you get deterministic results when your sort
        leaves ambiguity.

        One simple way to get the behavior you want is to disambiguate
        your sort by adding SortField.FIELD_DOC at the end. Then you can
        explicitly control whether it's reversed or not...
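
        To illustrate why the direction has to be the caller's choice, here is a sketch in plain Java (not the Lucene API) of the mixed-direction example above: Title reversed, document size non-reversed, and an explicit trailing docId key. The Doc class, its fields, and the docIdReversed parameter are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class MixedDirectionSketch {
    /** Hypothetical document with two sort fields and an internal id. */
    static final class Doc {
        final String title;
        final int size;
        final int docId;

        Doc(String title, int size, int docId) {
            this.title = title;
            this.size = size;
            this.docId = docId;
        }
    }

    /** Title descending, size ascending, then docId in the direction the caller picks. */
    static Comparator<Doc> titleDescSizeAsc(boolean docIdReversed) {
        Comparator<Doc> byDocId = Comparator.comparingInt((Doc d) -> d.docId);
        return Comparator.comparing((Doc d) -> d.title).reversed()
                .thenComparingInt(d -> d.size)
                .thenComparing(docIdReversed ? byDocId.reversed() : byDocId);
    }

    /** Sorts a copy of the docs and returns their docIds in the resulting order. */
    static List<Integer> sortedDocIds(List<Doc> docs, boolean docIdReversed) {
        List<Doc> copy = new ArrayList<>(docs);
        copy.sort(titleDescSizeAsc(docIdReversed));
        List<Integer> ids = new ArrayList<>();
        for (Doc d : copy) {
            ids.add(d.docId);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Docs 0 and 1 tie on both title and size; only the docId key separates them.
        List<Doc> docs = Arrays.asList(new Doc("t", 10, 0), new Doc("t", 10, 1), new Doc("u", 5, 2));
        System.out.println(sortedDocIds(docs, false)); // [2, 0, 1]
        System.out.println(sortedDocIds(docs, true));  // [2, 1, 0]
    }
}
```

        Neither outcome is more correct than the other, which is why an implicit rule derived from the per-field flags would be arbitrary.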

        Jiri Kuhn added a comment -

        The patch comes with a unit test which demonstrates the bug. All other tests pass.

        I tried to follow the "how to contribute" guidelines, but I was not able to run the compatibility tests successfully (ant clean test-tag), even without my modifications...


          People

          • Assignee:
            Unassigned
            Reporter:
            Jiri Kuhn
          • Votes:
            0
            Watchers:
            0
