Lucene - Core
  1. Lucene - Core
  2. LUCENE-5493

Rename Sorter, NumericDocValuesSorter, and fix javadocs

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.8, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Its not clear to users that these are for this super-expert thing of pre-sorting the index. From the names and documentation they think they should use them instead of Sort/SortField.

      These need to be renamed or, even better, the API fixed so they aren't public classes.

      1. LUCENE-5493.patch
        84 kB
        Robert Muir
      2. LUCENE-5493-poc.patch
        11 kB
        Robert Muir

        Issue Links

          Activity

          Hide
          Uwe Schindler added a comment -

          I agree. Intially when I read the mail on the mailing list I was confused about what the user was doing! I had no idea, that he tried to mix IndexSorter's APIs with a custom collector, which is likely to fail.

          Show
          Uwe Schindler added a comment - I agree. Intially when I read the mail on the mailing list I was confused about what the user was doing! I had no idea, that he tried to mix IndexSorter's APIs with a custom collector, which is likely to fail.
          Hide
          Robert Muir added a comment -

          If you look at the javadocs for all of lucene and then you read the description of these classes, you can see how the user was easily confused.

          Because of the problems here, my first plan of attack will be to remove these classes from public view completely. If i cannot do that, I will rename them and add warnings.

          Show
          Robert Muir added a comment - If you look at the javadocs for all of lucene and then you read the description of these classes, you can see how the user was easily confused. Because of the problems here, my first plan of attack will be to remove these classes from public view completely. If i cannot do that, I will rename them and add warnings.
          Hide
          Robert Muir added a comment -

          Upon review of the APIs, I think ideal to the user is to remove these current "sorter/comparators" so that when you want to use sorting mergepolicy, you just pass it a normal org.apache.lucene.search.Sort.

          I know it seems a little crazy, but IMO the logic is duplicated. So someone should just be doing:

          Sort sort = new Sort(new SortField("field1", SortField.Type.DOUBLE), new SortField(....));
          iwc.setMergePolicy(mp, new SortingMergePolicy(sort));
          

          This would let people be able to sort in reverse, by doubles/floats, by a combination of fields, expressions, whatever. And would deconfuse the API.

          Show
          Robert Muir added a comment - Upon review of the APIs, I think ideal to the user is to remove these current "sorter/comparators" so that when you want to use sorting mergepolicy, you just pass it a normal org.apache.lucene.search.Sort. I know it seems a little crazy, but IMO the logic is duplicated. So someone should just be doing: Sort sort = new Sort( new SortField( "field1" , SortField.Type.DOUBLE), new SortField(....)); iwc.setMergePolicy(mp, new SortingMergePolicy(sort)); This would let people be able to sort in reverse, by doubles/floats, by a combination of fields, expressions, whatever. And would deconfuse the API.
          Hide
          Michael McCandless added a comment -

          That would be great, if we could just use Sort here!

          +1 to deconfuse the API.

          Show
          Michael McCandless added a comment - That would be great, if we could just use Sort here! +1 to deconfuse the API.
          Hide
          Robert Muir added a comment -

          Another reason is that IndexSearcher knows about Sort already. so this would give us a path if we want a better integration here in the future.

          If we did it right, no additional info/apis from the user would be needed other than setting the mergePolicy at index-time: indexSearcher.search(query, filter, int, sort) for example could do the right thing for a segment, if the passed in query-time sort is "covered" by the sort order of the index. But thats for the future.

          Show
          Robert Muir added a comment - Another reason is that IndexSearcher knows about Sort already. so this would give us a path if we want a better integration here in the future. If we did it right, no additional info/apis from the user would be needed other than setting the mergePolicy at index-time: indexSearcher.search(query, filter, int, sort) for example could do the right thing for a segment, if the passed in query-time sort is "covered" by the sort order of the index. But thats for the future.
          Hide
          Robert Muir added a comment -

          Here is a very simple proof of concept patch.

          I made a SortSorter (extends existing Sorter api and takes o.a.l.s.Sort), removed NumericDocValuesSorter, and replaced it with this more general Sort-Sorter in all tests and they pass.

          So my next step would be to remove public apis like Sorter/DocMap and make that all internal. SortingMP and EarlyTerminatingSortingCollector would just take Sort directly.

          BlockJoinSorter needs to be cutover to a regular comparator. And in suggest/ there is a custom comparator... that i think doesnt need to be custom and is just sorting on a dv field.

          Show
          Robert Muir added a comment - Here is a very simple proof of concept patch. I made a SortSorter (extends existing Sorter api and takes o.a.l.s.Sort), removed NumericDocValuesSorter, and replaced it with this more general Sort-Sorter in all tests and they pass. So my next step would be to remove public apis like Sorter/DocMap and make that all internal. SortingMP and EarlyTerminatingSortingCollector would just take Sort directly. BlockJoinSorter needs to be cutover to a regular comparator. And in suggest/ there is a custom comparator... that i think doesnt need to be custom and is just sorting on a dv field.
          Hide
          ASF subversion and git services added a comment -

          Commit 1574866 from Robert Muir in branch 'dev/branches/lucene5493'
          [ https://svn.apache.org/r1574866 ]

          LUCENE-5493: create branch

          Show
          ASF subversion and git services added a comment - Commit 1574866 from Robert Muir in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574866 ] LUCENE-5493 : create branch
          Hide
          ASF subversion and git services added a comment -

          Commit 1574867 from Robert Muir in branch 'dev/branches/lucene5493'
          [ https://svn.apache.org/r1574867 ]

          LUCENE-5493: commit current state

          Show
          ASF subversion and git services added a comment - Commit 1574867 from Robert Muir in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574867 ] LUCENE-5493 : commit current state
          Hide
          Adrien Grand added a comment -

          +1 on making those classes wrap a `Sort`. I had started working on it for LUCENE-5314 but never got a chance to get a patch ready.

          Show
          Adrien Grand added a comment - +1 on making those classes wrap a `Sort`. I had started working on it for LUCENE-5314 but never got a chance to get a patch ready.
          Hide
          ASF subversion and git services added a comment -

          Commit 1574909 from Robert Muir in branch 'dev/branches/lucene5493'
          [ https://svn.apache.org/r1574909 ]

          LUCENE-5493: make BlockJoinSorter a ComparatorSource taking parent/child Sort

          Show
          ASF subversion and git services added a comment - Commit 1574909 from Robert Muir in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574909 ] LUCENE-5493 : make BlockJoinSorter a ComparatorSource taking parent/child Sort
          Hide
          ASF subversion and git services added a comment -

          Commit 1574918 from Robert Muir in branch 'dev/branches/lucene5493'
          [ https://svn.apache.org/r1574918 ]

          LUCENE-5493: hide Sorter, SortSorter, fix tests, change suggest to use public Sort API, cut over collector to take Sort

          Show
          ASF subversion and git services added a comment - Commit 1574918 from Robert Muir in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574918 ] LUCENE-5493 : hide Sorter, SortSorter, fix tests, change suggest to use public Sort API, cut over collector to take Sort
          Hide
          ASF subversion and git services added a comment -

          Commit 1574925 from Robert Muir in branch 'dev/branches/lucene5493'
          [ https://svn.apache.org/r1574925 ]

          LUCENE-5493: remove dead code

          Show
          ASF subversion and git services added a comment - Commit 1574925 from Robert Muir in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574925 ] LUCENE-5493 : remove dead code
          Hide
          ASF subversion and git services added a comment -

          Commit 1574926 from Michael McCandless in branch 'dev/branches/lucene5493'
          [ https://svn.apache.org/r1574926 ]

          LUCENE-5493: small clean ups

          Show
          ASF subversion and git services added a comment - Commit 1574926 from Michael McCandless in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574926 ] LUCENE-5493 : small clean ups
          Hide
          ASF subversion and git services added a comment -

          Commit 1574928 from Robert Muir in branch 'dev/branches/lucene5493'
          [ https://svn.apache.org/r1574928 ]

          LUCENE-5493: minor cleanups/opto

          Show
          ASF subversion and git services added a comment - Commit 1574928 from Robert Muir in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574928 ] LUCENE-5493 : minor cleanups/opto
          Hide
          ASF subversion and git services added a comment -

          Commit 1574945 from Robert Muir in branch 'dev/branches/lucene5493'
          [ https://svn.apache.org/r1574945 ]

          LUCENE-5493: simplify this test

          Show
          ASF subversion and git services added a comment - Commit 1574945 from Robert Muir in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574945 ] LUCENE-5493 : simplify this test
          Hide
          ASF subversion and git services added a comment -

          Commit 1574949 from Robert Muir in branch 'dev/branches/lucene5493'
          [ https://svn.apache.org/r1574949 ]

          LUCENE-5493: merge Sorter and SortSorter (in progress)

          Show
          ASF subversion and git services added a comment - Commit 1574949 from Robert Muir in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574949 ] LUCENE-5493 : merge Sorter and SortSorter (in progress)
          Hide
          ASF subversion and git services added a comment -

          Commit 1574954 from Robert Muir in branch 'dev/branches/lucene5493'
          [ https://svn.apache.org/r1574954 ]

          LUCENE-5493: javadocs

          Show
          ASF subversion and git services added a comment - Commit 1574954 from Robert Muir in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574954 ] LUCENE-5493 : javadocs
          Hide
          ASF subversion and git services added a comment -

          Commit 1574962 from Robert Muir in branch 'dev/branches/lucene5493'
          [ https://svn.apache.org/r1574962 ]

          LUCENE-5493: fix precommit

          Show
          ASF subversion and git services added a comment - Commit 1574962 from Robert Muir in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574962 ] LUCENE-5493 : fix precommit
          Hide
          ASF subversion and git services added a comment -

          Commit 1574965 from Michael McCandless in branch 'dev/branches/lucene5493'
          [ https://svn.apache.org/r1574965 ]

          LUCENE-5493: don't do forceMerge on initital build of AnalyzingInfixSuggester

          Show
          ASF subversion and git services added a comment - Commit 1574965 from Michael McCandless in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574965 ] LUCENE-5493 : don't do forceMerge on initital build of AnalyzingInfixSuggester
          Hide
          ASF subversion and git services added a comment -

          Commit 1574969 from Michael McCandless in branch 'dev/branches/lucene5493'
          [ https://svn.apache.org/r1574969 ]

          LUCENE-5493: fix solr

          Show
          ASF subversion and git services added a comment - Commit 1574969 from Michael McCandless in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574969 ] LUCENE-5493 : fix solr
          Hide
          ASF subversion and git services added a comment -

          Commit 1574972 from Robert Muir in branch 'dev/branches/lucene5493'
          [ https://svn.apache.org/r1574972 ]

          LUCENE-5493: add CHANGES

          Show
          ASF subversion and git services added a comment - Commit 1574972 from Robert Muir in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1574972 ] LUCENE-5493 : add CHANGES
          Hide
          Robert Muir added a comment -

          Here is a patch for review. The public API is much simpler and I think it makes the SortingMP a lot more flexible and easier to use.

          Show
          Robert Muir added a comment - Here is a patch for review. The public API is much simpler and I think it makes the SortingMP a lot more flexible and easier to use.
          Hide
          Michael McCandless added a comment -

          +1, looks great.

          This also makes it trivial to do impact-sorted postings by an arbitrary expression.

          Show
          Michael McCandless added a comment - +1, looks great. This also makes it trivial to do impact-sorted postings by an arbitrary expression.
          Hide
          ASF subversion and git services added a comment -

          Commit 1575008 from Robert Muir in branch 'dev/branches/lucene5493'
          [ https://svn.apache.org/r1575008 ]

          LUCENE-5493: javadocs cleanups

          Show
          ASF subversion and git services added a comment - Commit 1575008 from Robert Muir in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1575008 ] LUCENE-5493 : javadocs cleanups
          Hide
          ASF subversion and git services added a comment -

          Commit 1575017 from Robert Muir in branch 'dev/branches/lucene5493'
          [ https://svn.apache.org/r1575017 ]

          LUCENE-5493: add missing experimental tag

          Show
          ASF subversion and git services added a comment - Commit 1575017 from Robert Muir in branch 'dev/branches/lucene5493' [ https://svn.apache.org/r1575017 ] LUCENE-5493 : add missing experimental tag
          Hide
          David Smiley added a comment -

          What is meant by "impact sorted postings"?

          Show
          David Smiley added a comment - What is meant by "impact sorted postings"?
          Hide
          David Smiley added a comment -

          +1. Beatuful API change Rob!

          Show
          David Smiley added a comment - +1. Beatuful API change Rob!
          Hide
          Adrien Grand added a comment -

          This is a very nice cleanup, and the ability to use any Sort object including expressions makes it very flexible. +1 to commit

          Show
          Adrien Grand added a comment - This is a very nice cleanup, and the ability to use any Sort object including expressions makes it very flexible. +1 to commit
          Hide
          Michael McCandless added a comment -

          What is meant by "impact sorted postings"?

          It's when you sort your documents according to biggest "impact" which is your own measure, and which you intend to sort by at search time. AnalyzingInfixSuggester uses this, to sort the suggestions by their weight. This way if you are looking for 5 suggestions, you can stop searching after collecting 5 hits, which is an enormous speedup when the query would have otherwise matched many documents.

          See e.g. http://nlp.stanford.edu/IR-book/html/htmledition/impact-ordering-1.html

          Show
          Michael McCandless added a comment - What is meant by "impact sorted postings"? It's when you sort your documents according to biggest "impact" which is your own measure, and which you intend to sort by at search time. AnalyzingInfixSuggester uses this, to sort the suggestions by their weight. This way if you are looking for 5 suggestions, you can stop searching after collecting 5 hits, which is an enormous speedup when the query would have otherwise matched many documents. See e.g. http://nlp.stanford.edu/IR-book/html/htmledition/impact-ordering-1.html
          Hide
          Michael McCandless added a comment -

          Another reason to sort postings is to get better compression, e.g. if you index many web sites, then sorting so that docs from the same web site are together, can sometimes give better compression, maybe

          In that case, you don't use an EarlyTerminatingCollector; you just search "like normal".

          Show
          Michael McCandless added a comment - Another reason to sort postings is to get better compression, e.g. if you index many web sites, then sorting so that docs from the same web site are together, can sometimes give better compression, maybe In that case, you don't use an EarlyTerminatingCollector; you just search "like normal".
          Hide
          Uwe Schindler added a comment -

          Beautiful! I like the linest starting with "- "

          Show
          Uwe Schindler added a comment - Beautiful! I like the linest starting with "- "
          Hide
          ASF subversion and git services added a comment -

          Commit 1575248 from Robert Muir in branch 'dev/trunk'
          [ https://svn.apache.org/r1575248 ]

          LUCENE-5493: cut over index sorting to use Sort api for specifying the order

          Show
          ASF subversion and git services added a comment - Commit 1575248 from Robert Muir in branch 'dev/trunk' [ https://svn.apache.org/r1575248 ] LUCENE-5493 : cut over index sorting to use Sort api for specifying the order
          Hide
          ASF subversion and git services added a comment -

          Commit 1575253 from Robert Muir in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1575253 ]

          LUCENE-5493: cut over index sorting to use Sort api for specifying the order

          Show
          ASF subversion and git services added a comment - Commit 1575253 from Robert Muir in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1575253 ] LUCENE-5493 : cut over index sorting to use Sort api for specifying the order
          Hide
          Uwe Schindler added a comment -

          Close issue after release of 4.8.0

          Show
          Uwe Schindler added a comment - Close issue after release of 4.8.0

            People

            • Assignee:
              Robert Muir
              Reporter:
              Robert Muir
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development