Lucene - Core
  1. Lucene - Core
  2. LUCENE-5399

PagingFieldCollector is very slow with String fields

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.7, 6.0
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      PagingFieldCollector (sort comparator) is significantly slower with string fields, because of how its "seen on a previous page" works: it calls compareDocToValue(int doc, T t) first to check this. (its the only user of this method)

      This is very slow with String, because no ordinals are used. so each document must lookup ord, then lookup bytes, then compare bytes.

      I think maybe we should replace this method with an 'after' slot, and just have compareDocToAfter or something.

      Otherwise we could use a hack-patch like the one i will upload (i did this just to test the performance, although tests do pass).

      1. LUCENE-5399.patch
        140 kB
        Michael McCandless
      2. LUCENE-5399.patch
        71 kB
        Robert Muir
      3. LUCENE-5399.patch
        71 kB
        Robert Muir
      4. LUCENE-5399.patch
        71 kB
        Michael McCandless
      5. LUCENE-5399.patch
        57 kB
        Michael McCandless
      6. LUCENE-5399.patch
        30 kB
        Michael McCandless
      7. LUCENE-5399.patch
        30 kB
        Michael McCandless
      8. LUCENE-5399.patch
        28 kB
        Michael McCandless
      9. LUCENE-5399.patch
        5 kB
        Robert Muir

        Activity

        Hide
        Robert Muir added a comment -

        here's my patch (the hack version).

        I found this because solr's searchAfter integration (SOLR-5463) requires a sort on the unique key, so basically that means its tripping this worst case.

        Show
        Robert Muir added a comment - here's my patch (the hack version). I found this because solr's searchAfter integration ( SOLR-5463 ) requires a sort on the unique key, so basically that means its tripping this worst case.
        Hide
        Michael McCandless added a comment -

        Good catch Rob; I like your solution.

        It's crazy we have this code dup (the duplicate, I think out of date, copy
        of TermOrdValComparator in Solr) ... I think we should refactor here
        and fix Solr to use Lucene's improved version. You shouldn't have to
        fix issues like this in N places.

        So I started from your patch, and got missing last/first support
        working in Lucene ... but I haven't dedup'd Solr's copy yet.

        I also want to try reversing the "check bottom" and "check top" in
        PagingFieldCollector; I think that might give some speedups too.

        Show
        Michael McCandless added a comment - Good catch Rob; I like your solution. It's crazy we have this code dup (the duplicate, I think out of date, copy of TermOrdValComparator in Solr) ... I think we should refactor here and fix Solr to use Lucene's improved version. You shouldn't have to fix issues like this in N places. So I started from your patch, and got missing last/first support working in Lucene ... but I haven't dedup'd Solr's copy yet. I also want to try reversing the "check bottom" and "check top" in PagingFieldCollector; I think that might give some speedups too.
        Hide
        Robert Muir added a comment -

        oh this is great, this was really frustrating when trying to test the missing value support in DV.

        as far as "my solution", i think its totally bogus actually: its caching a reference to a mutable thing (BytesRef)... I just did it this way because exactly one and only one thing calls this method (PagingFIeldCollector) and so I knew i could do benchmarks and so on (i only investigated this as some 'page 2' results with searchAfter were like 8x slower than if you didnt use searchAfter, with current patch like 2x).

        as far as reversing the checks, that sounds fantastic, I think that explains why my page 2 results are still 2x slower: 2x the work because they read the ordinal twice, since most hits are neither competitive nor previously visited.

        Show
        Robert Muir added a comment - oh this is great, this was really frustrating when trying to test the missing value support in DV. as far as "my solution", i think its totally bogus actually: its caching a reference to a mutable thing (BytesRef)... I just did it this way because exactly one and only one thing calls this method (PagingFIeldCollector) and so I knew i could do benchmarks and so on (i only investigated this as some 'page 2' results with searchAfter were like 8x slower than if you didnt use searchAfter, with current patch like 2x). as far as reversing the checks, that sounds fantastic, I think that explains why my page 2 results are still 2x slower: 2x the work because they read the ordinal twice, since most hits are neither competitive nor previously visited.
        Hide
        Michael McCandless added a comment -

        New patch, reversing the order of "checking bottom" and "checking top"; I think this should be a good speedup: it should greatly reduce how many times we call FC.compareDocToValue for typical cases.

        Show
        Michael McCandless added a comment - New patch, reversing the order of "checking bottom" and "checking top"; I think this should be a good speedup: it should greatly reduce how many times we call FC.compareDocToValue for typical cases.
        Hide
        Robert Muir added a comment -

        I benchmarked the patch, it seems to erase that 2x performance difference, with just some small % overhead. Thanks Mike! I actually think your opto is way more potent than mine, but it would still be great to keep both of them, to keep speed high for the deeper pages too.

        i think now we should just investigate a simple slot api, it would remove some per-doc checks too (since the after value is not changing), but mainly will be cleaner and not have the bogus caching of the bytesref object reference...

        Show
        Robert Muir added a comment - I benchmarked the patch, it seems to erase that 2x performance difference, with just some small % overhead. Thanks Mike! I actually think your opto is way more potent than mine, but it would still be great to keep both of them, to keep speed high for the deeper pages too. i think now we should just investigate a simple slot api, it would remove some per-doc checks too (since the after value is not changing), but mainly will be cleaner and not have the bogus caching of the bytesref object reference...
        Hide
        Michael McCandless added a comment -

        New patch, cutting over to explicit up-front slot instead of the slower compareDocToValue.

        Show
        Michael McCandless added a comment - New patch, cutting over to explicit up-front slot instead of the slower compareDocToValue.
        Hide
        Robert Muir added a comment -

        Are you sure its not the same copy of the old patch?

        Show
        Robert Muir added a comment - Are you sure its not the same copy of the old patch?
        Hide
        Michael McCandless added a comment -

        Argh, trying again...

        Show
        Michael McCandless added a comment - Argh, trying again...
        Hide
        Michael McCandless added a comment -

        Another iteration, adding fangs to TestSearchAfter, fixing some crabs in StringValComparator, and fixing some nocommits ...

        Show
        Michael McCandless added a comment - Another iteration, adding fangs to TestSearchAfter, fixing some crabs in StringValComparator, and fixing some nocommits ...
        Hide
        Robert Muir added a comment -

        same as mike's patch, but compiles with (at least my version) of java7.

        would be great if it could be setTopValue(T value)

        Show
        Robert Muir added a comment - same as mike's patch, but compiles with (at least my version) of java7. would be great if it could be setTopValue(T value)
        Hide
        Uwe Schindler added a comment -

        You could do:

              // Tell all comparators their top value:
              for(int i=0;i<comparators.length;i++) {
                @SuppressWarnings("unchecked")
                FieldComparator<Object> comp = (FieldComparator<Object>) comparators;
                comp.setTopValue(after.fields);
              }
        
        Show
        Uwe Schindler added a comment - You could do: // Tell all comparators their top value: for ( int i=0;i<comparators.length;i++) { @SuppressWarnings( "unchecked" ) FieldComparator< Object > comp = (FieldComparator< Object >) comparators; comp.setTopValue(after.fields); }
        Hide
        Robert Muir added a comment -

        MIke what java compiler are you using?

        Show
        Robert Muir added a comment - MIke what java compiler are you using?
        Hide
        Robert Muir added a comment -

        patch with Uwe's idea.

        Show
        Robert Muir added a comment - patch with Uwe's idea.
        Hide
        ASF subversion and git services added a comment -

        Commit 1558450 from Robert Muir in branch 'dev/branches/lucene539399'
        [ https://svn.apache.org/r1558450 ]

        LUCENE-5399: make branch

        Show
        ASF subversion and git services added a comment - Commit 1558450 from Robert Muir in branch 'dev/branches/lucene539399' [ https://svn.apache.org/r1558450 ] LUCENE-5399 : make branch
        Hide
        ASF subversion and git services added a comment -

        Commit 1558451 from Robert Muir in branch 'dev/branches/lucene539399'
        [ https://svn.apache.org/r1558451 ]

        LUCENE-5399: current state

        Show
        ASF subversion and git services added a comment - Commit 1558451 from Robert Muir in branch 'dev/branches/lucene539399' [ https://svn.apache.org/r1558451 ] LUCENE-5399 : current state
        Hide
        ASF subversion and git services added a comment -

        Commit 1558516 from Michael McCandless in branch 'dev/branches/lucene539399'
        [ https://svn.apache.org/r1558516 ]

        LUCENE-5399: add fangs, fix 2 bugs

        Show
        ASF subversion and git services added a comment - Commit 1558516 from Michael McCandless in branch 'dev/branches/lucene539399' [ https://svn.apache.org/r1558516 ] LUCENE-5399 : add fangs, fix 2 bugs
        Hide
        ASF subversion and git services added a comment -

        Commit 1558552 from Robert Muir in branch 'dev/branches/lucene539399'
        [ https://svn.apache.org/r1558552 ]

        LUCENE-5399: remove code duplication

        Show
        ASF subversion and git services added a comment - Commit 1558552 from Robert Muir in branch 'dev/branches/lucene539399' [ https://svn.apache.org/r1558552 ] LUCENE-5399 : remove code duplication
        Hide
        ASF subversion and git services added a comment -

        Commit 1558565 from Michael McCandless in branch 'dev/branches/lucene539399'
        [ https://svn.apache.org/r1558565 ]

        LUCENE-5399: add fangs, but no new bugs found...

        Show
        ASF subversion and git services added a comment - Commit 1558565 from Michael McCandless in branch 'dev/branches/lucene539399' [ https://svn.apache.org/r1558565 ] LUCENE-5399 : add fangs, but no new bugs found...
        Hide
        ASF subversion and git services added a comment -

        Commit 1558618 from Robert Muir in branch 'dev/branches/lucene539399'
        [ https://svn.apache.org/r1558618 ]

        LUCENE-5399, SOLR-5354: fix distributed grouping to marshal/unmarshal sort values properly

        Show
        ASF subversion and git services added a comment - Commit 1558618 from Robert Muir in branch 'dev/branches/lucene539399' [ https://svn.apache.org/r1558618 ] LUCENE-5399 , SOLR-5354 : fix distributed grouping to marshal/unmarshal sort values properly
        Hide
        ASF subversion and git services added a comment -

        Commit 1558621 from Michael McCandless in branch 'dev/branches/lucene539399'
        [ https://svn.apache.org/r1558621 ]

        LUCENE-5399: remove nocommit

        Show
        ASF subversion and git services added a comment - Commit 1558621 from Michael McCandless in branch 'dev/branches/lucene539399' [ https://svn.apache.org/r1558621 ] LUCENE-5399 : remove nocommit
        Hide
        Michael McCandless added a comment -

        I think this is ready ... here's the patch (using diffSources.py).

        Show
        Michael McCandless added a comment - I think this is ready ... here's the patch (using diffSources.py).
        Hide
        ASF subversion and git services added a comment -

        Commit 1558865 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1558865 ]

        LUCENE-5399: add missingFirst/last support when sorting by Type.STRING; speed up deep paging; fix solr's distributed group sort for certain field types

        Show
        ASF subversion and git services added a comment - Commit 1558865 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1558865 ] LUCENE-5399 : add missingFirst/last support when sorting by Type.STRING; speed up deep paging; fix solr's distributed group sort for certain field types
        Hide
        ASF subversion and git services added a comment -

        Commit 1558887 from Michael McCandless in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1558887 ]

        LUCENE-5399: add missingFirst/last support when sorting by Type.STRING; speed up deep paging; fix solr's distributed group sort for certain field types

        Show
        ASF subversion and git services added a comment - Commit 1558887 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1558887 ] LUCENE-5399 : add missingFirst/last support when sorting by Type.STRING; speed up deep paging; fix solr's distributed group sort for certain field types
        Hide
        ASF subversion and git services added a comment -

        Commit 1559196 from Michael McCandless in branch 'dev/branches/lucene5376'
        [ https://svn.apache.org/r1559196 ]

        LUCENE-5376, LUCENE-5399: add missingLast support to lucene server

        Show
        ASF subversion and git services added a comment - Commit 1559196 from Michael McCandless in branch 'dev/branches/lucene5376' [ https://svn.apache.org/r1559196 ] LUCENE-5376 , LUCENE-5399 : add missingLast support to lucene server
        Hide
        ASF subversion and git services added a comment -

        Commit 1570569 from Martijn van Groningen in branch 'dev/trunk'
        [ https://svn.apache.org/r1570569 ]

        Fixed typo in CHANGES.txt for issue LUCENE-5399 and moved that issue under optimizations.

        Show
        ASF subversion and git services added a comment - Commit 1570569 from Martijn van Groningen in branch 'dev/trunk' [ https://svn.apache.org/r1570569 ] Fixed typo in CHANGES.txt for issue LUCENE-5399 and moved that issue under optimizations.
        Hide
        ASF subversion and git services added a comment -

        Commit 1570573 from Martijn van Groningen in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1570573 ]

        Fixed typo in CHANGES.txt for issue LUCENE-5399 and moved that issue under optimizations.

        Show
        ASF subversion and git services added a comment - Commit 1570573 from Martijn van Groningen in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1570573 ] Fixed typo in CHANGES.txt for issue LUCENE-5399 and moved that issue under optimizations.
        Hide
        ASF subversion and git services added a comment -

        Commit 1570576 from Martijn van Groningen in branch 'dev/branches/lucene_solr_4_7'
        [ https://svn.apache.org/r1570576 ]

        Fixed typo in CHANGES.txt for issue LUCENE-5399 and moved that issue under optimizations.

        Show
        ASF subversion and git services added a comment - Commit 1570576 from Martijn van Groningen in branch 'dev/branches/lucene_solr_4_7' [ https://svn.apache.org/r1570576 ] Fixed typo in CHANGES.txt for issue LUCENE-5399 and moved that issue under optimizations.

          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir
          • Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development