Lucene - Core
LUCENE-2690

Do MultiTermQuery boolean rewrites per segment

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      MultiTermQuery currently rewrites FuzzyQuery (using TopTermsBooleanQueryRewrite), the constant-score auto rewrite method, and the scoring BooleanQuery rewrite methods using a MultiFields wrapper on the top-level reader. This is inefficient.

      This patch changes the rewrite modes to do the rewrites per segment and uses some additional data structures (hashed sets/maps) to exclude duplicate terms. All tests currently pass, but FuzzyQuery's tests should not, because its minimum-score handling depends on the terms being collected in order.

      Robert will fix FuzzyQuery in this issue, too. This patch is just a start.
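      The per-segment rewrite with duplicate exclusion described above can be sketched as follows. This is only a simplified illustration, not the actual patch: plain Strings and a HashSet stand in for Lucene's BytesRef terms and the hashed sets/maps the patch uses.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified illustration of the per-segment rewrite with duplicate
// exclusion: visit each segment's matching terms separately and use a
// hashed set to skip terms already collected from an earlier segment.
class PerSegmentRewriteSketch {
    static List<String> collectTerms(List<List<String>> perSegmentMatches) {
        Set<String> seen = new HashSet<>();       // excludes duplicates across segments
        List<String> collected = new ArrayList<>();
        for (List<String> segmentTerms : perSegmentMatches) { // one pass per segment
            for (String term : segmentTerms) {
                if (seen.add(term)) {             // true only on first occurrence
                    collected.add(term);
                }
            }
        }
        return collected;
    }
}
```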

      1. LUCENE-2690.patch
        62 kB
        Uwe Schindler
      2. LUCENE-2690.patch
        56 kB
        Uwe Schindler
      3. LUCENE-2690.patch
        49 kB
        Uwe Schindler
      4. LUCENE-2690.patch
        46 kB
        Uwe Schindler
      5. LUCENE-2690.patch
        46 kB
        Uwe Schindler
      6. LUCENE-2690.patch
        46 kB
        Uwe Schindler
      7. LUCENE-2690.patch
        45 kB
        Uwe Schindler
      8. LUCENE-2690.patch
        39 kB
        Robert Muir
      9. LUCENE-2690-attributes.patch
        38 kB
        Uwe Schindler
      10. LUCENE-2690-attributes.patch
        50 kB
        Uwe Schindler
      11. LUCENE-2690-attributes.patch
        50 kB
        Uwe Schindler
      12. LUCENE-2690.patch
        27 kB
        Uwe Schindler
      13. LUCENE-2690.patch
        26 kB
        Simon Willnauer
      14. LUCENE-2690.patch
        26 kB
        Uwe Schindler
      15. LUCENE-2690.patch
        26 kB
        Uwe Schindler
      16. LUCENE-2690.patch
        26 kB
        Simon Willnauer
      17. LUCENE-2690-hack.patch
        23 kB
        Michael McCandless
      18. LUCENE-2690.patch
        19 kB
        Uwe Schindler
      19. LUCENE-2690.patch
        16 kB
        Michael McCandless
      20. LUCENE-2690.patch
        17 kB
        Robert Muir
      21. LUCENE-2690.patch
        13 kB
        Uwe Schindler
      22. LUCENE-2690.patch
        12 kB
        Uwe Schindler

        Issue Links

          Activity

          Uwe Schindler added a comment -

          Updated patch that also checks for duplicate terms in the fuzzy rewrite. This should be fine now, but we need to fix the FuzzyQuery tests to check for multiple segments with the same terms, which should fail with this patch.

          Maybe we need a separate MTQ test that creates two IndexWriters which add documents with an overlapping term set to both indexes. Queries are then run using MultiReader, so we can control merging and make sure the term really appears in two "segments". I will work on a test for that.

          Michael McCandless added a comment -

          It'd be nice somehow to have MTQ.getTotalNumberOfTerms return the unique term count instead of the total number of terms visited across all segments...

          Michael McCandless added a comment -

          We have to sort the terms coming out of the BytesRefHash, else we get bad seek performance, because the within-block seek optimization will otherwise often fail to apply...

          So I used a TreeMap instead of a HashMap.

          Then ran a quick perf test on 10 M Wikipedia index:

          Query    QPS clean  QPS mtqseg  Pct diff
          unit*        11.83       11.80     -0.3%
          un*d         13.64       16.95     24.3%
          u*d           2.67        3.77     41.1%
          un*ed        34.85       74.94    115.0%
          uni*ed      183.37      437.13    138.4%

          So these are good gains! I can't run FuzzyQuery until we fix the tie-break problem...

          I'm really not sure why the prefix query sees no gain yet the others do (I would have actually expected the reverse, because PrefixTermsEnum's accept method is so simple).
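          The TreeMap change above can be illustrated with a minimal sketch: collected terms are handed back to the seek loop in sorted order, so consecutive dictionary seeks move forward within the same block. TreeSet here is just a stand-in for iterating a TreeMap's keys.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Why sorting matters for the seek pattern: natural (lexicographic)
// order matches the on-disk order of the terms dictionary, so the
// within-block seek optimization can apply on consecutive seeks.
class SortedSeekSketch {
    static List<String> seekOrder(Collection<String> collectedTerms) {
        // TreeSet yields the terms in sorted order, like a TreeMap's keys
        return new ArrayList<>(new TreeSet<>(collectedTerms));
    }
}
```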

          Robert Muir added a comment -

          We fixed some bugs in the patch; it is not yet ready for committing, but the tests now pass.

          Michael McCandless added a comment -

          Another iteration... same perf as before but more failures!

          Uwe Schindler added a comment -

          Same as Mike's patch, only with some nocommits removed (max clause count increased) and added the missing FloatUtils file.

          Uwe Schindler added a comment -

          By the way: no more failures - only speed improvements (mostly)! It seems that TestFuzzyQuery2 failed because of the incorrectly increased max clause count!

          The last thing to do is fixing the attribute stuff to separate the two attribute parts.

          Simon Willnauer added a comment -

          Guys, awesome improvements!! Here are some comments...

          • In CutOffTermCollector:
             final BytesRefHash pendingTerms = new BytesRefHash(new ByteBlockPool(new RecyclingByteBlockAllocator()));

            Since we do not reuse the allocator, we don't need to use the synced one here. There is also no reset call anywhere to free the allocated blocks. We should just use new BytesRefHash() here.

          • BooleanQueryRewrite#rewrite uses a HashMap to keep track of BytesRef and TermFreqBoost. I wonder if we should make use of the ParallelArray technique we use in the indexing chain, together with a BytesRefHash, which could save us lots of object creation, and GC cost would be lower once MTQ gets under load. These MTQs can create a very large number of objects, and this seems to be a hot spot. I currently have use cases for direct support of something like a ParallelArray base class in LUCENE-2186, and it seems we can use it here too.
          • In FloatsUtil#nextAfter I wonder if we need the following lines:
            return new Float(direction)
            ...
            return Double.valueOf(direction).floatValue();
            

            since those methods do nothing else than a plain (float) direction cast, really.
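          Simon's last point can be checked with a tiny sketch. The method names below are illustrative (not from the actual patch); only the two quoted expressions and the `direction` variable come from the snippet above. Both boxed forms produce the same float as a plain cast.

```java
// Sketch verifying the observation about FloatsUtil#nextAfter: the
// boxed expressions are equivalent to a plain (float) cast.
class NextAfterBoxingSketch {
    static float boxedFloat(double direction) {
        return new Float(direction);                    // boxes, then auto-unboxes
    }
    static float boxedDouble(double direction) {
        return Double.valueOf(direction).floatValue();  // boxes, then narrows
    }
    static float plainCast(double direction) {
        return (float) direction;                       // same result, no allocation
    }
}
```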

          Michael McCandless added a comment -

          I attached a hacked patch... nowhere near committable, various tests
          fail, etc... yet I think once we clean it up, the approach is viable.

          I started from the patch like 2 iterations ago, and then fixed how the
          MTQ BQ rewrite works so that instead of the two passes (first to
          gather matching terms, second to create weight/scorers & run the BQ),
          it now makes a single pass.

          In that single pass it records which terms matched which segments, and
          creates TermScorer for each.

          After the single pass, once we've summed up the top level docFreq for
          all terms, I go back and reset the weights for all the TermScorers,
          sumSQ them, normalize, etc., and then create a FakeQuery object whose
          only purpose is to remember the per-segment scorers and provide them
          once .scorer(...) is called on each segment.

          The big gain with this approach is you don't waste effort trying to
          seek to non-existent terms in the sub readers. Normally the terms
          cache would save you here, but, we never cache a miss and so when we
          try to look that up again it's always a real (costly) seek.

          With this approach we can disable using the terms cache entirely from
          MTQ.rewrite, which is great.

          I believe the patch works correctly, at least for this test, because
          on my 10M wikipedia index it gets identical top N results as clean
          trunk. Here're the perf gains:

          Query    QPS clean  QPS mtqseg  Pct diff
          state        37.49       37.40     -0.2%
          unit*        11.86       20.23     70.5%
          un*d         13.58       30.85    127.2%
          uni*ed      173.22      535.27    209.0%
          u*d           2.61        9.05    247.3%
          un*ed        33.59      120.32    258.1%

          Note that these gains already include the sizable gains from the
          original patch, but the single pass approach makes further great
          gains, especially eg on the prefix query.

          I don't think we should couple this new patch w/ this issue... this
          issue already has awesome gains with a fairly minor change...
          I'll open a new issue.
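          The bookkeeping of that single pass can be sketched in miniature. This is only an illustration of summing top-level docFreq across segments in one pass; the TermScorer/FakeQuery plumbing from the patch is omitted, and segments are modeled as plain maps.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Miniature of the single-pass bookkeeping: each segment is visited
// once, and per-term docFreq is summed into a top-level total as we go.
class SinglePassSketch {
    static Map<String, Integer> totalDocFreq(List<Map<String, Integer>> segments) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> segment : segments) {
            // only terms that actually exist in this segment are touched,
            // so we never pay for seeks to non-existent terms
            for (Map.Entry<String, Integer> e : segment.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }
}
```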

          Michael McCandless added a comment -

          OK I opened LUCENE-2694.

          Robert Muir added a comment -

          The big gain with this approach is you don't waste effort trying to
          seek to non-existent terms in the sub readers. Normally the terms
          cache would save you here, but, we never cache a miss and so when we
          try to look that up again it's always a real (costly) seek.

          With this approach we can disable using the terms cache entirely from
          MTQ.rewrite, which is great.

          This is the way to go, because it's horrible for the MTQ to touch the terms cache at all,
          and depending on it for good performance is even worse.

          I think if you somehow changed the benchmark to use multiple threads and had different
          queries executing at the same time, you would see these guys fighting each other
          over huge numbers of terms with df=1 and slowing each other down... but we wouldn't
          have this problem with them rewriting to FakeQuery.

          Robert Muir added a comment -

          In FloatsUtil#nextAfter I wonder if we need the following lines:

          Simon, this is a good point. I poached this method from Harmony's StrictMath.nextAfter.
          It's interesting to take a look also at their Math.nextAfter.

          The difference is this Double promotion; I don't understand if this affects us at all or what it would change.
          In both cases I do not understand why the boxing is necessary!

          Uwe Schindler added a comment -

          Simon:

          Sice we do not reuse the allocator we don't need to use the synced one here. There is no reset call anywhere to free the allocated blocks too. We should just use new BytesRefHash() here.

          There is no such ctor in trunk. The only available allocator is the used one.

          Simon Willnauer added a comment -

          There is no such ctor in trunk. The only available allocator is the used one.

          Good point - there is one in mine. I'm going to upload a patch for this later / tomorrow...

          Simon Willnauer added a comment -

          The current patch makes use of a DirectAllocator without recycling etc. I removed the unnecessary boxing in FloatsUtil and replaced the terms HashMap with a BytesRefHash. I skipped the latest patch since Mike marked it as a hack and opened a new issue for that. This one is based on Uwe's latest one. All tests pass for me, though.

          Uwe Schindler added a comment -

          Thanks for the improvements, some comments and changes I did locally:

          • The code in BooleanQueryRewrite uses += for the boost and docFreq in the case of (>=0, no entry in BytesRefHash), but this should only be an assignment. The update and comparison in the assert should be done only when an entry is already in the hash. Boosts should never be summed up.
          • The parts to update with LUCENE-2702 are marked; they currently wrap with new BytesRef(#get) and should be replaced with code like it was before, using PagedBytes.
          • Creating the BytesStartArray is a lot of work; maybe we can un-final DirectBytesStartArray and reuse the code. This would make it easier to extend it and simply add more parallel arrays. Client code should not need to replicate the code (this is maybe another issue).
          • But there is also a problem with the current code in TermFreqBoostByteStart: the arrays may not use the exact same size as expected (depending on how oversize/grow works). As they are parallel arrays, all should be equal size, so we should use grow/oversize only for the base array and resize the others to the same size. Do we have an ArrayUtil method for that? Currently it may be broken. Any comments?
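          The parallel-array sizing fix in the last bullet can be sketched like this: oversize only the base array, then copy every companion array to exactly the same length, keeping all parallel arrays in lockstep. The growth formula below is a simplified stand-in for Lucene's ArrayUtil.oversize, and the field names are illustrative.

```java
import java.util.Arrays;

// Sketch of the sizing fix: grow the base array once, then resize the
// companions to exactly the same length.
class ParallelArrayGrowSketch {
    static float[] boosts = new float[8];   // base array
    static int[] docFreqs = new int[8];     // parallel companion

    static int oversize(int minSize) {
        return minSize + (minSize >> 3) + 8; // simplified growth policy
    }

    static void grow(int minSize) {
        if (minSize > boosts.length) {
            int newSize = oversize(minSize);             // grow the base once...
            boosts = Arrays.copyOf(boosts, newSize);
            docFreqs = Arrays.copyOf(docFreqs, newSize); // ...companions to the exact same size
        }
    }
}
```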
          Uwe Schindler added a comment -

          Here a patch with the allocation problems resolved. Also the DirectBytesStartArray is public.

          Simon Willnauer added a comment -

          The code in BooleanQueryRewrite uses += for the boost and docFreq in the case of (>=0, no entry in BytesRefHash), but this should only be an assignment. The update and comparison in the assert should be done only when an entry is already in the hash. Boosts should never be sumed up.

          Ah yeah - true, for sure! It did not break, since that only happens once, when the entry is initially added. But you are right that this should only be an assignment.

          But there is also a problem with the current code in TermFreqBoostByteStart: The arrays may not use the exact same size as expected (depending how oversize/grow works). As they are parallel arrays, all should be equal size, so we should only use grow/oversize only for the base array and resize the others to same size. Do we have an ArrayUtil method for that? Currently it (may) be broken. Any comments?

          Good catch, man! This won't happen here, but it's cleaner to use the exact same size. The bigger problem is that I forgot to add the right constant to the grow method, though. I can fix that in a minute.

          Simon Willnauer added a comment -

          Updated patch after committing LUCENE-2707

          I also fixed the constant in the grow(float[]) method.

          Uwe Schindler added a comment -

          Merged patch. I added some additional asserts and fixed some spelling problems in comments. I will now remove the attributes hell.

          So this patch is just for review, before the big changes come

          Uwe Schindler added a comment - edited

          This is the attributes-hell patch (not yet finalized on the FuzzyTermsEnum side - Robert, can you review?).

          The change is:

          • BoostAttribute is only added to the TermsEnum, because the TermsEnum produces the boost and the MTQ rewrite consumes it.
          • MaxNonCompetitiveBoostAttribute is owned by the rewrite mode, as it is the producer. The TermsEnum consumes this attribute.

          Fixing this needs the hackish attributes() method in the fuzzy rewrite.

          TODO: Contrib/Solr is not yet reviewed for the API change in MTQ.getTermsEnum()!

          Uwe Schindler added a comment -

          Small improvements. Also added the missing bottomChanged() call to the Fuzzy ctor.

          Robert Muir added a comment -

          I will play with the latest patch some, and hopefully upload a new one.

          The real cause of this "tie-break" case is the fact that the priority queue comparison is "compare by boost, then term text".

          With the MultiTermsEnum this was no problem, because we look at all terms in order, so we made MaxNonCompetitiveBoostAttribute just a float.

          With per-segment rewrite, we can look at terms out of order.

          So I think if we add the optional term text of the pq's bottom from the previous segment to the MaxNonCompetitiveBoostAttribute itself, then the enum itself can implement the tie break, cleaner and more efficiently. The rewrite method or its consumer should only be setting the values of this attribute, not dealing with this case.
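          The "compare by boost, then term text" rule can be sketched as a comparator over a hypothetical ScoreTerm pair. The tie direction shown (on equal boost, the lexicographically larger term counts as less competitive) is an illustrative assumption, not necessarily what the real queue does.

```java
import java.util.Comparator;

// Sketch of the priority-queue ordering: primary key is the boost,
// secondary key (tie-break) is the term text. ScoreTerm is a stand-in
// class, not the actual Lucene type.
class ScoreTermSketch {
    static final class ScoreTerm {
        final String term;
        final float boost;
        ScoreTerm(String term, float boost) { this.term = term; this.boost = boost; }
    }

    // orders entries least-competitive first (queue bottom first)
    static final Comparator<ScoreTerm> BOTTOM_FIRST = (a, b) -> {
        int cmp = Float.compare(a.boost, b.boost); // primary: lower boost is worse
        if (cmp != 0) return cmp;
        return b.term.compareTo(a.term);           // assumed tie-break on term text
    };
}
```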

          Uwe Schindler added a comment -

          Here is the patch for Robert, which fails the tie-break test. I think you can fix the tie-break case using the competitiveTerm and we are done.

          Robert Muir added a comment -

          Here is my patch, but the test still fails... I think it is a bug in the rewrite method's priority queue.

          It has nothing to do with maxBoostAttribute, because I can add an "if (true) return;" to FuzzyTermsEnum.bottomChanged() and the test will still always fail.

          Uwe Schindler added a comment -

          Here is the final patch. There is only one special case:
          our Boolean clause sorting only works for TermQuery, but the BoostOnly fuzzy rewrite creates ConstantScoreQueries for each clause, so there is no reordering.

          You can see this in TestMultiTermQueryRewrites with tests.verbose=true.

          Uwe Schindler added a comment -

          This patch contains a better BooleanClause comparator that also reorders ConstantScoreQueries that contain a TermQuery.

          Maybe the ideal solution would be for every query to get a method that returns the "primary" term or null. The default would be null, but TermQuery and all delegating wrappers should implement it.
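          The "primary term" idea can be sketched with stand-in classes (these are not the real Lucene query classes): the default returns null, a term query returns its term, and a delegating wrapper simply forwards.

```java
// Sketch of a "primary term" accessor with hypothetical stand-in
// classes: the clause comparator could sort on primaryTerm() and
// ignore queries that return null.
class PrimaryTermSketch {
    interface Query {
        default String primaryTerm() { return null; } // default: no primary term
    }
    static final class TermQuery implements Query {
        final String term;
        TermQuery(String term) { this.term = term; }
        public String primaryTerm() { return term; }
    }
    static final class ConstantScoreQuery implements Query {
        final Query wrapped;
        ConstantScoreQuery(Query wrapped) { this.wrapped = wrapped; }
        public String primaryTerm() { return wrapped.primaryTerm(); } // delegate
    }
}
```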

          Simon Willnauer added a comment - edited

          Just as a first result, here are the numbers I see on my workstation with a 10 M Wikipedia index (5 segments):

          Query        QPS trunk  QPS LUCENE-2690  Pct diff
          unit state        3.74             3.81      1.8%
          united~0.6       10.07            10.26      1.9%
          unit*            11.89            12.65      6.5%
          united~0.7       39.29            45.52     15.9%
          un*d             15.17            27.86     83.7%

          using the latest patch.

          Those are run with -Xmx2G on an Intel Core 2 at 3 GHz.

          Uwe Schindler added a comment -

          Simon: Just for comparing with Mike's results: How many segments?

          Uwe Schindler added a comment -

          Updated patch with an optimization in the ctor of FuzzyTermsEnum.

          Uwe Schindler added a comment -

          Revision of last patch (was buggy).

          About the "chicken and egg problem": Maybe AutomatonTermsEnum should throw an exception if the termComparator is not the expected one. This would prevent people from trying automata with other indexes.
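The fail-fast guard suggested above could look like this minimal sketch. All names here are hypothetical (this is not actual Lucene code); the idea is simply to reject a terms dictionary whose sort order differs from the byte order the automaton enumeration assumes, instead of silently enumerating in an order it cannot follow.

```java
import java.util.Comparator;

public class ComparatorGuard {
  // True if the index's term comparator matches the expected sort order.
  static <T> boolean compatible(Comparator<T> expected, Comparator<T> actual) {
    return expected.equals(actual);
  }

  // Fail fast with an exception, as suggested in the comment above.
  static <T> void checkComparator(Comparator<T> expected, Comparator<T> actual) {
    if (!compatible(expected, actual)) {
      throw new UnsupportedOperationException(
          "this TermsEnum requires terms sorted with: " + expected);
    }
  }
}
```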

          Yonik Seeley added a comment -

          Hmmm, it looks like this changes BQ.rewrite() to always rewrite/clone? Do we need that extra overhead?

          Uwe Schindler added a comment -

          Hmmm, it looks like this changes BQ.rewrite() to always rewrite/clone? Do we need that extra overhead?

          ...until we find a better solution for how we reorder clauses. There is a big speed degradation for queries with lots of MTQs if we don't reorder the clauses intelligently.

          Yonik Seeley added a comment -

          There is a big speed degradation for queries with lots of MTQs if we don't reorder the clauses intelligently.

          Seems like the right place for sorting is in the MTQ rewrite to a BQ.
          The current patch makes BQ rewrite quite a bit more expensive... a clone is always made, and equals is always called on the clone after.

          For normal boolean queries (caused by someone typing in a few words), it seems like a real-world speedup is unlikely (since the terms would need to be in the same tii block). People generating very large boolean queries should also be able to pre-sort them and not have the overhead imposed every time.

          Uwe Schindler added a comment -

          Yes, Mr. No-inlining-Policeman

          We are still working on this patch; it's marked as TODO, so we will investigate further. For random queries it had a huge positive impact on query performance. The BQ cloning/reordering was not measurable. We did this after you left Mike's house, so it was just a quick idea.

          Yonik Seeley added a comment -

          For random queries it had a huge positive impact on query performance.

          If the clauses were just term queries, that would make me really suspect the test.
          If it was MTQ queries, then MTQ should sort, not BQ.

          The BQ cloning/reordering was not measurable.

          Right - I would expect that for typical queries and typical uses.
          I guess I'm worried about the atypical cases since I've seen so many of them - people putting together single boolean queries with 10K clauses, people doing complex nested queries with thousands of terms, or people executing thousands of queries per request (or per document added, via memory index) where this overhead suddenly becomes significant.

          We are still working on this patch; it's marked as TODO, so we will investigate further.

          Cool

          Uwe Schindler added a comment -

          Attached is a new patch with two changes:

          • moved the BQ reordering to MTQ for now. A general reordering of BooleanQueries should be done in a separate issue (with a more performant rewrite). Currently this uses the same comparator as BQ did before. You may wonder: why not simply use a sorted map? The idea is that sorting once at the end is faster than using a TreeMap, where every term is compared on insert (even those that later fall out of the queue). I sort the BQ clauses directly, like BQ did, to avoid creating an additional array holding all terms again. Maybe it's still faster to copy all BytesRefs to an array first and then build the BQ? For now this should be enough. To improve it we need SorterTemplate again (for the BytesRefHash case).
          • fixed an issue with the PQ in TopTermsRewrite: the bottom information was only set when the PQ was overflowing. In the past it was set as soon as the queue was full, and now it is again. This was an optimization bug; it now behaves as it always did. Maybe this explains Mike's score changes on the Wikipedia index?

          Mike: can you test?
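The trade-off in the first bullet above (sort once at the end versus a TreeMap that sorts on every insert) can be sketched like this. This is an illustrative sketch with hypothetical names, using plain strings instead of BytesRefs; the point is that terms discarded later never pay for comparisons during collection.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

public class SortAtEndSketch {
  // Strategy kept in the patch: collect terms unsorted, then sort once.
  static List<String> collectThenSort(List<String> terms) {
    List<String> out = new ArrayList<>(terms); // O(1) per collected term
    Collections.sort(out);                     // one O(n log n) sort at the end
    return out;
  }

  // Rejected alternative: keep the collection sorted the whole time.
  // Every insert costs O(log n) comparisons, even for terms discarded later.
  // (A TreeSet also drops duplicates; the patch uses hashed sets for that.)
  static List<String> collectSorted(List<String> terms) {
    return new ArrayList<>(new TreeSet<>(terms));
  }
}
```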

          Uwe Schindler added a comment -

          Patch with BytesRefHash parallel array sorting instead of sorting the BQ. This should improve all cases. This patch also contains a test that this resorting works.

          It also has an assert that the docFreq is correct. This only slows down tests, but is more secure!

          Now we only need to fix contrib, and then Mike can check the performance (Mike: you have to update your current trunk checkout, too, so the scores will compare correctly).

          Michael McCandless added a comment -

          Test results on 10M Wiki index:

          Single seg:

          Query QPS clean QPS mtqseg3 Pct diff
          united~0.6 26.01 25.48 -2.0%
          un*ed 260.88 258.61 -0.9%
          un*d 91.52 90.99 -0.6%
          united~0.7 98.01 97.99 -0.0%
          state 39.95 39.94 -0.0%
          unit* 33.60 33.73 0.4%
          u*d 29.87 30.01 0.5%
          uni*ed 1825.14 1859.49 1.9%

          Multi seg (22 segments):

          Query QPS clean QPS mtqseg3 Pct diff
          unit* 34.68 34.56 -0.3%
          state 40.43 40.30 -0.3%
          united~0.6 3.18 3.20 0.6%
          u*d 16.81 19.55 16.3%
          united~0.7 11.01 13.85 25.8%
          un*d 52.51 66.21 26.1%
          un*ed 42.88 92.95 116.8%
          uni*ed 175.06 543.64 210.5%

          And, the test did not barf so the hits (docID & scores) are identical!

          Uwe Schindler added a comment -

          Patch:

          • contrib changed and tests pass (also fixed a bug in the MemoryIndex TermsEnum)
          • improved test in core for duplicate terms, boosts, sorting
          Uwe Schindler added a comment -

          Final patch, will commit this soon:

          • added javadocs, changes and migration instructions
          Uwe Schindler added a comment -

          Committed revision: 1022934


            People

            • Assignee: Uwe Schindler
            • Reporter: Uwe Schindler
            • Votes: 0
            • Watchers: 0
