Lucene - Core
LUCENE-5425

Make creation of FixedBitSet in FacetsCollector overridable

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.6
    • Fix Version/s: 4.7, 6.0
    • Component/s: modules/facet
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      In FacetsCollector, the bits in MatchingDocs are allocated per query. For large indexes, where maxDoc is large, creating a bitset of maxDoc bits is expensive and would create a lot of garbage.

      The attached patch makes this allocation customizable while maintaining the current behavior.
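The shape of the change can be sketched as follows. This is a deliberately simplified, self-contained illustration, not Lucene's actual FacetsCollector or FixedBitSet API: the collector obtains its per-query bit set from a protected factory method, which a subclass can override to hand back a cleared, cached instance instead of allocating maxDoc bits for every query.

```java
import java.util.Arrays;

// Minimal stand-in for FixedBitSet (illustrative, not Lucene's class).
class SimpleBitSet {
    final long[] bits;
    SimpleBitSet(int numBits) { bits = new long[(numBits + 63) >>> 6]; }
    void set(int doc)    { bits[doc >>> 6] |= 1L << (doc & 63); }
    boolean get(int doc) { return (bits[doc >>> 6] & (1L << (doc & 63))) != 0; }
    void clear()         { Arrays.fill(bits, 0L); }
}

// The override point: by default, allocate a fresh bitset per query.
class SketchCollector {
    protected SimpleBitSet newBitSet(int maxDoc) {
        return new SimpleBitSet(maxDoc);
    }
}

// A subclass that caches one bitset and clears it between queries,
// trading the per-query long[] allocation for a memset.
class ReusingCollector extends SketchCollector {
    private SimpleBitSet cached;

    @Override
    protected SimpleBitSet newBitSet(int maxDoc) {
        if (cached == null || cached.bits.length < ((maxDoc + 63) >>> 6)) {
            cached = new SimpleBitSet(maxDoc);   // allocate once
        } else {
            cached.clear();                      // reuse: no new garbage
        }
        return cached;
    }
}
```

A subclass could equally return a different set implementation here, which is what the later discussion about generalizing the API is concerned with.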

      1. facetscollector.patch
        19 kB
        John Wang
      2. facetscollector.patch
        1 kB
        John Wang
      3. fixbitset.patch
        1 kB
        John Wang
      4. LUCENE-5425.patch
        21 kB
        Shai Erera
      5. openbitsetiter.patch
        7 kB
        Lei Wang

        Activity

        Lei Wang added a comment -

        Better not to depend on the bitset; it's better to depend on a more general interface. In the user's application, if they can get rid of the memset part of that data structure, like us at Twitter, it can get a 4x+ performance improvement over simply caching the bitset.

        Shai Erera added a comment -

        The current patch's intention seems to be to allow the app to cache the FixedBitSet, such that it can clear() it before each search? To save the long[] allocation? This looks fine to me. Can you please add javadocs to FC.createBitSet?

        Better not to depend on the bitset, it's better to depend on a more general interface.

        We need to be careful here. Mike and I experimented with cutting the API over to a general DocIdSet, but it hurt the performance of faceted search, even with "smarter" bitsets. When we move to a DocIdSet, we add the DocIdSetIterator layer while iterating over the bits which slows down the search ... at least per luceneutil benchmarks. So if we want to do that, we should do it only after proving, by means of benchmarking, that it doesn't hurt performance severely. Would actually be good to show that more compressed bitsets maybe even improve performance, but I would go with the change if the performance-loss is marginal. And we should do it under a separate issue, to not block this one.

        Lei Wang added a comment -

        Agree it can be done in a separate issue. I didn't know a wrapper would affect performance that much; it's just an additional method call to me. The default OpenBitSetIterator impl is different from nextSetBit, maybe that's the reason? Anyway, starting with this change won't hurt anything, and caching the bitset should cut about 20% of the overhead of allocating a new bitset each time (the other 80% is from the memset). After getting this in, I can do a separate test on the DocIdSet, and see if we can get acceptable performance for the default behavior without reusing the memory.

        Shai Erera added a comment -

        didn't know a wrapper will affect performance that much.

        Me neither! We were very surprised to see some of the performance implications of saving a method call or changing the order we iterate on results + facet requests. But it's hard to argue w/ consistent numbers, and IIRC they weren't in the 3-5% range, but 10+%. Could be though that we measured several changes at once .. so I think it's worthwhile to benchmark the move to DocIdSet.

        And yes, it could be OpenBitSetIterator, I don't rule it out.

        +1 to allow caching in this issue, we can investigate generalizing the APIs in a separate issue.

        John Wang added a comment -

        The API would be so much prettier if it returned a DocIdSetIterator.
        Maybe we should find out whether it is indeed OpenBitSetIterator that is slow and fix that, and avoid changing this API again?

        Robert Muir added a comment -

        If there is really concern, can the API be fixed and just changed to do instanceof?
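The instanceof idea can be sketched like this. All type names here are hypothetical stand-ins, not Lucene's: consuming code accepts a general set interface, but when the concrete type turns out to be the word-backed bitset, it bypasses the iterator layer and scans the underlying long[] words directly.

```java
// General interface the aggregation code is written against.
interface DocIdSetLike {
    IntIter iterator();
    interface IntIter {
        int NO_MORE = Integer.MAX_VALUE;  // sentinel for exhaustion
        int nextDoc();
    }
}

// Word-backed bitset implementation (stand-in for FixedBitSet).
class WordBitSet implements DocIdSetLike {
    final long[] words;
    WordBitSet(int maxDoc) { words = new long[(maxDoc + 63) >>> 6]; }
    void set(int doc) { words[doc >>> 6] |= 1L << (doc & 63); }

    public IntIter iterator() {
        return new IntIter() {
            int doc = -1;
            public int nextDoc() {
                if (doc == NO_MORE) return NO_MORE;
                for (int d = doc + 1; d < words.length << 6; d++) {
                    if ((words[d >>> 6] & (1L << (d & 63))) != 0) return doc = d;
                }
                return doc = NO_MORE;
            }
        };
    }
}

class Counter {
    static int count(DocIdSetLike set) {
        if (set instanceof WordBitSet) {
            // Specialized fast path: count bits word by word,
            // no per-doc iterator calls.
            int n = 0;
            for (long w : ((WordBitSet) set).words) n += Long.bitCount(w);
            return n;
        }
        // General path: per-doc iteration through the abstraction.
        int n = 0;
        DocIdSetLike.IntIter it = set.iterator();
        while (it.nextDoc() != DocIdSetLike.IntIter.NO_MORE) n++;
        return n;
    }
}
```

The point is that a general API need not cost anything on the hot path, as long as the common concrete type gets a specialized branch.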

        John Wang added a comment -

        I realize that if we make getting the docs abstract, then to be useful, the set side also needs to be abstract, e.g. line 127.
        Let me add that to this patch.

        John Wang added a comment -

        New patch added:
        1) created a MutableDocIdSet class, which extends DocIdSet and allows you to add to a DocIdSet
        2) used this abstraction both in FacetsCollector and in the counter/accumulator classes that use the MatchingDocs API
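The MutableDocIdSet idea might look roughly like the following. This is an illustrative sketch with hypothetical names, not the patch's actual classes: a doc-id set that the collector can add to at collection time and that consumers iterate at aggregation time, decoupling FacetsCollector from FixedBitSet.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;

// A doc-id set with a mutation side (collection) and a read side (aggregation).
abstract class MutableDocIdSetSketch {
    abstract void add(int doc);                   // called per matching doc
    abstract PrimitiveIterator.OfInt iterator();  // yields docs in order
}

// Default implementation backed by a bitset, matching current behavior;
// an app could substitute a sparser structure here.
class BitSetDocIdSet extends MutableDocIdSetSketch {
    private final BitSet bits;
    BitSetDocIdSet(int maxDoc) { bits = new BitSet(maxDoc); }
    @Override void add(int doc) { bits.set(doc); }
    @Override PrimitiveIterator.OfInt iterator() {
        return bits.stream().iterator();  // set bits in ascending order
    }
}
```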

        Shai Erera added a comment -

        If we need to make any change to the API, it has to be a DocIdSet and not Iterator, because the iterator takes away one layer that could be useful (such as a specialized implementation which uses instanceof FixedBitSet check, as what I think Rob suggests).

        But, John, didn't we say we should explore the move to a DocIdSet-based API in a separate issue where we also benchmark the implications of using all of those abstractions (both at collection and aggregation phases)? This issue was supposed to be about letting you cache the FBS instance.

        I don't think we should commit this patch. This issue should allow you to reuse a FixedBitSet. A separate issue should benchmark the move to a more general API. I want to be sure that whatever abstractions that we add do not hurt faceted search, or at least note by how much and why they are worth it. For instance, if we move to a DocIdSet API where none of the Lucene sets improves faceted search over FixedBitSet, I don't think we should do it...

        So John, please revert back to the simple patch w/ the protected method on FacetsCollector (and on trunk-based code) so we can do a final review and commit it. And please open a separate issue to explore using a DocIdSet in MatchingDocs, instead of FixedBitSet. Thanks!

        Shai Erera added a comment -

        Just to make sure I don't come across wrong - when Mike and I started the facets package overhauling a year ago, the package was full of general and abstract APIs. Really, you could do both what this issue wants to achieve and even LUCENE-5428 (IIRC). After Mike wrote a completely bypassing code-path for faceted search and noted some 250% improvements, we decided to eliminate all abstractions from the API, and re-introduce them gradually. Eventually we settled on x4 speedups.

        So I just want to make sure (both here and on LUCENE-5428) that the abstractions that we add don't impact faceted search badly, and if they do and we still want to introduce them, at least show that there's some code path where you actually improve things.

        So please don't take my comments as objections, but just being cautious and this time prefer that whatever abstraction that we add, is measured from all sides.

        Lei Wang added a comment -

        I'm not exactly sure how to run the facets benchmarks. I did a run with ant run-task -Dtask.alg=conf/facets.alg, and changed "SearchSameRdr" Search > : 40 to 40000 to get stable results.

        I'm also not sure how to read the results, but the numbers look quite similar between trunk and the DocIdSet version on my box.

        trunk:
        [java] ------------> Report sum by Prefix (Search) and Round (4 about 4 out of 42)
        [java] Operation round facets runCnt recsPerRun rec/s elapsedSec avgUsedMem avgTotalMem
        [java] SearchSameRdr_40000 0 true 1 40000 7,155.64 5.59 31,610,096 51,212,288
        [java] SearchSameRdr_40000 - 1 false - - 1 - - 40000 - 8,814.46 - - 4.54 - 33,534,008 - 49,209,344
        [java] SearchSameRdr_40000 2 true 1 40000 9,088.84 4.40 35,673,136 48,373,760
        [java] SearchSameRdr_40000 - 3 false - - 1 - - 40000 - 9,045.68 - - 4.42 - 35,279,544 - 47,661,056
        [java]
        [java]
        [java] ------------> Report sum by Prefix (Populate) and Round (4 about 4 out of 42)
        [java] Operation round facets runCnt recsPerRun rec/s elapsedSec avgUsedMem avgTotalMem
        [java] Populate 0 true 1 21578 2,489.96 8.67 31,369,696 51,212,288
        [java] Populate - - 1 false - - 1 - - 21578 - 3,973.85 - - 5.43 - 33,272,104 - 49,209,344
        [java] Populate 2 true 1 21578 4,216.92 5.12 32,701,392 48,373,760
        [java] Populate - - 3 false - - 1 - - 21578 - 4,366.25 - - 4.94 - 35,064,408 - 47,661,056
        [java]
        [java]
        [java] ------------> Report sum by Prefix (MAddDocs) and Round (4 about 4 out of 42)
        [java] Operation round facets runCnt recsPerRun rec/s elapsedSec avgUsedMem avgTotalMem
        [java] MAddDocs_Exhaust 0 true 1 21578 3,469.13 6.22 24,536,720 51,212,288
        [java] MAddDocs_Exhaust - 1 false - - 1 - - 21578 - 4,845.72 - - 4.45 - 34,857,920 - 49,209,344
        [java] MAddDocs_Exhaust 2 true 1 21578 5,129.07 4.21 29,209,256 48,373,760
        [java] MAddDocs_Exhaust - 3 false - - 1 - - 21578 - 5,259.08 - - 4.10 - 25,845,424 - 47,661,056

        With the patch, except that I changed the OpenBitSet to FixedBitSet and used bits.iterator() to return the iterator (it's still an OpenBitSetIterator), the result:
        [java] ------------> Report sum by Prefix (Search) and Round (4 about 4 out of 42)
        [java] Operation round facets runCnt recsPerRun rec/s elapsedSec avgUsedMem avgTotalMem
        [java] SearchSameRdr_40000 0 true 1 40000 7,280.67 5.49 25,424,104 51,113,984
        [java] SearchSameRdr_40000 - 1 false - - 1 - - 40000 - 8,689.98 - - 4.60 - 31,356,960 - 49,053,696
        [java] SearchSameRdr_40000 2 true 1 40000 9,157.51 4.37 38,849,248 47,632,384
        [java] SearchSameRdr_40000 - 3 false - - 1 - - 40000 - 9,097.11 - - 4.40 - 39,840,912 - 46,465,024
        [java]
        [java]
        [java] ------------> Report sum by Prefix (Populate) and Round (4 about 4 out of 42)
        [java] Operation round facets runCnt recsPerRun rec/s elapsedSec avgUsedMem avgTotalMem
        [java] Populate 0 true 1 21578 2,465.21 8.75 25,187,152 51,113,984
        [java] Populate - - 1 false - - 1 - - 21578 - 2,651.19 - - 8.14 - 30,985,904 - 49,053,696
        [java] Populate 2 true 1 21578 4,247.64 5.08 38,656,320 47,632,384
        [java] Populate - - 3 false - - 1 - - 21578 - 4,298.41 - - 5.02 - 39,355,912 - 46,465,024
        [java]
        [java]
        [java] ------------> Report sum by Prefix (MAddDocs) and Round (4 about 4 out of 42)
        [java] Operation round facets runCnt recsPerRun rec/s elapsedSec avgUsedMem avgTotalMem
        [java] MAddDocs_Exhaust 0 true 1 21578 3,404.01 6.34 34,015,968 51,113,984
        [java] MAddDocs_Exhaust - 1 false - - 1 - - 21578 - 3,062.88 - - 7.05 - 30,420,848 - 49,053,696
        [java] MAddDocs_Exhaust 2 true 1 21578 5,147.42 4.19 28,833,976 47,632,384
        [java] MAddDocs_Exhaust - 3 false - - 1 - - 21578 - 5,129.07 - - 4.21 - 37,117,288 - 46,465,024

        Lei Wang added a comment -

        And the idea of keeping the DocIdSet a raw DocIdSet is quite good. For the collecting part, we can have a wrapper, which can return a DocIdSet after the collecting is done.

        John Wang added a comment -

        Thanks Lei!

        Shai, given that the OBS iterator performance is the same as trunk, do you think we should move forward and change to the DocIdSet API, or do you still think we should split this up into 2 different issues?

        Thanks and please advise.

        -John

        Shai Erera added a comment -

        The benchmarks that we ran were with luceneutil: http://code.google.com/a/apache-extras.org/p/luceneutil/. Actually, Mike did all the running on his beast machine, so perhaps he can share the command to execute in order to run the benchmarks. The results that you get are more clear. Separately, would be good if we had a single benchmarking package, whether it's luceneutil or lucene/benchmark...

        Shai Erera added a comment - edited

        Shai, given that using OBS Iterator performance is the same as trunk, do you think we should move forward and change to the DocIdSet api or do you still think we should split up as 2 different issues.

        Honestly, I don't think the lucene/benchmark package is as good as luceneutil, especially not when it comes to queries. luceneutil tries different queries, combining low/high-freq terms, span queries and what not, and so the results we get there are more stable and reflect the benefits/drawbacks of the change across a wider spectrum. Lei, I apologize for not mentioning that before, if it wasted your time. It seemed obvious to me.

        But the cut to DocIdSet is bigger than just allowing to reuse FixedBitSet, so why not split the issue into two separate ones anyway? Like, if it turns out that DocIdSet as an abstraction is not bad, but the iterator is, we can always do an instanceof check and execute the more optimized code. So let's just explore all that in a separate issue? Or, if you prefer to do it all in one issue, then we should get luceneutil's blessing here.

        John Wang added a comment -

        Ok. I will revert to the previous patch applied to trunk and we can create a separate issue to track cutting over to DocIdSet and reference this ticket.

        John Wang added a comment -

        Reverted patch attached.

        Shai Erera added a comment -

        Looks good. BTW, "newHitSet" – is that intentional or a typo (i.e., should it have been newBitSet)? I like newHitSet.

        John Wang added a comment -

        I named it newHitSet so that, in case we do decide to move to an abstract doc set, we wouldn't need to (and forget to) change the method name.

        Lei Wang added a comment -

        Tried with luceneutil, but got some problems. I cannot get the same numbers for two identical copies of trunk; even though they are both trunk, I get different numbers:
        Report after iter 19:
        TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff
        OrHighMed 74.15 (7.1%) 71.24 (8.3%) -3.9% ( -18% - 12%)
        LowTerm 515.68 (15.1%) 496.20 (12.3%) -3.8% ( -27% - 27%)
        OrNotHighLow 72.22 (8.2%) 70.36 (7.6%) -2.6% ( -17% - 14%)
        OrNotHighMed 79.01 (7.3%) 77.43 (8.4%) -2.0% ( -16% - 14%)
        OrHighNotHigh 38.66 (4.5%) 37.90 (6.4%) -2.0% ( -12% - 9%)
        Respell 51.21 (7.1%) 50.23 (6.5%) -1.9% ( -14% - 12%)
        MedPhrase 69.67 (7.5%) 68.35 (7.4%) -1.9% ( -15% - 14%)
        OrHighLow 67.24 (7.8%) 66.00 (9.0%) -1.8% ( -17% - 16%)
        Fuzzy1 27.37 (5.7%) 26.96 (5.5%) -1.5% ( -11% - 10%)
        Fuzzy2 37.21 (3.8%) 36.71 (5.6%) -1.3% ( -10% - 8%)
        MedSloppyPhrase 9.94 (5.4%) 9.83 (3.9%) -1.1% ( -9% - 8%)
        LowSpanNear 8.60 (3.9%) 8.54 (3.8%) -0.7% ( -8% - 7%)
        AndHighHigh 40.23 (3.1%) 40.03 (2.5%) -0.5% ( -5% - 5%)
        HighTerm 76.07 (9.0%) 75.96 (9.1%) -0.2% ( -16% - 19%)
        OrHighHigh 11.62 (3.0%) 11.62 (4.8%) -0.1% ( -7% - 7%)
        IntNRQ 9.51 (3.9%) 9.51 (8.3%) 0.0% ( -11% - 12%)
        HighPhrase 25.61 (7.0%) 25.63 (7.7%) 0.1% ( -13% - 15%)
        LowSloppyPhrase 30.21 (5.2%) 30.24 (4.3%) 0.1% ( -8% - 10%)
        PKLookup 212.03 (9.0%) 212.25 (11.5%) 0.1% ( -18% - 22%)
        OrNotHighHigh 27.75 (3.5%) 27.80 (6.5%) 0.2% ( -9% - 10%)
        OrHighNotMed 58.14 (5.9%) 58.27 (8.3%) 0.2% ( -13% - 15%)
        MedSpanNear 22.73 (3.7%) 22.80 (5.1%) 0.3% ( -8% - 9%)
        Wildcard 42.84 (5.0%) 42.97 (5.4%) 0.3% ( -9% - 11%)
        HighSloppyPhrase 23.99 (7.4%) 24.08 (6.3%) 0.4% ( -12% - 15%)
        AndHighLow 625.62 (6.6%) 629.52 (10.5%) 0.6% ( -15% - 18%)
        Prefix3 77.68 (7.2%) 78.17 (6.2%) 0.6% ( -11% - 15%)
        LowPhrase 14.58 (4.7%) 14.77 (5.0%) 1.3% ( -8% - 11%)
        HighSpanNear 11.84 (4.3%) 11.99 (5.2%) 1.3% ( -7% - 11%)
        OrHighNotLow 66.04 (8.4%) 67.28 (9.2%) 1.9% ( -14% - 21%)
        AndHighMed 66.55 (4.3%) 67.91 (6.2%) 2.1% ( -8% - 13%)
        MedTerm 139.78 (9.5%) 145.63 (10.3%) 4.2% ( -14% - 26%)

        With the patch, the numbers are also different, but with no bigger difference than the trunk-vs-trunk numbers:
        Report after iter 19:
        TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff
        AndHighLow 730.30 (11.5%) 700.95 (10.6%) -4.0% ( -23% - 20%)
        LowTerm 520.94 (10.6%) 504.25 (11.4%) -3.2% ( -22% - 21%)
        Fuzzy1 57.55 (5.1%) 56.26 (4.8%) -2.2% ( -11% - 8%)
        Respell 35.85 (4.7%) 35.18 (4.1%) -1.9% ( -10% - 7%)
        OrHighNotHigh 37.77 (7.3%) 37.19 (5.9%) -1.5% ( -13% - 12%)
        HighSloppyPhrase 12.30 (7.5%) 12.17 (7.7%) -1.1% ( -15% - 15%)
        HighPhrase 29.38 (5.2%) 29.06 (4.3%) -1.1% ( -10% - 8%)
        OrNotHighMed 25.93 (6.2%) 25.68 (5.5%) -1.0% ( -11% - 11%)
        OrNotHighHigh 19.72 (5.9%) 19.53 (4.9%) -0.9% ( -11% - 10%)
        Fuzzy2 11.30 (3.6%) 11.24 (5.1%) -0.6% ( -8% - 8%)
        PKLookup 218.16 (8.6%) 217.53 (9.3%) -0.3% ( -16% - 19%)
        LowSloppyPhrase 43.09 (5.6%) 43.00 (3.5%) -0.2% ( -8% - 9%)
        MedSpanNear 30.65 (4.4%) 30.60 (3.2%) -0.1% ( -7% - 7%)
        MedSloppyPhrase 21.71 (5.7%) 21.70 (3.8%) -0.0% ( -8% - 9%)
        Wildcard 14.67 (3.3%) 14.67 (2.6%) -0.0% ( -5% - 6%)
        HighSpanNear 0.64 (4.6%) 0.64 (5.0%) 0.1% ( -9% - 10%)
        LowPhrase 21.05 (5.6%) 21.09 (7.6%) 0.2% ( -12% - 14%)
        AndHighMed 175.53 (7.2%) 176.00 (8.2%) 0.3% ( -14% - 16%)
        Prefix3 31.24 (3.3%) 31.37 (2.7%) 0.4% ( -5% - 6%)
        OrNotHighLow 76.32 (6.3%) 76.80 (7.7%) 0.6% ( -12% - 15%)
        OrHighHigh 33.43 (6.4%) 33.65 (7.6%) 0.7% ( -12% - 15%)
        AndHighHigh 35.51 (3.1%) 35.76 (3.1%) 0.7% ( -5% - 7%)
        IntNRQ 9.36 (4.4%) 9.43 (3.7%) 0.7% ( -7% - 9%)
        HighTerm 90.42 (7.0%) 91.40 (5.3%) 1.1% ( -10% - 14%)
        OrHighLow 71.32 (8.6%) 72.13 (8.2%) 1.1% ( -14% - 19%)
        LowSpanNear 107.82 (6.8%) 109.19 (5.9%) 1.3% ( -10% - 14%)
        OrHighMed 45.43 (8.8%) 46.09 (8.8%) 1.5% ( -14% - 20%)
        MedTerm 139.24 (7.0%) 141.28 (8.4%) 1.5% ( -13% - 18%)
        MedPhrase 96.51 (5.0%) 98.10 (6.8%) 1.6% ( -9% - 14%)
        OrHighNotMed 50.88 (6.9%) 52.13 (8.3%) 2.5% ( -11% - 18%)
        OrHighNotLow 65.31 (8.9%) 67.16 (8.8%) 2.8% ( -13% - 22%)

        Btw, I copied the facet config from the nightly py, and the index looks like:
        index = comp.newIndex('trunk', WIKI_MEDIUM_10M, facets = (('Date',),), facetDVFormat='Direct')

        Hide
        Michael McCandless added a comment -

        Hi Lei,

        That's great that you managed to get luceneutil up and running ... it's not easy.

        Are you passing a randomSeed=N to your Competition? When you do that, it means the exact same queries are tested each time you run the perf test; if not, it randomly picks a query from the category, and e.g. a category like HighTerm actually has a wide range of queries in it ... so maybe this explains the absolute QPS diffs you see between runs?

        Separately, it's good to open the results files (e.g. /path/to/lucene/benchmark/comp.base.N) to confirm you actually see Date facet counts inside there.

        I'll also test the 2nd to last patch w/ Date faceting.

        Hide
        Michael McCandless added a comment -

        Also, one tip when posting the results: wrap them inside a {noformat} ... {noformat}, which preserves the monospace formatting.

        Hide
        Michael McCandless added a comment -

        I compared trunk to the 2nd patch here, on wikimediumall (33.3 M docs), Date faceting, using DirectDVFormat for facets:

                            Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                        HighTerm       21.28      (2.8%)       18.96      (1.2%)  -10.9% ( -14% -   -7%)
                         LowTerm       83.98      (3.6%)       75.31      (1.3%)  -10.3% ( -14% -   -5%)
                         MedTerm       30.96      (2.6%)       27.86      (1.2%)  -10.0% ( -13% -   -6%)
                    OrHighNotLow       12.47      (3.8%)       11.23      (3.6%)  -10.0% ( -16% -   -2%)
                       OrHighMed       13.39      (3.6%)       12.07      (3.4%)   -9.9% ( -16% -   -2%)
                       OrHighLow        9.94      (4.0%)        8.97      (3.7%)   -9.8% ( -16% -   -2%)
                    OrHighNotMed       15.52      (3.4%)       14.13      (3.3%)   -8.9% ( -15% -   -2%)
                   OrHighNotHigh        7.16      (4.4%)        6.58      (3.8%)   -8.1% ( -15% -    0%)
                      OrHighHigh        4.81      (4.3%)        4.42      (3.9%)   -8.1% ( -15% -    0%)
                   OrNotHighHigh        5.83      (4.7%)        5.49      (4.3%)   -5.9% ( -14% -    3%)
                      AndHighLow      292.07      (1.5%)      274.97      (2.2%)   -5.9% (  -9% -   -2%)
                       MedPhrase      143.35      (4.7%)      135.01      (4.5%)   -5.8% ( -14% -    3%)
                    HighSpanNear        6.52      (4.8%)        6.23      (4.2%)   -4.5% ( -12% -    4%)
                      HighPhrase        3.57      (5.9%)        3.42      (5.8%)   -4.4% ( -15% -    7%)
                     MedSpanNear       26.30      (3.1%)       25.55      (2.7%)   -2.8% (  -8% -    3%)
                      AndHighMed       29.54      (1.6%)       28.81      (1.5%)   -2.5% (  -5% -    0%)
                     AndHighHigh       23.98      (1.5%)       23.41      (1.4%)   -2.4% (  -5% -    0%)
                    OrNotHighMed       18.00      (5.6%)       17.59      (4.5%)   -2.3% ( -11% -    8%)
                 LowSloppyPhrase       37.65      (1.9%)       36.84      (1.6%)   -2.2% (  -5% -    1%)
                       LowPhrase       11.98      (2.0%)       11.76      (2.3%)   -1.8% (  -6% -    2%)
                     LowSpanNear        9.57      (3.0%)        9.39      (2.5%)   -1.8% (  -7% -    3%)
                         Prefix3       75.35      (1.4%)       74.68      (2.3%)   -0.9% (  -4% -    2%)
                HighSloppyPhrase        3.13      (6.9%)        3.11      (6.3%)   -0.9% ( -13% -   13%)
                          IntNRQ        4.12      (2.6%)        4.08      (4.2%)   -0.8% (  -7% -    6%)
                 MedSloppyPhrase        3.25      (3.9%)        3.22      (3.7%)   -0.7% (  -8% -    7%)
                        Wildcard       17.39      (2.8%)       17.39      (2.9%)    0.0% (  -5% -    5%)
                    OrNotHighLow       23.17      (6.9%)       23.35      (5.8%)    0.8% ( -11% -   14%)
                          Fuzzy1       63.79      (1.8%)       64.51      (1.8%)    1.1% (  -2% -    4%)
                          Fuzzy2       44.03      (2.1%)       44.71      (2.1%)    1.5% (  -2% -    5%)
                         Respell       46.73      (2.9%)       47.46      (2.8%)    1.6% (  -3% -    7%)
        

        Looks like there is some penalty for the added abstraction ... but I agree w/ Rob: we can just have the common-case Facets impl (FastTaxonomyFacetCounts) specialize for the normal case when it's a FixedBitSet we are iterating over ...

        Hide
        Shai Erera added a comment -

        I think we shouldn't rush into changing this. Would be good if we can show that there's another DocIdSet which actually improves performance for some cases. Otherwise, we're generalizing the API for no good reason. That's also what we saw on LUCENE-5316: small degradation w/ generalization, but actually all data structures that we tried performed worse, so why generalize in the first place?

        Hide
        John Wang added a comment -

        Hi Shai:

        Our motivation for generalization also comes from performance: Large index size (e.g. large maxDoc) and very high query throughput. Creating a BitSet of maxDoc bits per query is not feasible. So being able to cache in a thread local would work better (num threads is low due to async nature of our system), which is what the first patch is about.

        If we were to reuse the bitset, we would have to zero out the bits before reuse. We found given our doc size, zeroing out the bitset is expensive, specifically, 4x that of creating the bits in the first place.

        So if we can abstract this out, we can plug in more optimized implementations. We can put together a DocIdSet impl that would perform much better than FixedBitSet.

        Having said this, I am happy to break this into multiple patches/issues.

        Thanks

        -John
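
        The reuse John describes can be sketched in a few lines. This is a rough, self-contained illustration of the idea (one bit set per thread, zeroed before reuse), using a plain long[] in place of Lucene's FixedBitSet; the class and method names are invented for the example:

        ```java
        import java.util.Arrays;

        public class ReusableBits {
            // One bit set per thread; numBits (maxDoc) is fixed for a given index.
            private static final ThreadLocal<long[]> CACHE = new ThreadLocal<>();

            public static long[] acquire(int numBits) {
                int numWords = (numBits + 63) >>> 6;
                long[] bits = CACHE.get();
                if (bits == null || bits.length < numWords) {
                    bits = new long[numWords];          // first use on this thread
                    CACHE.set(bits);
                } else {
                    Arrays.fill(bits, 0, numWords, 0L); // reuse: zero out previous hits
                }
                return bits;
            }

            public static void set(long[] bits, int doc) {
                bits[doc >>> 6] |= 1L << (doc & 63);
            }

            public static boolean get(long[] bits, int doc) {
                return (bits[doc >>> 6] & (1L << (doc & 63))) != 0;
            }
        }
        ```

        Note this is the simple clear-before-reuse variant; as John points out above, the zeroing itself can dominate at large maxDoc, which is exactly why a pluggable, more specialized structure is attractive.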

        Hide
        Shai Erera added a comment -

        I understand John. I wonder, if you can use such a specialized DocIdSet, and are worried about performance, wouldn't you want to save the iterator() overhead too, and iterate on the structure directly? If so, generalizing the API won't help you right?

        Hide
        Lei Wang added a comment -

        The overhead may not come from the additional method call; it might be from OpenBitSetIterator: the impl in FixedBitSet is more friendly to branch prediction and instruction prefetching.

        But I think we can simply apply the reuse part first. I will run some more tests later when I get time. I actually have an idea for applying the optimization in general, which I previously thought could only apply in our case. I will also test that out and see if we can improve this overall.

        Hide
        Shai Erera added a comment -
        • I implemented FixedBitSet.iterator() to return what I think is a more optimized version, and not OpenBitSetIterator. I also saved two additions in nextSetBit (and the iterator). I think we may want to commit those two changes irrespective of whether we cut over to a general DocIdSet in MatchingDocs.

        I then reviewed the patch more carefully and I noticed a couple of issues, all fixed in this patch:

        • FacetsCollector.createHitSet() returned a MutableDocIdSet which uses OpenBitSet and not FixedBitSet internally. This affects .add() too, especially as it used set() and not fastSet(). So I modified it to use FixedBitSet, as the number of bits is known in advance.
        • I moved MutableDocIdSet inside FacetsCollector and changed it to not extend DocIdSet, but rather expose two methods: add() and getDocs(). The latter returns a DocIdSet.
          • That way, MatchingDocs doesn't need to declare MutableDocIdSet but DocIdSet, which makes more sense as we don't want the "users" of MatchingDocs to be able to modify the doc id set.
        • I noticed many places in the code still included a ++doc even though it's not needed anymore, so I removed them.

        I wonder if the 10% loss that Mike saw was related to both the usage of OpenBitSet (which affects collection) and OpenBitSetIterator (which affects accumulation), and how will this patch perform vs trunk. I will try to setup my environment to run the facets benchmark, but Mike, if you can repeat the test w/ this patch and post the results, that would be great.
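
        The word-at-a-time scan that this kind of nextSetBit optimization targets can be sketched as follows. This is an illustrative, self-contained version of the pattern, not the actual Lucene code; the class name is invented:

        ```java
        public class BitScan {
            // Shift the current 64-bit word so already-visited bits fall off,
            // then use Long.numberOfTrailingZeros to jump to the next set bit.
            public static int nextSetBit(long[] bits, int index) {
                int i = index >>> 6;               // word holding 'index'
                long word = bits[i] >>> index;     // Java shifts long by (index & 63)
                if (word != 0) {
                    return index + Long.numberOfTrailingZeros(word);
                }
                while (++i < bits.length) {        // scan the remaining words
                    word = bits[i];
                    if (word != 0) {
                        return (i << 6) + Long.numberOfTrailingZeros(word);
                    }
                }
                return -1;                         // exhausted (NO_MORE_DOCS analog)
            }
        }
        ```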

        Hide
        Michael McCandless added a comment -

        OK I re-ran with the last patch:

                            Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                      HighPhrase        3.64      (5.0%)        3.56      (5.8%)   -2.2% ( -12% -    9%)
                       MedPhrase      145.39      (4.2%)      142.65      (4.6%)   -1.9% ( -10% -    7%)
                    OrNotHighLow       23.53      (5.4%)       23.30      (6.5%)   -1.0% ( -12% -   11%)
                      AndHighLow      290.69      (1.9%)      289.32      (1.6%)   -0.5% (  -3% -    3%)
                    OrNotHighMed       18.22      (4.5%)       18.16      (5.3%)   -0.3% (  -9% -    9%)
                       LowPhrase       12.03      (2.0%)       12.02      (1.9%)   -0.1% (  -3% -    3%)
                   OrNotHighHigh        5.88      (4.0%)        5.87      (4.3%)   -0.1% (  -8% -    8%)
                      OrHighHigh        4.84      (3.8%)        4.84      (4.0%)   -0.1% (  -7% -    8%)
                      AndHighMed       29.55      (1.8%)       29.53      (1.7%)   -0.1% (  -3% -    3%)
                     AndHighHigh       24.03      (1.6%)       24.03      (1.5%)   -0.0% (  -3% -    3%)
                   OrHighNotHigh        7.22      (3.7%)        7.22      (4.0%)    0.0% (  -7% -    7%)
                       OrHighLow       10.01      (3.6%)       10.04      (3.8%)    0.2% (  -6% -    7%)
                 LowSloppyPhrase       37.69      (1.8%)       37.78      (1.8%)    0.3% (  -3% -    3%)
                     LowSpanNear        9.56      (3.2%)        9.60      (3.6%)    0.4% (  -6% -    7%)
                         LowTerm       84.44      (3.5%)       84.82      (3.4%)    0.5% (  -6% -    7%)
                     MedSpanNear       26.26      (3.0%)       26.38      (3.0%)    0.5% (  -5% -    6%)
                    HighSpanNear        6.49      (5.0%)        6.52      (4.5%)    0.5% (  -8% -   10%)
                          Fuzzy1       64.43      (2.1%)       64.78      (1.6%)    0.5% (  -3% -    4%)
                    OrHighNotLow       12.56      (3.4%)       12.63      (3.5%)    0.5% (  -6% -    7%)
                          Fuzzy2       44.54      (2.2%)       44.80      (2.0%)    0.6% (  -3% -    4%)
                       OrHighMed       13.49      (3.2%)       13.57      (3.6%)    0.6% (  -5% -    7%)
                         Respell       47.31      (3.1%)       47.60      (2.5%)    0.6% (  -4% -    6%)
                 MedSloppyPhrase        3.26      (4.3%)        3.28      (5.0%)    0.7% (  -8% -   10%)
                    OrHighNotMed       15.62      (2.9%)       15.78      (3.4%)    1.0% (  -5% -    7%)
                HighSloppyPhrase        3.15      (7.9%)        3.18     (11.1%)    1.0% ( -16% -   21%)
                        HighTerm       21.55      (2.6%)       21.79      (2.6%)    1.1% (  -3% -    6%)
                         MedTerm       31.23      (2.3%)       31.67      (2.6%)    1.4% (  -3% -    6%)
                          IntNRQ        4.11      (2.8%)        4.30      (2.3%)    4.7% (   0% -    9%)
                        Wildcard       17.39      (2.6%)       20.99      (3.9%)   20.7% (  13% -   27%)
                         Prefix3       75.17      (1.9%)       96.43      (3.3%)   28.3% (  22% -   34%)
        

        I think the improved FixedBitSet iterator made a huge difference for the MTQs (wildcard, prefix), unrelated to facets I think, since those queries very likely used the filter rewrite method.

        And then there's no more ~10% hit for fast queries, which is great: DISI is fine to use.

        Hide
        Shai Erera added a comment -

        I ran this on a 2013 Wikipedia dump w/ 6.7M docs (full docs, not 1K) and Date facet:

                           Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
               HighSloppyPhrase        1.80      (9.7%)        1.75      (6.1%)   -2.9% ( -16% -   14%)
                      OrHighLow        5.53      (2.2%)        5.43      (2.7%)   -1.9% (  -6% -    3%)
                     OrHighHigh        3.81      (2.2%)        3.74      (2.7%)   -1.8% (  -6% -    3%)
                   OrHighNotLow        9.27      (2.1%)        9.13      (2.6%)   -1.5% (  -5% -    3%)
                   HighSpanNear        3.77      (5.4%)        3.71      (6.2%)   -1.4% ( -12% -   10%)
                   OrHighNotMed       14.84      (2.1%)       14.64      (2.5%)   -1.3% (  -5% -    3%)
                  OrHighNotHigh        8.06      (2.4%)        7.96      (2.8%)   -1.2% (  -6% -    4%)
                MedSloppyPhrase        1.66      (7.2%)        1.64      (4.3%)   -1.2% ( -11% -   11%)
                   OrNotHighLow       30.04      (4.6%)       29.71      (4.9%)   -1.1% ( -10% -    8%)
                      OrHighMed       12.16      (2.2%)       12.04      (2.3%)   -1.0% (  -5% -    3%)
                     HighPhrase        2.28     (10.2%)        2.26      (9.2%)   -0.7% ( -18% -   20%)
                  OrNotHighHigh       13.08      (3.0%)       13.00      (3.1%)   -0.7% (  -6% -    5%)
                        Respell       24.49      (3.3%)       24.33      (3.3%)   -0.6% (  -7% -    6%)
                   OrNotHighMed       18.02      (4.1%)       17.99      (4.0%)   -0.2% (  -7% -    8%)
                      LowPhrase        5.73      (7.0%)        5.72      (6.9%)   -0.2% ( -13% -   14%)
                    MedSpanNear       14.97      (3.8%)       14.99      (4.3%)    0.1% (  -7% -    8%)
                     AndHighLow      199.51      (2.9%)      200.05      (3.6%)    0.3% (  -6% -    6%)
                    LowSpanNear        4.57      (4.0%)        4.59      (4.7%)    0.3% (  -8% -    9%)
                      MedPhrase       79.00      (7.4%)       79.23      (6.3%)    0.3% ( -12% -   15%)
                         Fuzzy2       25.42      (3.0%)       25.56      (3.1%)    0.6% (  -5% -    6%)
                         Fuzzy1       35.84      (2.7%)       36.11      (3.7%)    0.7% (  -5% -    7%)
                LowSloppyPhrase       20.55      (2.7%)       20.73      (2.3%)    0.9% (  -4% -    6%)
                       HighTerm       22.31      (3.7%)       22.59      (2.6%)    1.2% (  -4% -    7%)
                     AndHighMed       16.17      (1.8%)       16.39      (2.3%)    1.3% (  -2% -    5%)
                    AndHighHigh       15.85      (2.3%)       16.17      (1.7%)    2.1% (  -1% -    6%)
                        MedTerm       26.51      (3.9%)       27.11      (4.0%)    2.3% (  -5% -   10%)
                        LowTerm       98.07      (4.6%)      101.55      (5.5%)    3.5% (  -6% -   14%)
                         IntNRQ        8.61      (4.3%)        9.20      (4.6%)    6.9% (  -1% -   16%)
                       Wildcard       12.96      (3.0%)       14.30      (3.6%)   10.3% (   3% -   17%)
                        Prefix3       74.18      (2.7%)       96.70      (4.9%)   30.4% (  22% -   38%)
        

        Results are consistent with yours. So should we proceed w/ the API change?

        Hide
        Shai Erera added a comment -

        John/Lei, can you please review the new patch to confirm this API will work for you?

        Hide
        John Wang added a comment -

        Yes, this works great for us!

        Thanks Shai and Mike!

        -John

        Hide
        Lei Wang added a comment -

        Yes! With this patch we can write our own FacetsCollector and customize it.

        Only one small suggestion. The createHitSet method is marked as protected, but the class itself is final, so no subclass can override it short of writing a new FacetsCollector. Can we remove the final modifier from the class, and add final to the methods we don't want users to override?
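
        Lei's point in miniature: a protected hook on a final class is unreachable API. A toy sketch of the fix (a non-final class with final methods where overriding is unwanted); the names below are illustrative, not the real Lucene classes:

        ```java
        class Collector {                          // non-final: subclassing allowed
            // Intended extension point, e.g. to reuse a cached structure.
            protected int[] createHitSet(int maxDoc) {
                return new int[maxDoc];
            }
            // final where overriding is unwanted.
            public final int size(int maxDoc) {
                return createHitSet(maxDoc).length;
            }
        }

        class CachingCollector extends Collector {
            private int[] cached;
            @Override
            protected int[] createHitSet(int maxDoc) {
                if (cached == null || cached.length < maxDoc) {
                    cached = new int[maxDoc];      // allocate once, reuse after
                }
                return cached;
            }
        }
        ```

        Had Collector itself been final, the @Override above would not compile, which is exactly the situation the comment describes.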

        Hide
        Shai Erera added a comment -

        The createHitSet method is marked as protected, but the class itself is final

        Duh, will remove final from the class, thanks for noticing that!

        I will run some tests and then commit the patch.

        Hide
        Lei Wang added a comment -

        Thanks!

        Hide
        ASF subversion and git services added a comment -

        Commit 1564898 from Shai Erera in branch 'dev/trunk'
        [ https://svn.apache.org/r1564898 ]

        LUCENE-5425: Make creation of FixedBitSet in FacetsCollector overridable

        Hide
        ASF subversion and git services added a comment -

        Commit 1564907 from Shai Erera in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1564907 ]

        LUCENE-5425: Make creation of FixedBitSet in FacetsCollector overridable

        Hide
        Shai Erera added a comment -

        Committed to trunk and 4x. I renamed MutableDocIdSet to Docs as it wasn't a DocIdSet anymore.

        Thanks John and Lei for your contribution!

        Hide
        Uwe Schindler added a comment - - edited

        Hi,
        unfortunately the FixedBitSet.or(DocIdSetIterator) method can no longer use the optimization. We should refactor the anonymous iterator into a package-private class and change the instanceof checks in the and/or/... methods to use that class.
        BooleanFilter and ChainedFilter make heavy use of this optimization!
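A minimal standalone sketch (not Lucene's actual code; all names here are illustrative) of the word-level OR fast path Uwe refers to: when or() knows the incoming iterator is backed by another bit set's long[] words, it can combine whole 64-bit words at once instead of advancing the iterator one document at a time.

```java
public class WordOrSketch {
    // OR the words of 'src' into 'dst' in place; each long holds 64 bits.
    static void orWords(long[] dst, long[] src) {
        int limit = Math.min(dst.length, src.length);
        for (int i = 0; i < limit; i++) {
            dst[i] |= src[i];
        }
    }

    static void setBit(long[] words, int bit) {
        words[bit >>> 6] |= 1L << (bit & 63);
    }

    static boolean getBit(long[] words, int bit) {
        return (words[bit >>> 6] & (1L << (bit & 63))) != 0;
    }

    public static void main(String[] args) {
        long[] a = new long[2]; // 128 bits
        long[] b = new long[2];
        setBit(a, 3);
        setBit(b, 70);
        orWords(a, b); // one pass over two words, regardless of how many bits are set
        System.out.println(getBit(a, 3) && getBit(a, 70)); // prints true
    }
}
```

This is why the instanceof check matters: falling back to a generic DocIdSetIterator loses the bulk word operation and degrades to per-document iteration.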

        Lei Wang added a comment -

        Looks like we should be able to replace the impl of OpenBitSetIterator with the one Shai added in FixedBitSet. The only missing data is the numBits, but it's optional. And at the same time, any code that is using it will benefit from this.
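A hedged sketch (illustrative names, not Lucene's actual implementation) of the word-at-a-time next-set-bit scan that such bit-set iterators are built on, using Long.numberOfTrailingZeros to jump straight to the next set bit within a word:

```java
public class NextSetBitSketch {
    // Returns the index of the first set bit at or after 'from', or -1 if none.
    static int nextSetBit(long[] words, int from) {
        int w = from >>> 6;
        if (w >= words.length) return -1;
        // Shift away bits below 'from' in the current word.
        long word = words[w] >>> (from & 63);
        if (word != 0) return from + Long.numberOfTrailingZeros(word);
        // Scan subsequent words, skipping empty ones entirely.
        for (w = w + 1; w < words.length; w++) {
            if (words[w] != 0) return (w << 6) + Long.numberOfTrailingZeros(words[w]);
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] words = new long[2];     // 128 bits
        words[0] |= 1L << 10;           // set bit 10
        words[1] |= 1L << 5;            // set bit 69
        System.out.println(nextSetBit(words, 0));  // prints 10
        System.out.println(nextSetBit(words, 11)); // prints 69
    }
}
```

The speedup comes from skipping empty 64-bit words in one comparison instead of testing each bit individually.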

        Lei Wang added a comment -

        Faster OpenBitSetIterator

        Shai Erera added a comment -

        Uwe, I added an explicit FixedBitSetIterator and the optimization in LUCENE-5440. Lei, on LUCENE-5440 I'm moving all the code to use FixedBitSet instead of OpenBitSet and therefore it will use the new iterator by default. The intention is to get rid of OpenBitSet and accompanying classes.

        Lei Wang added a comment -

        oh cool! that patch looks better!


          People

          • Assignee: Shai Erera
          • Reporter: John Wang
          • Votes: 0
          • Watchers: 5
