Lucene - Core
LUCENE-4600

Explore facets aggregation during documents collection

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2, Trunk
    • Component/s: modules/facet
    • Labels: None
    • Lucene Fields: New, Patch Available

      Description

      Today the facet module simply gathers all hits during collection (as a bitset, optionally with a float[] to hold scores as well, if you plan to aggregate them), and then at the end, when you call getFacetResults(), it makes a second pass over all those hits doing the actual aggregation.

      We should investigate just aggregating as we collect instead, so we don't have to tie up transient RAM (fairly small for the bitset but possibly big for the float[]).
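
      For illustration, here is a minimal sketch of the aggregate-as-you-collect idea against the Lucene 4.x Collector API; the ordinalsForDoc() helper and the class name are hypothetical and stand in for however the per-document category ordinals are decoded (this is not the facet module's code, nor any of the attached patches):

        import org.apache.lucene.index.AtomicReaderContext;
        import org.apache.lucene.search.Collector;
        import org.apache.lucene.search.Scorer;

        /** Counts category ordinals per hit instead of recording hits for a second pass. */
        class AggregatingFacetsCollector extends Collector {
          private final int[] counts;   // one slot per category ordinal
          private int docBase;

          AggregatingFacetsCollector(int numOrdinals) {
            counts = new int[numOrdinals];
          }

          @Override
          public void setScorer(Scorer scorer) {
            // scores are not needed for plain counting
          }

          @Override
          public void collect(int doc) {
            // decode this document's category ordinals and count them right away
            for (int ord : ordinalsForDoc(docBase + doc)) {
              counts[ord]++;
            }
          }

          @Override
          public void setNextReader(AtomicReaderContext context) {
            docBase = context.docBase;
          }

          @Override
          public boolean acceptsDocsOutOfOrder() {
            return true;   // counting is order-independent
          }

          private int[] ordinalsForDoc(int globalDoc) {
            throw new UnsupportedOperationException("illustration only");
          }
        }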

      1. LUCENE-4600.patch
        96 kB
        Shai Erera
      2. LUCENE-4600.patch
        114 kB
        Shai Erera
      3. LUCENE-4600.patch
        110 kB
        Shai Erera
      4. LUCENE-4600.patch
        85 kB
        Shai Erera
      5. LUCENE-4600.patch
        85 kB
        Shai Erera
      6. LUCENE-4600.patch
        17 kB
        Michael McCandless
      7. LUCENE-4600.patch
        10 kB
        Michael McCandless
      8. LUCENE-4600-cli.patch
        24 kB
        Michael McCandless

        Issue Links

          Activity

          Shai Erera added a comment -

          Just one comment: sampling benefits from this two-pass approach, because that way we can guarantee a minimum sample set size. Maybe there's a way to achieve that with in-collection aggregation too, but noting it here so we keep it in mind.

          See my comment on LUCENE-4598; the bitset may not be that small.

          Michael McCandless added a comment -

          sampling benefits from this two-pass approach, because that way we can guarantee a minimum sample set size.

          Ahh true ...

          We have talked about adding a Scorer.getEstimatedHitCount (somewhere Robert has a patch...), so that e.g. BooleanQuery can do a better job ordering its sub-scorers, but I think we could use it for facets too (i.e. to pick a sampling collector or not).

          But, if the estimate was off (which it's allowed to be) ... then it could get tricky for facets, e.g. you may have to re-run the query with the non-sampling collector (or with a higher sampling percentage) ...

          Shai Erera added a comment -

          I'd rather we rename this issue to something like "implement an in-collection FacetsAccumulator/Collector". I don't think that facets should aggregate only one way. There are many faceting use cases, and some will have different flavors than others.

          However, if this new Collector performs better on a 'common' case, then I'm +1 for making it the default.

          Note that I put 'common' in quotes. The benchmark you're running, indexing Wikipedia with a single Date facet dimension, is not common. I think we should define the common case, maybe following how Solr users use facets. I.e., is it the eCommerce case, where each document is associated with <10 dimensions, and each dimension is not very deep (say, depth <= 3)? If so, let's say that the facets defaults are tuned for that case, and then we benchmark it.

          After we have such a benchmark, we can compare the two aggregating collectors and decide which should be the default.

          And we should define other scenarios too: few dimensions, flat taxonomies, but with hundreds of thousands or millions of categories – what FacetsAccumulator/Collector (including maybe an entirely different indexing chain) suits that case?

          We then document some recipes on the Wiki, and recommend the best configuration for each case.

          Michael McCandless added a comment -

          I agree we should keep "do all aggregation at the end" ... it could be that for some use cases (sampling) it's better.

          So "aggregate as you collect" should be an option, and not necessarily the default, until we can see whether it's better for the "common" case.

          Feel free to change the title of this issue!

          Gilad Barkai added a comment -

          Aggregating all doc IDs first also makes it easier to compute actual results after sampling.
          That is done by taking the top-(c)K sampled results and calculating their true values over all matching documents, giving the benefit of sampling while still returning results that make sense to the user (e.g. in counting, the end number is actually the number of documents matching that category).

          As for aggregating 'on the fly', it has some other issues:

          • It is (was?) believed that accessing the counting array during query execution may lead to memory-cache issues. The entire counting array could be accessed for every document over and over, and it's not guaranteed to fit into the cache (the CPU's, that is). That might not be a problem on modern hardware.
          • While the OS can cache all the payload data itself, that gets difficult as the index grows. If the OS fails to cache the file, it is (again, was?) believed that going over the file sequentially, once, without seeks (at least by the current thread) would be faster.

          It has sort of become a religion with all those "beliefs"; some of the scenarios made sense a few years ago, but I'm not sure they still do.
          Can't wait to see how some of these coexist with the benchmark results.
          If only all religions could be benchmarked...

          Michael McCandless added a comment -

          Initial prototype patch ... I created a CountingFacetsCollector that aggregates per-segment, and it "hardwires" dgap/vint decoding.
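
          Roughly, the hardwired decode amounts to a tight per-hit loop over the document's byte[]; a generic sketch of dgap+vInt decoding (the countOrdinals name is made up here, and this is not the actual patch code):

            // Count dgap+vInt encoded ordinals stored in buf[offset..offset+len).
            static void countOrdinals(byte[] buf, int offset, int len, int[] counts) {
              int upto = offset;
              final int end = offset + len;
              int ord = 0;                  // ordinals are stored as deltas (dgap)
              while (upto < end) {
                int value = 0;
                int shift = 0;
                byte b;
                do {                        // vInt: 7 data bits per byte, high bit means "more bytes follow"
                  b = buf[upto++];
                  value |= (b & 0x7F) << shift;
                  shift += 7;
                } while ((b & 0x80) != 0);
                ord += value;               // undo the gap encoding
                counts[ord]++;
              }
            }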

          I tested using luceneutil's date faceting and it gives decent speedups for TermQuery:

                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                HighTerm        0.54      (2.7%)        0.63      (1.4%)   17.6% (  13% -   22%)
                 LowTerm        7.69      (1.6%)        9.15      (2.1%)   18.9% (  14% -   23%)
                 MedTerm        3.39      (1.2%)        4.48      (1.3%)   32.2% (  29% -   35%)
          
          Michael McCandless added a comment -

          New patch, adding a hacked-up CachedCountingFacetsCollector.

          All it does is first pre-load all payloads into a PackedBytes (just like DocValues); then, during aggregation, instead of pulling the byte[] from payloads, it pulls it from this RAM cache.

          This results in an unexpectedly big speedup:

                              Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                          HighTerm        0.53      (0.9%)        1.00      (2.5%)   87.3% (  83% -   91%)
                           LowTerm        7.59      (0.6%)       26.75     (12.9%)  252.6% ( 237% -  267%)
                           MedTerm        3.35      (0.7%)       12.71      (9.0%)  279.8% ( 268% -  291%)
          

          The only "real" difference is that I'm pulling the byte[] from RAM instead of from payloads, i.e. I still pay the vInt+dgap decode cost per hit ... so it's surprising that payloads add THAT MUCH overhead? (The test was "hot", so payloads were coming from the OS's IO cache via MMapDir.)

          I think the reason HighTerm sees the least gains is that .advance is much less costly for it, since often the target is in the already-loaded block.

          I had previously tested the existing int[][][] cache (CategoryListCache) separately, but it had smaller gains than this (73% for MedTerm), and it required more RAM (1.9 GB vs 377 RAM for this patch).

          Net/net I think we should offer an easy-to-use DV-backed facets impl...

          Shai Erera added a comment -

          Net/net I think we should offer an easy-to-use DV-backed facets impl...

          If only DV could handle multi-values. Can they handle a single byte[]? Because essentially that's what the facets API needs today - it stores everything in the payload, which is byte[]. Having a multi-val DV could benefit us by e.g. not needing to write an iterator on the payload to get the category ordinals ...

          The patch looks very good. A few comments/questions:

          • Do I understand correctly that the caching Collector is reusable? Otherwise I don't see how the CachedBytes help.
            • Preferably, if we had an AtomicReader which caches these bytes, then you wouldn't need to reuse the Collector?
            • Hmmm, what if you used the in-mem Codec, for loading just this term's posting list into RAM? Do you think that you would gain the same?

          If you want to make this a class that can be reused by other scenarios, then few tips that can enable that:

          • Instead of referencing CatListParams.DEFAULT_TERM, you can pull the CLP from FacetSearchParams.getFacetIndexingParams().getCLP(new CP()).getTerm().
          • Also, you can obtain the right IntDecoder from the CLP for decoding the ordinals. That would remove the hard dependency on VInt+gap, and allow e.g. to use a PackedInts decoder.
          • Not sure that we should, but this class supports only one CLP. I think it's ok to leave it like that, and get the CLP.term() at ctor, but then we must be able to cache the bytes at the reader level. That way, if an app uses multiple CLPs, it can initialize multiple such Collectors.
          • I think it's ok to rely on the top Query to not call us for deleted docs, and therefore pass liveDocs=null. If a Query wants to iterate on deleted docs, we should count facets for them too.
          • Maybe you should take the IntArrayAllocator from the outside? That class can be initialized by the app once to e.g. use maxArrays=10 (e.g. if it expects max 10 queries in parallel), and then the int[] are reused whenever possible. The way the patch is now, if you reuse that Collector, you can only reuse one array.
          • In setNextReader you sync on the cache only in case someone executes a search w/ an ExecutorService? That's another point where caching at the Codec/AtomicReader level would be better, right?
          • Why is acceptDocsOutOfOrder false? Is it because of how the cache works? Because facet counting is not limited to in-order only.
            • For the non-caching one that's true, because we can only advance on the fulltree posting. But if the posting is entirely in RAM, we can random access it?

          I wonder if we can write a good single Collector, and optimize the caching stuff through the Reader, or DV. Collectors in Lucene are usually not reusable? At least, I haven't seen such a pattern. The current FacetsCollector isn't reusable (b/c of the bitset and potential scores array). So I'm worried users might be confused and won't benefit the most from that Collector, b/c they won't reuse it ...
          On the other hand, saying that we have a FacetsIndexReader (composite) which per configuration initializes the right FacetAtomicReader would be more consumable by apps.

          About the results, just to clarify – in both runs, does 'QPS base' refer to the current facet counting and 'QPS comp' to the two new collectors respectively? I'm surprised that the int[][][] didn't perform much better, since you don't need to do the decoding for every document, for every query. But then, perhaps it's because the RAM size is so large, and we pay a lot swapping in/out of the CPU cache ...

          Also, note that you wrote specialized code for decoding the payload, vs. using an API to do that (e.g. PackedInts / IntDecoder). I wonder how that would compare to the base collection, i.e. would we still see the big difference between int[][][] and the byte[] caching.

          Overall though, great work Mike !

          We must get this code in. It's clear that it can potentially gain a lot for some scenarios ...

          Shai Erera added a comment -

          Changing the title, which got me thinking – Mike, if we do the Reader/DV caching approach, that could benefit post-collection performance too, right? Is it possible for you to hack the current FacetsCollector to do the aggregation over CachedBytes and then compare the difference?

          Because your first results show that during-collection aggregation is not that much faster than post-collection, I am just wondering if it'll be the same when we cache the bytes outside the collector entirely.

          If so, I think it should push us to do this caching outside, because we've already identified cases where post-collection is needed (e.g. sampling) too.

          Michael McCandless added a comment -

          Net/net I think we should offer an easy-to-use DV-backed facets impl...

          If only DV could handle multi-values. Can they handle a single byte[]?

          Because essentially that's what the facets API needs today - it stores everything in the payload, which is byte[].

          They can handle byte[], so I think we should just offer that.

          Having a multi-val DV could benefit us by e.g. not needing to write an iterator on the payload to get the category ordinals ...

          Right, though in the special (common?) case where a given facet field
          is single-valued, like the Date facets I added to luceneutil /
          nightlybench (see the graph here:
          http://people.apache.org/~mikemccand/lucenebench/TermDateFacets.html
          – only 3 data points so far!), we could also use DV's int fields and
          let it encode the single ord (eg with packed ints) and then aggregate
          up the taxonomy after aggregation of the leaf ords is done. I'm
          playing with a prototype patch for this ...
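
          A sketch of that rollup step, assuming a parents[] array from the
          taxonomy where parents[ord] is the ordinal of ord's parent and parents
          always get smaller ordinals than their children; the rollup helper is
          illustrative, not the prototype code:

            // Fold leaf counts up into their ancestors, in place.
            static void rollup(int[] counts, int[] parents) {
              // Walk from the largest ordinal down: every child is visited before
              // its parent, so each ordinal's subtree is fully accumulated when we
              // reach it. Ordinal 0 (the root) has no parent and is skipped.
              for (int ord = counts.length - 1; ord > 0; ord--) {
                if (counts[ord] > 0) {
                  counts[parents[ord]] += counts[ord];
                }
              }
            }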

          Do I understand correctly that the caching Collector is reusable? Otherwise I don't see how the CachedBytes help.

          No no: this is all just a hack (the CachedBytes / static cache). We
          should somehow cleanly switch to DV ... it wasn't clear to me how to
          do that ...

          Hmmm, what if you used the in-mem Codec, for loading just this term's posting list into RAM? Do you think that you would gain the same?

          Maybe! Have to test ...

          If you want to make this a class that can be reused by other scenarios, then few tips that can enable that:

          I do! If ... making it fully generic doesn't hurt perf much. The
          decode chain (w/ separate reInit called per doc) seems heavyish ...

          Instead of referencing CatListParams.DEFAULT_TERM, you can pull the CLP from FacetSearchParams.getFacetIndexingParams().getCLP(new CP()).getTerm().

          Ahh ok. I'll fix that.

          Also, you can obtain the right IntDecoder from the CLP for decoding the ordinals. That would remove the hard dependency on VInt+gap, and allow e.g. to use a PackedInts decoder.

          OK I'll try that.

          Not sure that we should, but this class supports only one CLP. I think it's ok to leave it like that, and get the CLP.term() at ctor, but then we must be able to cache the bytes at the reader level. That way, if an app uses multiple CLPs, it can initialize multi such Collectors.

          I think it's ok to rely on the top Query to not call us for deleted docs, and therefore pass liveDocs=null. If a Query wants to iterate on deleted docs, we should count facets for them too.

          OK good.

          Maybe you should take the IntArrayAllocator from the outside? That class can be initialized by the app once to e.g. use maxArrays=10 (e.g. if it expects max 10 queries in parallel), and then the int[] are reused whenever possible. The way the patch is now, if you reuse that Collector, you can only reuse one array.

          Ahh I'll do that.

          Separately I was wondering if we should sometimes do aggregation
          backed by an int[] hashmap, and have it "upgrade" to a non-sparse
          array only once the number collected got too large. Not sure it's
          THAT important since it would only serve to keep fast queries fast but
          would make slow queries a bit slower...
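
          A sketch of that upgrade idea, with an arbitrary cut-over threshold
          (the UpgradeableCounts class is illustrative only, not part of the patch):

            /** Counts sparsely in a map, upgrading to a dense int[] once it fills up. */
            class UpgradeableCounts {
              private java.util.Map<Integer,Integer> sparse = new java.util.HashMap<Integer,Integer>();
              private int[] dense;
              private final int numOrds;
              private final int threshold;

              UpgradeableCounts(int numOrds) {
                this.numOrds = numOrds;
                this.threshold = numOrds / 16;   // arbitrary cut-over point
              }

              void increment(int ord) {
                if (dense != null) {
                  dense[ord]++;
                  return;
                }
                Integer c = sparse.get(ord);
                sparse.put(ord, c == null ? 1 : c + 1);
                if (sparse.size() > threshold) {           // too many distinct ords: go dense
                  dense = new int[numOrds];
                  for (java.util.Map.Entry<Integer,Integer> e : sparse.entrySet()) {
                    dense[e.getKey()] = e.getValue();
                  }
                  sparse = null;
                }
              }

              int count(int ord) {
                if (dense != null) {
                  return dense[ord];
                }
                Integer c = sparse.get(ord);
                return c == null ? 0 : c;
              }
            }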

          In setNextReader you sync on the cache only in case someone executes a search w/ an ExecutorService? That's another point where caching at the Codec/AtomicReader level would be better, right?

          Also for multiple threads running at once ... but it's all a hack anyway ...

          Why is acceptDocsOutOfOrder false? Is it because of how the cache works? Because facet counting is not limited to in-order only.
          For the non-caching one that's true, because we can only advance on the fulltree posting. But if the posting is entirely in RAM, we can random access it?

          Oh good point – the DV/cache collectors can accept out of order.
          I'll fix.

          I wonder if we can write a good single Collector, and optimize the caching stuff through the Reader, or DV. Collectors in Lucene are usually not reusable? At least, I haven't seen such pattern. The current FacetsCollector isn't reusable (b/c of the bitset and potential scores array). So I'm worried users might be confused and won't benefit the most from that Collector, b/c they won't reuse it ..
          On the other hand, saying that we have a FacetsIndexReader (composite) which per configuration initializes the right FacetAtomicReader would be more consumable by apps.

          I think we should have two new collectors here? One keeps using
          payloads but operates per segment and aggregates on the fly (if, on
          making it generic again, we still see gains).

          The other stores the byte[] in DV. But somehow we have to make "send
          the byte[] to DV not payloads at index time" easy ... I'm not sure how.

          About the results, just to clarify – in both runs the 'QPS base' refers to current facet counting and 'QPS comp' refers to the two new collectors respectively?

          Right: base = current trunk, comp = the two new collectors.

          I'm surprised that the int[][][] didn't perform much better, since you don't need to do the decoding for every document, for every query. But then, perhaps it's because the RAM size is so large, and we pay a lot swapping in/out from CPU cache ...

          This also surprised me, but I suspect it's the per-doc pointer
          dereferencing that's costing us. I saw the same problem with
          DirectPostingsFormat ... This also ties up tons of extra RAM (pointer
          = 4 or 8 bytes; int[] object overhead maybe 8 bytes?). I bet if we
          made a single int[], and did our own addressing (eg another int[] that
          maps docID to its address) then that would be faster than byte[] via
          cache/DV.
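
          A sketch of that single-array addressing (PackedOrdinals is a
          hypothetical class, not the existing CategoryListCache):

            // All ordinals concatenated in docID order, addressed via a per-doc start
            // offset: one big int[] plus an int[maxDoc+1], instead of an int[] object
            // per document with its pointer and object-header overhead.
            class PackedOrdinals {
              final int[] ords;     // concatenated ordinals for all docs
              final int[] starts;   // starts[doc]..starts[doc+1] bounds doc's slice

              PackedOrdinals(int[] ords, int[] starts) {
                this.ords = ords;
                this.starts = starts;
              }

              void count(int doc, int[] counts) {
                for (int i = starts[doc], end = starts[doc + 1]; i < end; i++) {
                  counts[ords[i]]++;
                }
              }
            }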

          Also, note that you wrote a specialized code for decoding the payload, vs. using an API to do that (e.g. PackedInts / IntDecoder). I wonder how would that compare to the base collection, i.e. would we still see the big difference between int[][][] and the byte[] caching.

          Yeah good question. I'll separately test the specialized decode to
          see how much it's helping....

          Mike, if we do the Reader/DV caching approach, that could benefit post-collection performance too, right? Is it possible for you to hack the current FacetsCollector to do the aggregation over CachedBytes and then compare the difference?

          Right! DV vs payloads is decoupled from during- vs post-collection
          aggregation.

          I'll open a separate issue to allow byte[] DV backing for facets....

          Because your first results show that during-collection are not that much faster than post-collection, I am just wondering if it'll be the same when we cache the bytes outside the collector entirely.
          If so, I think it should push us to do this caching outside, because we've already identified cases where post-collection is needed (e.g. sampling) too.

          Definitely.

          Overall though, great work Mike !
          We must get this code in. It's clear that it can potentially gain a lot for some scenarios ...

          Thanks! I want to see that graph jump

          Shai Erera added a comment -

          I would like to see one ordinals-store; I don't think we should offer a choice between payloads and DV. If DV lets us write byte[], and we can read it off disk or from RAM, we should make the cut to DV.

          But note that DV means upgrading existing indexes. How do you move from a payload to DV? Is it something that can be done in addIndexes? If facets could determine where the data is written, per segment, the index would be migrated on the fly as segments are merged.

          But if there's a clean way to do a one-time index upgrade to DV, then let's just write it once, and then DVs are migratable, so that's another +1 for DV.

          If you want to simulate DVs, you'll need to implement a few classes. First, instead of CategoryDocBuilder, you can construct your own Document, adding DVFields. Just make sure that when you resolve a CP to its ord, you also resolve all its parents and add all of them to the DV - to compare today(payload) to today(DV) (today == writing all parents).

          Then, I think you should also write your own CategoryListIterator, to iterate on the DV.

          Those are the base classes for sure; maybe you'll need a few others to get the CLI into the chain.

          I hope I addressed all the comments, but I might have missed a question.

          Shai Erera added a comment -

          Another point about DV - that's actually a design thing. One important hook is IntEncoder/Decoder. It determines how the fulltree is encoded/decoded. For example, you used one method (VInt+DGap), but there are other encoders. In one application, every document added almost-unique facets, so the returned ordinals had gaps of 1-2. Therefore we have the FourOnes and EightOnes encoders.

          Point is, this abstraction layer should remain. I know you're in the exploration phase, but keep that in mind. In fact, if we're able to make the cut to DV as an internal change, we could also benefit from the existing test suite, to make sure everything's working.

          Michael McCandless added a comment -

          I would like to see one ordinals-store; I don't think we should offer a choice between payloads and DV. If DV lets us write byte[], and we can read it off disk or from RAM, we should make the cut to DV.

          +1, though we should test the on-disk DV vs current payloads to be sure.

          But note that DV means upgrading existing indexes.

          Hmm it would be nice to somehow migrate on the fly ... not sure how.

          But if there's a clean way to do a one-time index upgrade to DV, then let's just write it once, and then DVs are migratable, so that's another +1 for DV.

          If we do the migrate-on-the-fly then users can use IndexUpgrader to migrate entire index.

          Point is, this abstract layer should remain. I know that you're in the exploration phase, but keep that in mind. In fact, if we're able to make the cut to DV as an internal change, we could also benefit from the existing test suite, to make sure everything's working.

          +1, the abstractions are nice and generic.

          I'll test to see how much these abstractions are hurting the hotspots ... we can always make/pick specialized collectors (like the patch) if necessary, and keep generic collectors for the fully general cases ...

          Michael McCandless added a comment -

          I created LUCENE-4602 to cutover to DV.

          Michael McCandless added a comment -

          Also, you can obtain the right IntDecoder from the CLP for decoding the ordinals. That would remove the hard dependency on VInt+gap, and allow e.g. to use a PackedInts decoder.

          I tried this, changing the CountingFacetsCollector to the attached
          patch (to use CategoryListIterator), but alas those abstractions are
          apparently costing us in this hotspot (unless I screwed something up
          in the patch? Eg, that null I pass is kinda spooky!):

                              Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                          HighTerm        0.86      (4.7%)        0.56      (0.4%)  -34.4% ( -37% -  -30%)
                           MedTerm        5.85      (1.0%)        5.04      (0.5%)  -13.9% ( -15% -  -12%)
                           LowTerm       11.82      (0.6%)       11.02      (0.5%)   -6.8% (  -7% -   -5%)
          

          base is the original CountingFacetsCollector and comp is the new one
          using the CategoryListIterator API.

          I think we should try to invoke specialized collectors when possible?

          Michael McCandless added a comment -

          Maybe you should take the IntArrayAllocator from the outside?

          This actually makes me sort of nervous, because if the app passes 10 to IntArrayAllocator, it means we hold onto 10 int[]s sized to the number of ords, right?

          Why try to recycle the int[]'s? Why not let GC handle it...?

          Shai Erera added a comment -

          Why try to recycle the int[]'s? Why not let GC handle it...?

          It was Gilad who mentioned "beliefs" and "religions" .. that code has been around since Java 1.4. I'm not sure that, at the time, Java was very good at allocating and disposing of arrays ... Also, it's not like this is unique to facets. IIRC, IndexWriter also holds onto some char[] or byte[] arrays? At least, a few days ago someone asked me how come IW never releases some 100 MB of char[] - since he set the RAM buffer size to 128 MB, it made sense to me ...

          Perhaps leave it for now, and separately (new issue?) we can test whether allocating a new array is costly? If it turns out that this is actually important, we can have a cleanup thread reclaim the unused ones?

          Michael McCandless added a comment -

          Also, it's not like this is unique to facets. IIRC, IndexWriter also holds onto some char[] or byte[] arrays? At least, a few days ago someone asked me how come IW never releases some 100 MB of char[] - since he set the RAM buffer size to 128 MB, it made sense to me ...

          Actually we stopped recycling with DWPT ... now we let GC do its job. But also, when IW did this, it was internal (no public API was affected) ... I don't like that the app can/should pass an IntArrayAllocator to the public APIs.

          Perhaps leave it for now, and separately (new issue?) we can test whether allocating a new array is costly? If it turns out that this is actually important, we can have a cleanup thread reclaim the unused ones?

          OK, I'll open a new issue. Rather than adding a cleanup thread to the current impl, I think we should remove Int/FloatArrayAllocator and just do new int[]/float[], and only add it back if we can prove there's a performance gain. I think we should let Java/GC do its job ...

          Shai Erera added a comment -

          Ok, let's continue the discussion on LUCENE-4615. The Allocator is also used to pass the array between different objects, but perhaps there are other ways too.

          Shai Erera added a comment -

          Patch introduces CountingFacetsCollector, very similar to Mike's version, only "productized".

          Made FacetsCollector abstract, with a utility create() method which returns either CountingFacetsCollector or StandardFacetsCollector (previously FC), given the parameters.

          All tests were migrated to use FC.create() and all pass (utilizing the new collector). Still, I wrote a dedicated test for the new Collector too.

          The preliminary results that we have show nice improvements with this Collector. Mike, can you paste them here?

          There are some nocommits, which I will resolve before committing. But before that, I'd like to compare this Collector to ones that use different abstractions from the code, e.g. IntDecoder (vs hard-wiring to dgap+vint), CategoryListIterator, etc.

          I also want to compare this Collector to one that in collect() marks a bitset, and does all the work in getFacetResults().

          Michael McCandless added a comment -

          Patch looks great: +1

          And this is a healthy speedup, on the Wikipedia 1M / 25 ords per doc test:

                              Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                          PKLookup      239.18      (1.5%)      238.87      (1.1%)   -0.1% (  -2% -    2%)
                           LowTerm       98.99      (3.1%)      135.95      (1.8%)   37.3% (  31% -   43%)
                          HighTerm       20.95      (1.2%)       29.08      (2.4%)   38.8% (  34% -   42%)
                           MedTerm       34.55      (1.5%)       48.31      (2.0%)   39.8% (  35% -   43%)
          
          Shai Erera added a comment -

          Handled some nocommits. Now there's no translation from OrdinalValue to FRNImpl in getFacetResults (the latter is used directly in the queue). I wonder if this buys us anything.

          Michael McCandless added a comment -

          It's faster!

                              Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                          PKLookup      239.75      (1.2%)      237.59      (1.0%)   -0.9% (  -3% -    1%)
                          HighTerm       21.21      (1.5%)       29.80      (2.6%)   40.5% (  35% -   45%)
                           MedTerm       34.90      (1.9%)       50.24      (1.9%)   44.0% (  39% -   48%)
                           LowTerm       99.85      (3.7%)      152.40      (1.1%)   52.6% (  46% -   59%)
          
          Shai Erera added a comment -

          Patch adds two Collectors:

          • DecoderCountingFacetsCollector, which uses the IntDecoder abstraction (but the rest is like CountingFacetsCollector)
          • PostCollectionCountingFacetsCollector, which moves the work from collect() to getFacetResults(). In collect(), it keeps a per-DocValues.Source FixedBitSet of the matching docs (see the sketch below).

          I wonder how these two compare to CountingFacetsCollector. I modified FacetsCollector.create() to return any of the 3, so just make sure to comment out the irrelevant ones in the benchmark.
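
          For reference, the post-collection shape is roughly: mark hits in a per-segment FixedBitSet during collect(), then replay the bits and count in getFacetResults(). A simplified sketch; PostCollectionSketch and ordinalsForDoc() are hypothetical names, and FixedBitSet.nextSetBit returns -1 when there are no more set bits in 4.x:

            import org.apache.lucene.util.FixedBitSet;

            class PostCollectionSketch {
              // collect() just marks the hit in the current segment's bitset:
              //   bits.set(doc);
              // getFacetResults() later replays each segment's bits and counts:
              static void countSegment(FixedBitSet bits, int[] counts) {
                int doc = bits.nextSetBit(0);
                while (doc != -1) {
                  for (int ord : ordinalsForDoc(doc)) {   // hypothetical ordinal reader
                    counts[ord]++;
                  }
                  doc = doc + 1 < bits.length() ? bits.nextSetBit(doc + 1) : -1;
                }
              }

              static int[] ordinalsForDoc(int doc) {
                throw new UnsupportedOperationException("illustration only");
              }
            }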

          Hide
          Michael McCandless added a comment -

          Base = DecoderCountingFacetsCollector; comp=CountingFacetsCollector:

                              Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                          HighTerm       25.67      (1.6%)       30.45      (1.9%)   18.6% (  14% -   22%)
                           LowTerm      145.87      (1.0%)      154.38      (0.8%)    5.8% (   4% -    7%)
                           MedTerm       44.45      (1.4%)       51.01      (1.5%)   14.8% (  11% -   17%)
                          PKLookup      240.08      (0.9%)      239.94      (1.0%)   -0.1% (  -1% -    1%)
          

          So it seems like the IntDecoder abstractions hurt ...

          Base = DecoderCountingFacetsCollector; comp=PostCollectionCountingFacetsCollector:

                              Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                          HighTerm       30.46      (0.8%)       30.16      (2.1%)   -1.0% (  -3% -    2%)
                           LowTerm      142.89      (0.5%)      153.94      (0.8%)    7.7% (   6% -    9%)
                           MedTerm       50.46      (0.8%)       50.65      (1.8%)    0.4% (  -2% -    2%)
                          PKLookup      238.65      (1.1%)      238.55      (0.9%)   -0.0% (  -2% -    2%)
          

          This is very interesting! And good news for sampling?

          Shai Erera added a comment -

          ok so the Decoder abstraction hurts ... that's a bummer. While the dgap+vint specialization is simple, specializing e.g. packed-ints (or whatever other block encoding algorithm we come up with on LUCENE-4609) will make the code uglier.

          It looks like PostCollection doesn't hurt much? Can you compare it to Counting directly? I'm confused by the results ... they seem to improve on the Decoder collector, but I'm not sure how they stack up against Counting. If the differences are minuscule (in either direction), that could mean good news for sampling, because then we will be able to fold sampling into this specialized Collector. It would also mean that we can fold in complements (TotalFacetCounts).

          So it looks like using any abstraction will hurt us. I didn't even try Aggregator, because it needs to either use the decoder or go through a bulk API (i.e. the Collector decodes into an IntsRef without IntDecoder and then delegates to the Aggregator) – that seems useless to me, as counting + default decoding is the common scenario we want to target.

          Based on the Counting vs PostCollection results, we should decide whether to always do post-collection in Counting, or not. Folding in Sampling and Complements should be done separately, because they are not so easy to bring in w/ the current state of the API.

          Shai Erera added a comment -

          Hmm, it occurred to me that maybe your second comparison was between PostCollection and Counting? If so, then while it's indeed interesting, it's puzzling. PostCollection allocates a FixedBitSet for every segment and in the end obtains a DISI from each FBS. As far as I know, DISIs over bitsets are not so cheap, especially when nextDoc() is called, because they need to find the next set bit ... if it's indeed faster, we must get to the bottom of it. It could mean other Collectors could benefit from such a post-collection technique ...

          While on that, is a DISI really the best way to iterate over a bitset's set bits? I'm looking at OpenBitSetDISI.nextDoc() and it looks much more expensive than FixedBitSet.nextSetBit(). I modified PostCollection to do:

          while (doc < length && (doc = bits.nextSetBit(doc)) != -1) {
            // .. the previous code
            ++doc;
          }
          

          And all tests pass with this change too. I wonder if that's faster than DISI.
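
          For reference, the two iteration styles under discussion look roughly like this; a sketch against the Lucene 4.x FixedBitSet/DocIdSetIterator APIs as I understand them, not code from the patch:

          import java.io.IOException;
          import org.apache.lucene.search.DocIdSetIterator;
          import org.apache.lucene.util.FixedBitSet;

          class SetBitIterationSketch {

            // Style 1: obtain a DISI from the bitset and call nextDoc() until exhaustion.
            static int countViaDISI(FixedBitSet bits) throws IOException {
              int numDocs = 0;
              DocIdSetIterator disi = bits.iterator();
              for (int doc = disi.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = disi.nextDoc()) {
                numDocs++; // per-document work (e.g. counting its ordinals) would go here
              }
              return numDocs;
            }

            // Style 2: walk the set bits directly with nextSetBit(), as in the modified patch.
            static int countViaNextSetBit(FixedBitSet bits) {
              int numDocs = 0;
              final int length = bits.length();
              int doc = 0;
              while (doc < length && (doc = bits.nextSetBit(doc)) != -1) {
                numDocs++; // per-document work would go here
                ++doc;
              }
              return numDocs;
            }
          }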

          BTW, while making this change I noticed that I have a slight inefficiency in all 3 Collectors. If the document has no facets, I should have returned, but I forgot the return statement, e.g.:

              if (buf.length == 0) {
                // this document has no facets
                return; // THAT LINE WAS MISSING!
              }
          

          The code is still correct, it just executes some redundant extra instructions. I'll upload an updated patch with both changes shortly.

          Shai Erera added a comment -

          Patch fixes the missing return statement in all 3 collectors, and also moves from DISI to nextSetBit.

          Mike, is it possible to compare Counting and PostCollection to trunk, instead of to each other?

          Michael McCandless added a comment -

          Can you compare it to Counting directly?

          Ugh, sorry, that is in fact what I ran but I put the wrong base/comp above it. The test was actually base = PostCollectionCountingFacetsCollector, comp = CountingFacetsCollector.

          Michael McCandless added a comment -

          StandardFacetsCollector (base) vs DecoderCountingFacetsCollector (comp):

                              Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                          HighTerm       21.44      (1.4%)       25.71      (1.3%)   19.9% (  16% -   22%)
                           LowTerm       99.73      (3.2%)      145.71      (1.2%)   46.1% (  40% -   52%)
                           MedTerm       35.13      (1.6%)       44.46      (1.1%)   26.6% (  23% -   29%)
                          PKLookup      241.15      (1.0%)      238.90      (1.0%)   -0.9% (  -2% -    1%)
          

          StandardFacetsCollector (base) vs PostCollectionCountingFacetsCollector (comp):

                              Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                          HighTerm       21.26      (0.9%)       31.36      (1.4%)   47.5% (  44% -   50%)
                           LowTerm       99.84      (3.2%)      159.17      (0.7%)   59.4% (  53% -   65%)
                           MedTerm       34.91      (1.3%)       52.65      (1.2%)   50.8% (  47% -   54%)
                          PKLookup      238.08      (1.3%)      238.26      (1.2%)    0.1% (  -2% -    2%)
          

          StandardFacetsCollector (base) vs CountingFacetsCollector (comp):

                              Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                          HighTerm       21.35      (1.3%)       30.26      (2.9%)   41.7% (  37% -   46%)
                           LowTerm      100.45      (4.0%)      153.26      (1.1%)   52.6% (  45% -   60%)
                           MedTerm       35.02      (1.9%)       50.77      (2.0%)   45.0% (  40% -   49%)
                          PKLookup      237.88      (2.4%)      239.34      (0.9%)    0.6% (  -2% -    4%)
          
          Michael McCandless added a comment -

          I re-ran CountingFacetsCollector (base) vs PostCollectionCountingFacetsCollector (comp):

                              Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                          HighTerm       30.15      (1.4%)       30.97      (1.1%)    2.7% (   0% -    5%)
                           LowTerm      153.06      (0.4%)      158.26      (0.7%)    3.4% (   2% -    4%)
                           MedTerm       50.69      (0.9%)       52.29      (0.9%)    3.2% (   1% -    5%)
                          PKLookup      238.04      (1.3%)      236.79      (1.8%)   -0.5% (  -3% -    2%)
          

          I think the cutover away from DISI made it faster ... and it's surprising that this (allocate a bit set, set the bits, revisit the set bits at the end) is faster than count-as-you-go.

          Shai Erera added a comment -

          I'm surprised too. Throwing out a wild idea: maybe post-collection buys us locality of reference in terms of the counts[] (and maybe even the DocValues.Source)?

          It almost feels counter-intuitive, right? CountingFC's operations are a subset of PostCollectionCFC's. The latter adds many bitwise operations, ifs, loops and whatnot. So what do we do? Stick w/ post-collection?

          Michael McCandless added a comment -

          I ran the same test, but w/ the full set of query categories:

                              Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                        AndHighLow      111.98      (1.0%)      110.10      (1.0%)   -1.7% (  -3% -    0%)
                      HighSpanNear      128.42      (1.4%)      126.32      (1.1%)   -1.6% (  -4% -    0%)
                       LowSpanNear      128.68      (1.4%)      126.59      (1.0%)   -1.6% (  -3% -    0%)
                       MedSpanNear      128.18      (1.3%)      126.29      (1.1%)   -1.5% (  -3% -    0%)
                           Respell       55.79      (3.9%)       55.35      (4.8%)   -0.8% (  -9% -    8%)
                          PKLookup      206.89      (1.1%)      208.08      (1.5%)    0.6% (  -2% -    3%)
                            Fuzzy2       36.21      (1.3%)       36.49      (2.3%)    0.8% (  -2% -    4%)
                         MedPhrase       56.42      (1.4%)       56.94      (1.3%)    0.9% (  -1% -    3%)
                          Wildcard       64.26      (3.8%)       64.88      (2.0%)    1.0% (  -4% -    7%)
                        AndHighMed       51.80      (0.7%)       52.44      (1.2%)    1.2% (   0% -    3%)
                            IntNRQ       18.49      (4.8%)       18.78      (5.5%)    1.6% (  -8% -   12%)
                           LowTerm       41.15      (0.6%)       41.82      (0.9%)    1.6% (   0% -    3%)
                           Prefix3       46.94      (4.3%)       47.92      (3.4%)    2.1% (  -5% -   10%)
                           MedTerm       18.47      (0.8%)       18.92      (1.3%)    2.4% (   0% -    4%)
                        HighPhrase       15.16      (6.2%)       15.77      (4.3%)    4.0% (  -6% -   15%)
                          HighTerm        6.76      (1.2%)        7.07      (1.2%)    4.5% (   2% -    7%)
                   LowSloppyPhrase       17.14      (3.8%)       17.96      (2.3%)    4.8% (  -1% -   11%)
                            Fuzzy1       27.29      (0.8%)       28.62      (1.4%)    4.9% (   2% -    7%)
                   MedSloppyPhrase       17.64      (2.4%)       18.90      (1.0%)    7.2% (   3% -   10%)
                       AndHighHigh       11.11      (0.5%)       11.97      (0.9%)    7.7% (   6% -    9%)
                  HighSloppyPhrase        0.83     (10.5%)        0.91      (5.9%)   10.1% (  -5% -   29%)
                         LowPhrase       15.83      (3.2%)       17.45      (0.2%)   10.2% (   6% -   14%)
                        OrHighHigh        3.22      (0.7%)        3.80      (1.5%)   18.1% (  15% -   20%)
                         OrHighLow        5.68      (0.3%)        6.73      (1.5%)   18.4% (  16% -   20%)
                         OrHighMed        5.61      (0.5%)        6.66      (1.6%)   18.7% (  16% -   20%)
          

          Somehow post-collection is a big gain for the Or queries ... I wonder if we are somehow not getting the out-of-order scorer (BooleanScorer) w/ CountingCollector ... but both collectors return true from acceptsDocsOutOfOrder ...

          Net/net it seems like we should stick with post-collection? The possible downside is the memory use of the temporary bit set, I guess ...
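
          As a rough back-of-the-envelope illustration (my arithmetic, not a benchmark number): a FixedBitSet over a 1M-doc segment takes about 1,000,000 / 8 ≈ 122 KB, so the transient bitsets stay well under a megabyte for an index of this size, while the counts[] array alone already needs 4 bytes per ordinal, i.e. ~10 MB for a 2.5M-ord taxonomy.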

          Michael McCandless added a comment -

          I confirmed that the Or queries are using BooleanScorer in both base and comp, so those gains are "real".

          Michael McCandless added a comment -

          Results if I rebuild the index with NO_PARENTS (just to make sure the locality gains are not due to frequently visiting the parent ords in the count array):

                              Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                           Respell       55.59      (3.9%)       54.45      (3.4%)   -2.0% (  -8% -    5%)
                            IntNRQ       18.34      (7.1%)       18.04      (6.4%)   -1.7% ( -14% -   12%)
                        AndHighLow       86.87      (0.6%)       86.26      (1.9%)   -0.7% (  -3% -    1%)
                       MedSpanNear       97.31      (0.9%)       96.63      (1.8%)   -0.7% (  -3% -    1%)
                           Prefix3       46.40      (5.6%)       46.11      (4.6%)   -0.6% ( -10% -   10%)
                       LowSpanNear       97.76      (0.9%)       97.28      (1.8%)   -0.5% (  -3% -    2%)
                            Fuzzy2       31.88      (1.6%)       31.77      (2.7%)   -0.3% (  -4% -    3%)
                          Wildcard       62.53      (2.9%)       62.34      (2.5%)   -0.3% (  -5% -    5%)
                          PKLookup      210.69      (1.5%)      210.37      (1.8%)   -0.1% (  -3% -    3%)
                      HighSpanNear       97.44      (1.4%)       97.35      (1.7%)   -0.1% (  -3% -    3%)
                         MedPhrase       49.87      (2.4%)       50.18      (2.5%)    0.6% (  -4% -    5%)
                        HighPhrase       14.32      (8.8%)       14.42      (8.8%)    0.7% ( -15% -   20%)
                           LowTerm       37.64      (0.5%)       37.90      (1.3%)    0.7% (  -1% -    2%)
                        AndHighMed       45.23      (0.6%)       45.74      (1.1%)    1.1% (   0% -    2%)
                           MedTerm       22.53      (1.0%)       23.00      (1.3%)    2.1% (   0% -    4%)
                   LowSloppyPhrase       16.27      (2.5%)       16.65      (5.7%)    2.3% (  -5% -   10%)
                            Fuzzy1       24.86      (1.7%)       25.87      (1.4%)    4.1% (   0% -    7%)
                          HighTerm        7.67      (1.6%)        8.00      (2.4%)    4.3% (   0% -    8%)
                   MedSloppyPhrase       16.67      (1.2%)       17.58      (3.1%)    5.5% (   1% -    9%)
                  HighSloppyPhrase        0.81      (6.6%)        0.86     (12.8%)    6.9% ( -11% -   28%)
                       AndHighHigh       11.38      (0.8%)       12.18      (1.2%)    7.1% (   5% -    9%)
                         LowPhrase       14.69      (4.7%)       15.82      (5.7%)    7.6% (  -2% -   18%)
                        OrHighHigh        3.60      (2.3%)        4.32      (3.3%)   20.0% (  14% -   26%)
                         OrHighMed        6.20      (1.9%)        7.51      (3.0%)   21.1% (  15% -   26%)
                         OrHighLow        6.25      (2.0%)        7.60      (2.4%)   21.7% (  17% -   26%)
          

          So net/net post is still better! Separately, it looks like NO_PARENTS is maybe ~10% faster for the high-cost queries, but slower for the low-cost queries ... which is expected because iterating over 2.2M ords in the end is a fixed, non-trivial cost ...

          Shai Erera added a comment -

          Good. So I'll consolidate Post and Counting into one, and also add handling for the NO_PARENTS case. Unfortunately, we cannot compare trunk vs patch for the NO_PARENTS case unless we write a lot of redundant code (e.g. a NoParentsAccumulator). We'll have to make do with the absolute QPS numbers I guess, which show about a 12% improvement.

          Shai Erera added a comment -

          Patch finalizes CountingFacetsCollector to handle the specialized case of facet counting, doing the aggregation after collection. Also, it can handle OrdinalPolicy.NO_PARENTS, allowing one to index only the leaf ordinals and counting up the parents after the leaves' counts have been resolved.

          Added a CHANGES entry, and updated some javadocs.

          It would be good if we could give this version a final comparison against trunk. For the ALL_PARENTS case we can compare the pct diff, while for NO_PARENTS we can only compare absolute QPS for now.
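
          To make the NO_PARENTS roll-up concrete, here is a minimal sketch, assuming (as the taxonomy index arranges) that every ordinal's parent has a smaller ordinal and that the root ordinal is 0, so a single high-to-low pass propagates the counts all the way up; the method and array names are illustrative, not taken from the patch:

          // Roll leaf counts up to their ancestors after per-document counting is done.
          // parents[ord] holds the parent ordinal of ord; the root ordinal is assumed to be 0.
          static void rollUpToParents(int[] counts, int[] parents) {
            for (int ord = counts.length - 1; ord > 0; ord--) {
              counts[parents[ord]] += counts[ord];
            }
          }

          Note that with this naive pass a parent's count ends up as the sum of its children's counts, which equals the ALL_PARENTS per-document count only when no document contributes more than one descendant of that parent.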

          Michael McCandless added a comment -

          ALL_PARENTS StandardFacetsCollector (base) vs CountingFacetsCollector (comp):

                              Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                           Respell       55.89      (3.2%)       55.13      (3.9%)   -1.4% (  -8% -    5%)
                          PKLookup      207.52      (1.6%)      206.95      (1.4%)   -0.3% (  -3% -    2%)
                          Wildcard       62.22      (3.2%)       62.94      (2.7%)    1.2% (  -4% -    7%)
                            IntNRQ       17.88      (5.2%)       18.16      (5.7%)    1.6% (  -8% -   13%)
                           Prefix3       45.56      (4.9%)       46.48      (4.1%)    2.0% (  -6% -   11%)
                  HighSloppyPhrase        0.80      (9.7%)        0.84      (8.5%)    4.9% ( -12% -   25%)
                        HighPhrase       13.52      (7.7%)       15.09      (8.1%)   11.6% (  -3% -   29%)
                   LowSloppyPhrase       15.02      (3.9%)       17.15      (4.0%)   14.1% (   5% -   22%)
                         LowPhrase       14.14      (4.3%)       16.77      (4.9%)   18.6% (   8% -   29%)
                   MedSloppyPhrase       14.81      (2.6%)       18.33      (2.7%)   23.7% (  17% -   29%)
                            Fuzzy2       27.57      (2.6%)       34.95      (3.1%)   26.8% (  20% -   33%)
                       AndHighHigh        9.39      (1.6%)       11.92      (1.4%)   27.0% (  23% -   30%)
                           MedTerm       14.63      (2.2%)       18.89      (1.7%)   29.1% (  24% -   33%)
                          HighTerm        5.28      (1.8%)        7.02      (2.4%)   33.0% (  28% -   37%)
                            Fuzzy1       20.79      (2.1%)       27.71      (2.8%)   33.3% (  27% -   39%)
                         OrHighLow        4.82      (1.8%)        6.70      (2.6%)   39.1% (  34% -   44%)
                         OrHighMed        4.74      (1.8%)        6.61      (3.0%)   39.4% (  34% -   44%)
                        OrHighHigh        2.68      (1.8%)        3.77      (2.9%)   40.9% (  35% -   46%)
                         MedPhrase       39.21      (3.6%)       55.35      (3.6%)   41.2% (  32% -   50%)
                        AndHighMed       36.29      (3.5%)       51.92      (2.0%)   43.1% (  36% -   50%)
                           LowTerm       27.96      (3.2%)       41.47      (2.2%)   48.3% (  41% -   55%)
                        AndHighLow       64.36      (5.4%)      107.94      (5.7%)   67.7% (  53% -   83%)
                       MedSpanNear       70.17      (6.1%)      123.23      (7.4%)   75.6% (  58% -   94%)
                       LowSpanNear       70.35      (6.0%)      123.59      (7.1%)   75.7% (  58% -   94%)
                      HighSpanNear       70.35      (6.1%)      123.69      (7.8%)   75.8% (  58% -   95%)
          

          These are nice gains!

          Michael McCandless added a comment -

          NO_PARENTS CountingFacetsCollector vs itself (i.e. all differences are noise). Use the absolute QPS to compare to the "QPS comp" column above; e.g. MedTerm was 18.89 QPS above with ALL_PARENTS, while with NO_PARENTS it is 22.67-22.80 QPS:

                              Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                        AndHighLow       85.20      (5.0%)       83.74      (5.7%)   -1.7% ( -11% -    9%)
                       LowSpanNear       95.25      (5.5%)       93.67      (6.8%)   -1.7% ( -13% -   11%)
                      HighSpanNear       95.19      (5.4%)       93.80      (6.7%)   -1.5% ( -12% -   11%)
                       MedSpanNear       94.97      (5.4%)       93.59      (6.8%)   -1.5% ( -12% -   11%)
                        AndHighMed       45.68      (2.8%)       45.29      (2.9%)   -0.9% (  -6% -    4%)
                         OrHighLow        7.62      (2.2%)        7.55      (2.2%)   -0.8% (  -5% -    3%)
                        OrHighHigh        4.33      (2.2%)        4.29      (2.2%)   -0.8% (  -5% -    3%)
                           LowTerm       38.17      (2.0%)       37.90      (2.2%)   -0.7% (  -4% -    3%)
                         OrHighMed        7.54      (2.2%)        7.49      (2.1%)   -0.7% (  -4% -    3%)
                           Prefix3       45.95      (4.3%)       45.68      (4.4%)   -0.6% (  -8% -    8%)
                           MedTerm       22.80      (2.2%)       22.67      (2.1%)   -0.6% (  -4% -    3%)
                            Fuzzy1       26.16      (1.9%)       26.04      (2.0%)   -0.4% (  -4% -    3%)
                            IntNRQ       17.94      (6.1%)       17.86      (6.2%)   -0.4% ( -11% -   12%)
                       AndHighHigh       12.33      (1.2%)       12.29      (1.3%)   -0.4% (  -2% -    2%)
                            Fuzzy2       32.00      (2.8%)       31.89      (3.0%)   -0.3% (  -5% -    5%)
                         MedPhrase       49.48      (3.9%)       49.32      (4.4%)   -0.3% (  -8% -    8%)
                          HighTerm        8.02      (2.1%)        8.00      (2.0%)   -0.2% (  -4% -    3%)
                          PKLookup      211.76      (1.4%)      211.32      (1.8%)   -0.2% (  -3% -    3%)
                          Wildcard       62.37      (2.3%)       62.28      (2.3%)   -0.1% (  -4% -    4%)
                   MedSloppyPhrase       17.49      (2.5%)       17.52      (2.7%)    0.2% (  -4% -    5%)
                           Respell       55.68      (5.0%)       55.85      (3.3%)    0.3% (  -7% -    9%)
                   LowSloppyPhrase       16.29      (4.7%)       16.43      (5.2%)    0.9% (  -8% -   11%)
                         LowPhrase       15.68      (5.3%)       15.81      (5.4%)    0.9% (  -9% -   12%)
                        HighPhrase       14.22      (8.7%)       14.45      (8.9%)    1.6% ( -14% -   21%)
                  HighSloppyPhrase        0.83      (9.3%)        0.85     (11.9%)    2.1% ( -17% -   25%)
          
          Michael McCandless added a comment -

          Also, the total _dv file size is 445 MB for ALL_PARENTS and 351 MB for NO_PARENTS.

          Michael McCandless added a comment - - edited

          base = ALL_PARENTS, comp = NO_PARENTS:

                              Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                       MedSpanNear      125.77      (2.0%)       79.31      (0.8%)  -36.9% ( -38% -  -34%)
                       LowSpanNear      124.86      (2.7%)       79.23      (0.5%)  -36.5% ( -38% -  -34%)
                      HighSpanNear      124.23      (2.3%)       79.44      (0.8%)  -36.1% ( -38% -  -33%)
                        AndHighLow      107.24      (1.4%)       72.70      (0.7%)  -32.2% ( -33% -  -30%)
                         MedPhrase       55.98      (0.6%)       44.89      (1.4%)  -19.8% ( -21% -  -17%)
                        AndHighMed       52.06      (0.7%)       43.20      (0.0%)  -17.0% ( -17% -  -16%)
                            Fuzzy2       35.71      (0.6%)       30.42      (1.6%)  -14.8% ( -16% -  -12%)
                         LowPhrase       17.27      (0.3%)       15.21      (3.2%)  -11.9% ( -15% -   -8%)
                        HighPhrase       15.20      (6.2%)       13.50      (4.7%)  -11.2% ( -20% -    0%)
                           LowTerm       41.68      (0.4%)       37.49      (0.4%)  -10.1% ( -10% -   -9%)
                   LowSloppyPhrase       17.31      (2.9%)       15.75      (0.9%)   -9.0% ( -12% -   -5%)
                            Fuzzy1       28.11      (0.3%)       25.63      (0.0%)   -8.8% (  -9% -   -8%)
                   MedSloppyPhrase       18.42      (1.5%)       17.25      (0.1%)   -6.3% (  -7% -   -4%)
                           Respell       56.32      (0.3%)       54.41      (2.2%)   -3.4% (  -5% -    0%)
                  HighSloppyPhrase        0.83      (6.8%)        0.81      (1.0%)   -2.3% (  -9% -    5%)
                          Wildcard       63.43      (1.9%)       61.96      (0.3%)   -2.3% (  -4% -    0%)
                           Prefix3       45.60      (0.5%)       45.70      (0.7%)    0.2% (  -1% -    1%)
                            IntNRQ       17.54      (0.6%)       17.60      (1.4%)    0.3% (  -1% -    2%)
                          PKLookup      205.89      (0.5%)      210.73      (0.7%)    2.4% (   1% -    3%)
                       AndHighHigh       11.89      (0.2%)       12.48      (0.3%)    5.0% (   4% -    5%)
                          HighTerm        7.00      (0.2%)        8.09      (0.1%)   15.6% (  15% -   16%)
                        OrHighHigh        3.77      (0.6%)        4.36      (0.3%)   15.6% (  14% -   16%)
                         OrHighLow        6.65      (0.1%)        7.69      (1.5%)   15.6% (  14% -   17%)
                         OrHighMed        6.61      (0.4%)        7.66      (0.2%)   15.8% (  15% -   16%)
                           MedTerm       18.86      (0.4%)       22.13      (0.4%)   17.3% (  16% -   18%)
          

          I think because this test has 2.5M ords ... the cost of "rolling up" in the end is non-trivial ...

          Shai Erera added a comment -

          Thanks for running this. I think that given these results, making NO_PARENTS the default policy is not a good idea. I think it's not a good default anyway, because it forces the user to stop and think about whether the documents he'll index share parents or not. This looks like an advanced setting to me, i.e. if you want to get "expert" and really know your content, then you can choose to index like so. Plus, given those statistics, I'd say you have to test before going to production with it (i.e. it looks like it may get expensive as the number of ordinals grows...).

          Mike found a bug in how I count up the parents in the NO_PARENTS case, so I fixed it (and added a test). I'll run tests a couple of times and commit this.

          Michael McCandless added a comment -

          The performance depends heavily on how many ords your taxo index has ... my last test was ~2.5M ords, but when I built an index leaving out the two dimensions with the most ords (categories, username), leaving 4703 unique ords, the numbers are much better:

                              Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                           Prefix3      161.48      (6.1%)      161.99      (7.4%)    0.3% ( -12% -   14%)
                          PKLookup      235.50      (2.4%)      236.41      (2.1%)    0.4% (  -4% -    5%)
                           Respell       85.41      (4.4%)       85.92      (4.2%)    0.6% (  -7% -    9%)
                        AndHighLow     1196.56      (2.1%)     1204.67      (3.4%)    0.7% (  -4% -    6%)
                            IntNRQ      104.88      (6.7%)      105.77      (9.0%)    0.9% ( -13% -   17%)
                          Wildcard      215.17      (2.2%)      217.13      (2.6%)    0.9% (  -3% -    5%)
                  HighSloppyPhrase        3.24      (8.2%)        3.27      (9.2%)    1.0% ( -15% -   19%)
                       LowSpanNear       42.80      (3.0%)       43.68      (2.8%)    2.1% (  -3% -    8%)
                            Fuzzy2       84.83      (3.6%)       86.70      (2.8%)    2.2% (  -4% -    8%)
                      HighSpanNear       11.42      (1.9%)       11.70      (2.3%)    2.4% (  -1% -    6%)
                         LowPhrase       71.69      (6.8%)       73.91      (6.2%)    3.1% (  -9% -   17%)
                            Fuzzy1       75.53      (3.4%)       78.81      (2.7%)    4.3% (  -1% -   10%)
                        HighPhrase       42.58     (11.4%)       44.61     (11.5%)    4.8% ( -16% -   31%)
                   LowSloppyPhrase       80.22      (2.3%)       84.49      (3.1%)    5.3% (   0% -   10%)
                       MedSpanNear       85.37      (1.9%)       91.16      (1.8%)    6.8% (   3% -   10%)
                   MedSloppyPhrase       86.55      (2.7%)       92.84      (3.2%)    7.3% (   1% -   13%)
                         MedPhrase      145.23      (5.6%)      156.11      (6.1%)    7.5% (  -3% -   20%)
                        AndHighMed      321.74      (1.2%)      346.20      (1.5%)    7.6% (   4% -   10%)
                       AndHighHigh       84.28      (1.6%)       96.80      (1.7%)   14.9% (  11% -   18%)
                        OrHighHigh       35.03      (2.9%)       42.53      (4.6%)   21.4% (  13% -   29%)
                         OrHighMed       51.75      (3.0%)       63.90      (4.6%)   23.5% (  15% -   32%)
                         OrHighLow       50.41      (3.0%)       62.51      (4.7%)   24.0% (  15% -   32%)
                          HighTerm       58.55      (3.0%)       74.59      (4.2%)   27.4% (  19% -   35%)
                           LowTerm      355.14      (1.6%)      480.44      (2.3%)   35.3% (  30% -   39%)
                           MedTerm      206.44      (2.0%)      286.54      (3.1%)   38.8% (  33% -   44%)
          

          I also separately fixed a silly bug in luceneutil which was causing the Span queries to get 0 hits.

          Commit Tag Bot added a comment -

          [trunk commit] Shai Erera
          http://svn.apache.org/viewvc?view=revision&revision=1436435

          LUCENE-4600: add CountingFacetsCollector

          Commit Tag Bot added a comment -

          [branch_4x commit] Shai Erera
          http://svn.apache.org/viewvc?view=revision&revision=1436446

          LUCENE-4600: add CountingFacetsCollector

          Shai Erera added a comment -

          Committed to trunk and 4x. Let's see if it makes nightly happy!

          Uwe Schindler added a comment -

          Closed after release.


            People

            • Assignee:
              Shai Erera
              Reporter:
              Michael McCandless