Lucene - Core
LUCENE-5015

Unexpected performance difference between SamplingAccumulator and StandardFacetAccumulator

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 4.3
    • Fix Version/s: 4.4, 6.0
    • Component/s: modules/facet
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      I have an unexpected performance difference between the SamplingAccumulator and the StandardFacetAccumulator.

      The case is an index with about 5M documents and each document containing about 10 fields. I created a facet on each of those fields. When searching to retrieve facet-counts (using 1 CountFacetRequest), the SamplingAccumulator is about twice as fast as the StandardFacetAccumulator. This is expected and a nice speed-up.

      However, when I use more CountFacetRequests to retrieve facet-counts for more than one field, the speed of the SamplingAccumulator decreases, to the point where the StandardFacetAccumulator is faster.

       
      FacetRequests  Sampling    Standard
       1               391 ms     1100 ms
       2               531 ms     1095 ms 
       3               948 ms     1108 ms
       4              1400 ms     1110 ms
       5              1901 ms     1102 ms
      

      Is this behaviour normal? I did not expect it, as the SamplingAccumulator needs to do less work.

      Some code to show what I do:

      final FacetSearchParams facetSearchParams = new FacetSearchParams( facetRequests );
      
      FacetsCollector facetsCollector;
      
      if ( isSampled )
      {
      	facetsCollector =
      		FacetsCollector.create( new SamplingAccumulator( new RandomSampler(), facetSearchParams, searcher.getIndexReader(), taxo ) );
      }
      else
      {
      	facetsCollector = FacetsCollector.create( FacetsAccumulator.create( facetSearchParams, searcher.getIndexReader(), taxo ) );
      }
      
      searcher.search( facetsQuery, facetsCollector );
      final List<FacetResult> collectedFacets = facetsCollector.getFacetResults();
      
      1. LUCENE-5015.patch
        21 kB
        Shai Erera
      2. LUCENE-5015.patch
        19 kB
        Gilad Barkai
      3. LUCENE-5015.patch
        21 kB
        Gilad Barkai
      4. LUCENE-5015.patch
        18 kB
        Gilad Barkai
      5. LUCENE-5015.patch
        18 kB
        Gilad Barkai
      6. LUCENE-5015.patch
        18 kB
        Gilad Barkai

        Activity

        Gilad Barkai added a comment -

        Hello Rob,

        Indeed that looks unexpected.
        The immediate suspect is the "fixing" part of the sampling: after the sampled top-cK are computed for each facet request, each candidate for the top-K gets a real count computation, rather than a count over the sampled set of results.

        How many results are in the result set? All the documents?

        Rob Audenaerde added a comment (edited)

        I use a MatchAllDocsQuery(), so I retrieve all 5 million documents as hits.

        Gilad Barkai added a comment -

        Sampling, with its defaults, has its toll.

        In its defaults, Sampling aims to produce the exact top-K results for each request, as if a StandardFacetAccumulator would have been used. Meaning it aims at producing the same top-K with the same counts.

        The process begins by sampling the result set and computing the top-cK candidates for each of the M facet requests, producing amortized results. That part is faster than StandardFacetAccumulator because fewer documents' facet information gets processed.

        The next part is the "fixing", using a SampleFixer retrieved from a Sampler, in which "fixed" counts are produced that correlate with the original document result set rather than the sampled one. The default (and currently only) implementation of such a fixer is TakmiSampleFixer, which produces exact counts for each of the cK candidates of each of the M facet requests. The counts are not computed against the facet information of each document, but rather by matching the skip list of the drill-down term of each candidate category against the bitset of the (actual) document results. The number of matches is the count.
        This is equivalent to a total-hit collector running a drill-down query for the candidate category over the original query.
        There's a tipping point at which not sampling is faster than sampling plus fixing with c x K x M skip-list matches against the bitset representing the document results. c defaults to 2 (see overSampleFactor in SamplingParams).
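        The counting mechanism described above, intersecting a candidate category's document set with the query's result set, can be illustrated with plain java.util.BitSet (a simplified sketch for intuition only; the real TakmiSampleFixer walks the drill-down term's postings/skip list rather than a materialized bitset):

```java
import java.util.BitSet;

public class DrillDownCountSketch {
    // Exact count for one candidate category: the size of the intersection
    // between the query's result set and the set of docs carrying the category.
    public static int fixedCount(BitSet queryResults, BitSet categoryDocs) {
        BitSet intersection = (BitSet) queryResults.clone();
        intersection.and(categoryDocs);
        return intersection.cardinality(); // the number of matches is the count
    }

    public static void main(String[] args) {
        BitSet queryResults = new BitSet();
        queryResults.set(0, 100);        // docs 0..99 matched the query
        BitSet categoryDocs = new BitSet();
        categoryDocs.set(50, 150);       // docs 50..149 carry the category
        System.out.println(fixedCount(queryResults, categoryDocs)); // prints 50
    }
}
```

        This intersection is performed once per candidate, which is where the c x K x M cost comes from.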

        Over-sampling (a.k.a. c) is important for exact counts, as it is conceivable that the accuracy of a sampled top-K is not 100%; according to some measures we once ran, it is very likely that the true top-K results are within the sampled 2K results. Fixing those 2K with their actual counts and re-sorting them accordingly yields a much more accurate top-K.

        E.g. requesting 5 count requests for top-10 with an overSampleFactor of 2 results in 100 skip-list matches against the document results bitset.

        If amortized results suffice, a different SampleFixer could be coded which, e.g., amortizes the true count from the sampling ratio. E.g. if category C got a count of 3, and the sample was 1,000 results out of 1,000,000, then the "AmortizedSampleFixer" would fix the count of C to be 3,000.
        Such fixing is very fast, and the overSampleFactor should be set to 1.0.

        Edit:
        I now see that it is not that easy to code a different SampleFixer, nor get it the information needed for the amortized result fixing as suggested above.
        I'll try to open the API some and make it more convenient.
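        The arithmetic behind the amortized fixing described above is simply a scale-up by the inverse sampling ratio. A self-contained sketch of that calculation (AmortizedFixSketch is a hypothetical name for illustration, not shipped Lucene code):

```java
public class AmortizedFixSketch {
    // Scale a sampled count up by the inverse sampling ratio:
    // fixedCount = sampledCount * (totalResults / sampleSize)
    public static long amortizedCount(long sampledCount, long sampleSize, long totalResults) {
        return Math.round(sampledCount * ((double) totalResults / sampleSize));
    }

    public static void main(String[] args) {
        // Category C counted 3 times in a 1,000-doc sample of 1,000,000 results.
        System.out.println(amortizedCount(3, 1_000, 1_000_000)); // prints 3000
    }
}
```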

        Rob Audenaerde added a comment -

        Hi Gilad,

        Thanks for the swift and very useful reply, it has given me good insight into the sampling process.

        As I see it, I could benefit from a more straightforward SampleFixer, like the behaviour of the AmortizedSampleFixer you described. It would be great to have one, or to be able to code one up, so +1 for the API improvements which are needed for this.

        Gilad Barkai added a comment -

        Added a parameter to SamplingParams named fixToExact which defaults to false.
        I think it is probable that one who uses sampling may not be interested in exact results.

        In the proposed approach, the Sampler creates the old, slow, and accurate TakmiSampleFixer if SamplingParams.shouldFixToExact() is true. Otherwise the much (much!) faster AmortizedSampleFixer is used, which only takes into account the sampling ratio, assuming the sampled set represents the whole set with 100% accuracy.

        With these changes, the code above should already use the amortized fixer, as it is now the default.
        If the old fixer is to be used - for comparison - the code could look as follows:

        final FacetSearchParams facetSearchParams = new FacetSearchParams( facetRequests );
        
        FacetsCollector facetsCollector;
        
        if ( isSampled )
        {
        	// Create SamplingParams which denotes fixing to exact
        	SamplingParams samplingParams = new SamplingParams();
        	samplingParams.setFixToExact(true);
        
        	// Use the custom sampling params while creating the RandomSampler
        	facetsCollector =
        		FacetsCollector.create( new SamplingAccumulator( new RandomSampler(samplingParams, new Random(someSeed)), facetSearchParams, searcher.getIndexReader(), taxo ) );
        }
        else
        {
        	facetsCollector = FacetsCollector.create( FacetsAccumulator.create( facetSearchParams, searcher.getIndexReader(), taxo ) );
        }
        

        The sampling tests still use the "exact" fixer, as it is not easy to assert against amortized results. I'm still looking into creating a complete faceted search flow test with the amortized fixer.

        Gilad Barkai added a comment -

        Older patch was against trunk/lucene/facet. This one is rooted with trunk.

        Shai Erera added a comment -

        Gilad, this looks good! I have a few comments:

        • AmortizedSampleFixer's javadocs need a <p> tag instead of the empty line; otherwise I think this does not render as expected.
          • Same in TakmiSamplerFixer
        • SampleFixer has a TODO next to the new param
        • AmortizedSamplerFixerTest and SamplerTest should extend FacetTestCase so that it doesn't use Lucene3x codec accidentally (which doesn't support DocValues and hence facets)

        In general, what do you think of having SamplingParams take a SampleFixer instead of fixToExact?

        • We could default to Amortized, while the current sampling tests will set Takmi
        • It will allow someone who doesn't care about the value at all to not fix it. I.e., if I just want to show 5% in the UI, I don't really need Amortized right?
        • It will allow to experiment with other SampleFixers implementations, e.g. maybe Takmi can be made more efficient or something.

        Currently SampleFixer is public, though there's really no point in overriding it since you cannot pass it anywhere. Therefore I think that taking a fixer is better.
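        The proposal could be sketched roughly as follows (a hypothetical illustration of the suggested API shape, with stub bodies; the real SampleFixer and SamplingParams in the patch differ):

```java
// Hypothetical sketch: SamplingParams carries a SampleFixer directly
// instead of a fixToExact flag. All bodies are stubs for illustration.
interface SampleFixer { /* fixes sampled counts against the full result set */ }

class TakmiSampleFixer implements SampleFixer { } // exact counts, wants oversampling

class SamplingParams {
    private SampleFixer sampleFixer = null; // null => leave sampled counts untouched

    public void setSampleFixer(SampleFixer fixer) { this.sampleFixer = fixer; }
    public SampleFixer getSampleFixer() { return sampleFixer; }

    // Oversampling only pays off for fixers that compute exact counts.
    public boolean shouldOverSample() {
        return sampleFixer instanceof TakmiSampleFixer;
    }
}

public class SampleFixerApiSketch {
    public static void main(String[] args) {
        SamplingParams params = new SamplingParams();
        System.out.println(params.shouldOverSample()); // prints false: no fixer set
        params.setSampleFixer(new TakmiSampleFixer());
        System.out.println(params.shouldOverSample()); // prints true: exact fixer
    }
}
```

        This keeps the oversampling decision tied to the fixer, which is the tension discussed in the following comments.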

        Gilad Barkai added a comment -

        Shai, thanks for the comments.
        First three points are taken care of in the new patch.
        As for SamplingParams taking a SampleFixer, it's a good idea, and I've been there, but it makes it harder for, e.g., the SamplingAccumulator to figure out whether to oversample (and trim) for this SampleFixer. It would then move this logic into the SampleFixer.
        That's not bad, but it changes the API a bit more, and the name SampleFixer would no longer match the functionality (perhaps it should oversample and trim itself?)

        Shai Erera added a comment -

        Well, as long as we keep SampleFixer hidden, users will not be able to solve sampling issues on their own. So the API has to be open on that end too. Maybe SamplingAccumulator can have a protected shouldOverSample with a default impl that handles the two known fixers and otherwise returns false? Then the user who provides his own fixer, can provide his accumulator too.

        Shai Erera added a comment -

        Looking at SamplingParams, isn't overSampleFactor enough to decide whether to over sample or not? It can default to not oversample, with the default Amortized fixer or some other default (2.0?) if fixer is Takmi. Then user can change it, and if he passes a fixer which requires over sampling, he should set that too.

        Gilad Barkai added a comment -

        True, looking at overSampleFactor is enough, but it's not obvious that TakmiFixer should be used with overSampleFactor > 1 to improve the chances of the resulting top-K being accurate.
        I'll add some documentation w.r.t this issue, I hope it will do.

        New patch defaults to NoopSampleFixer which does not touch the results at all - if the need is only for a top-k and their counts does not matter, this is the least expensive one.
        Also, if instead of counts a percentage should be displayed (as how much of the results match this category), the sampled value out of the sample size would yield the same result as the amortized fixed count out of the actual result set size. That might render the amortized fixer moot.

        The new patch accounts for a SampleFixer being set in SamplingParams.
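        The observation about percentages is a one-line identity: the sampled ratio and the amortized ratio are the same number, so no fixing is needed for percentage display. A sketch of the arithmetic (illustration only, not library code):

```java
public class PercentageSketch {
    public static void main(String[] args) {
        long sampledCount = 3, sampleSize = 1_000, totalResults = 1_000_000;

        // Amortized fix: scale the sampled count up by the inverse sampling ratio.
        double amortizedCount = sampledCount * ((double) totalResults / sampleSize); // 3000.0

        // The two percentages are identical, so for percentage display
        // the amortized fixer adds nothing over the raw sampled counts.
        double sampledPct   = 100.0 * sampledCount   / sampleSize;   // 0.3
        double amortizedPct = 100.0 * amortizedCount / totalResults; // 0.3
        System.out.println(sampledPct == amortizedPct); // prints true
    }
}
```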

        Shai Erera added a comment -

        Thanks Gilad. Now that we have SampleFixer on SamplingParams, I wonder why we need Noop and Amortized? Could we just make the default fixer null and not oversample + fix if it's null? And Amortized ... well as you said, it looks kind of redundant now... I don't think it's rocket science for an app to do value/ratio on its own, yet it's one more class that we need to maintain going forward?

        Gilad Barkai added a comment -

        Shai, I think you're right, a null SampleFixer makes more sense.

        While working on a test which validates that a flow works with the null fixer, I found that it did not. The reason is complements: by default, complement counting kicks in when enough results are found. I think this may hold the key to the performance differences as well.

        Rob, could you please try the following code and report the results?

            SamplingAccumulator accumulator = new SamplingAccumulator( new RandomSampler(), facetSearchParams, searcher.getIndexReader(), taxo );
        
            // Make sure no complements are in action
            accumulator.setComplementThreshold(StandardFacetsAccumulator.DISABLE_COMPLEMENT);
            
            facetsCollector = FacetsCollector.create(accumulator);
        
        

        In the meantime, I made the changes to the patch and added the test for the null fixer.

        Rob Audenaerde added a comment (edited)

        Hi all, thanks for all the progress.

        I will try to build Lucene with the latest patch and give it a go.

        (do I take the 4.3 release version? or is there a 4.3 development branch where the patch has to be applied?)

        Rob Audenaerde added a comment -

        I took the revision number that was in the patch file and checked that out.

        svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk@1486401 .

        After installing Ivy, I'm now building Lucene myself for the first time.

        Shai Erera added a comment -

        Rob, you don't need to build Lucene to try what Gilad suggested, just modify your search code to disable complements. The problem is that if complements indeed kick in, and from the setup you describe it seems that they do because you search with MADQ, then sampling isn't done at all, yet the accumulator still corrects the counts.

        After you try it, we can tell if the performance overhead is indeed because of complements or that the counts are corrected. In either case, I think it will be good to open up the SampleFixer.

        Rob Audenaerde added a comment (edited)

        Hi Shai,

        I will check tomorrow. Just to be sure, this is what I will do:

        • Lucene 4.3 release
        • FacetsAccumulator with and without complements
          • With the 'default' TakmiSampleFixer
          • With a NOOP (empty) SampleFixer implementation, returned by overriding the 'getSampleFixer' method in the Sampler that I will provide.
        • MADQ with 1..5 selected facets
        • some other query that will return about 50% of the documents, also with 1..5 facets

        I currently have a nice 15M document set, I will use that as a basis.

        Shai Erera added a comment -

        Yes, that sounds good. If you don't want to apply the patch so you can use the Noop fixer, that's fine too. I think the main goal is to see whether the complements that kicked in were in the way.

        Shai Erera added a comment -

        Patch adds CHANGES entry as well as makes SampleFixer and TakmiSampleFixer public. I think this is ready but let's wait for Rob's input.

        Rob Audenaerde added a comment (edited)

        Time in ms.

         
                            MADQ                              75% hits
                   Complements      Disabled        Complements      Disabled
        #facets   Takmi   Noop    Takmi   Noop     Takmi   Noop    Takmi   Noop
        1           999    433     1024    393      1239    541      969    432
        2          2292    388     2877    512      2379    609     2489    457
        3          2501    219     3228    413      2477    569     2590    434
        4          3589    224     5052    392      3372    562     4093    503
        5          4764    247     6863    493      4356    577     5103    533
        
         					
        SamplingParams sampleParams = new SamplingParams();
        sampleParams.setMaxSampleSize( 5000 );
        sampleParams.setMinSampleSize( 5000 );
        sampleParams.setSamplingThreshold( 75000 );
        sampleParams.setOversampleFactor( 1.0d );
        
        Shai Erera added a comment -

        Thanks Rob. This shows that complements don't affect the performance much, and Takmi is the main issue. This is good!
        Also, Noop is stable as the number of facet requests grows, which is expected because it doesn't do any more work, while Takmi gets worse as more requests are added.
        Actually, you use overSampleFactor=1, which is a bit optimistic for Takmi. Usually we use 2. That would show an even worse running time.

        W.r.t running the test, do you loop through the number of requests, or start a new JVM for each testcase? Do you do "warmup" runs to exclude their results before the actual measure? This won't change the fact that Takmi is slower than Noop, just perhaps explain why Noop w/ 5 requests is faster than 1 (which makes no sense, I take it it's an OS cache, or no warmup run).

        Anyway, I think this proves we need to make the default fixer null (which is equivalent to noop). I'll go ahead and commit the changes.

        Shai Erera added a comment -

        Committed to trunk and 4x. Thanks Rob for reporting and taking the time to test this and Gilad for the fix!

        Rob Audenaerde added a comment -

        Thank you too

        Some more test-details:

        Each 'column' of 5 facets is done in a new JVM. Each individual cell is 4 searches; the first is discarded and the remaining three are averaged.

        For the SamplingParams, I reduced the numbers from the defaults to speed up testing.

        Shai Erera added a comment -

        Thanks. I usually take the minimum, not average, since technically it's the fastest we could get. Also, discarding only one run is not always enough, since it may take the OS cache more time to warm up. But anyway, the numbers are clear. Thanks for doing this Rob!
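        The measurement discipline discussed here (discard warmup runs, then take the minimum of the measured runs) can be wrapped in a small harness; a generic sketch, not tied to the Lucene API:

```java
public class MinTimeSketch {
    // Run the task a few warmup times (discarded), then return the minimum
    // of the measured runs -- the fastest run is the least noisy estimate.
    public static long minTimeNanos(Runnable task, int warmups, int measured) {
        for (int i = 0; i < warmups; i++) task.run(); // let caches/JIT warm up
        long best = Long.MAX_VALUE;
        for (int i = 0; i < measured; i++) {
            long start = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) {
        long ns = minTimeNanos(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
        }, 1, 3); // 1 warmup run, 3 measured runs
        System.out.println(ns >= 0); // prints true
    }
}
```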

        Steve Rowe added a comment -

        Bulk close resolved 4.4 issues


          People

          • Assignee:
            Shai Erera
            Reporter:
            Rob Audenaerde
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:
