Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-5725

Efficient facets without counts for enum method

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.3, master (7.0)
    • Component/s: search
    • Labels:
      None

      Description

      UPD: Specification

      To cap facet counts by 1 specify facet.exists=true. It can be used with facet.method=enum or when it's omitted. It can be used only on non-trie fields i.e. strings. It may speed up facet counting on large indices and/or high-cardinality facet values..

      Shot version:

      This improves performance for facet.method=enum when it's enough to know that facet count>0, for example when you it's when you dynamically populate filters on search form. New method checks if two bitsets intersect instead of counting intersection size.

      Long version:

      We have a dataset containing hundreds of millions of records, we facet by dozens of fields with many of facet-excludes and have relatively small number of unique values in fields, around thousands.
      Before executing search, users work with "advanced search" form, our goal is to populate dozens of filters with values which are applicable with other selected values, so basically this is a use case for facets with mincount=1, but without need in actual counts.

      Our performance tests showed that facet.method=enum works much better than fc\fcs, probably due to a specific ratio of "docset"\"unique terms count". For example average execution of query time with method fc=1500ms, fcs=2600ms and with enum=280ms. Profiling indicated the majority time for enum was spent on intersecting docsets.

      Hers's a patch that introduces an extension to facet calculation for method=enum. Basically it uses docSetA.intersects(docSetB) instead of docSetA. intersectionSize (docSetB).

      As a result we were able to reduce our average query time from 280ms to 60ms.

      1. facet.limit=0&facet.missing=true discrepancy between cloud and non-distr.txt
        4 kB
        Mikhail Khludnev
      2. SOLR-5725.patch
        45 kB
        Mikhail Khludnev
      3. SOLR-5725.patch
        45 kB
        Mikhail Khludnev
      4. SOLR-5725.patch
        56 kB
        Mikhail Khludnev
      5. SOLR-5725.patch
        56 kB
        Mikhail Khludnev
      6. SOLR-5725.patch
        48 kB
        Mikhail Khludnev
      7. SOLR-5725.patch
        47 kB
        Mikhail Khludnev
      8. SOLR-5725.patch
        3 kB
        Alexey Kozhemiakin
      9. SOLR-5725-5x.patch
        18 kB
        Sebastian Koziel
      10. SOLR-5725-master.patch
        120 kB
        Radoslaw Zielinski

        Issue Links

          Activity

          Hide
          alexey_kozhemiakin Alexey Kozhemiakin added a comment -

          Draft version of patch attached. Tests and refined names of method and params will follow soon.

          Patched checked on 4.6

          Show
          alexey_kozhemiakin Alexey Kozhemiakin added a comment - Draft version of patch attached. Tests and refined names of method and params will follow soon. Patched checked on 4.6
          Hide
          skoziel Sebastian Koziel added a comment -

          New patch checked on branch 5x.

          The faceting method is called 'intersect' as it's based on intersection of docsSets. There are also tests added.

          Show
          skoziel Sebastian Koziel added a comment - New patch checked on branch 5x. The faceting method is called 'intersect' as it's based on intersection of docsSets. There are also tests added.
          Hide
          mkhludnev Mikhail Khludnev added a comment - - edited

          Hello, I skim through the patch. Generally it's ok, here are some notes:

          • as far the proposed method is an optimization of enum, it's reasonable to name it starting with it, eg enumprobing;
          • it's preferable to avoid code diverging, instead of introducing SimpleFacets.getFacetTermIntersection() it's better to introduce conditional code branches (yep, this is sad) into getFacetTermEnumCounts().
          • implementation should be consistent, thus it's worth to implement the similar idea in else branch of {{if (df >= minDfFilterCache) }}
          • it should be resilient for other parameter, eg it should throw an exception if combined with mincount=2 or more
          • when merge facet from shards we need to cap them by 1, otherwise it will flip from 1 to number of shards
          • the test case is beefy enough, but it's also necessary to include new method into existing regression tests with amending asserts, yep. it's pain.
          Show
          mkhludnev Mikhail Khludnev added a comment - - edited Hello, I skim through the patch. Generally it's ok, here are some notes: as far the proposed method is an optimization of enum , it's reasonable to name it starting with it, eg enumprobing ; it's preferable to avoid code diverging, instead of introducing SimpleFacets.getFacetTermIntersection() it's better to introduce conditional code branches (yep, this is sad) into getFacetTermEnumCounts() . implementation should be consistent, thus it's worth to implement the similar idea in else branch of {{if (df >= minDfFilterCache) }} it should be resilient for other parameter, eg it should throw an exception if combined with mincount=2 or more when merge facet from shards we need to cap them by 1, otherwise it will flip from 1 to number of shards the test case is beefy enough, but it's also necessary to include new method into existing regression tests with amending asserts, yep. it's pain.
          Hide
          dsmiley David Smiley added a comment -

          Cool feature.

          As I've been looking at the JSON Facets API recently... I'm reminded it's sad we have 2 main facet impls... sigh

          Show
          dsmiley David Smiley added a comment - Cool feature. As I've been looking at the JSON Facets API recently... I'm reminded it's sad we have 2 main facet impls... sigh
          Hide
          mkhludnev Mikhail Khludnev added a comment -

          Yep, it's not a joy, for sure. Do you prefer to add this algorithm staright into json.facets only?

          Show
          mkhludnev Mikhail Khludnev added a comment - Yep, it's not a joy, for sure. Do you prefer to add this algorithm staright into json.facets only?
          Hide
          dsmiley David Smiley added a comment -

          Do you prefer to add this algorithm staright into json.facets only?

          I'd "prefer" that but I'm definitely not standing in the way of any progress for our classic/standard facets impl. Ideally this new capability will be added to JSON-Facets too. BTW the pertinent issue is: SOLR-7296

          Show
          dsmiley David Smiley added a comment - Do you prefer to add this algorithm staright into json.facets only? I'd "prefer" that but I'm definitely not standing in the way of any progress for our classic/standard facets impl. Ideally this new capability will be added to JSON-Facets too. BTW the pertinent issue is: SOLR-7296
          Hide
          ishi Radoslaw Zielinski added a comment - - edited

          I've altered the patch to be applicable on the current master branch. I had to do this by hand because there were to many changes.

          I've applied Mikhail suggestions as well. Can you please look at this?

          What is done:

          • renamed the faceting method to enumprobing
          • switched to conditional code branching inside getFacetTermEnumCounts() method
          • covered the else branch of {{if (df >= minDfFilterCache) }}
          • throw an exception if combined with mincount=2 or more
          • handled per field faceting type setting
          • handled distributed environment
          • updated/added test to cover more use cases
          Show
          ishi Radoslaw Zielinski added a comment - - edited I've altered the patch to be applicable on the current master branch. I had to do this by hand because there were to many changes. I've applied Mikhail suggestions as well. Can you please look at this? What is done: renamed the faceting method to enumprobing switched to conditional code branching inside getFacetTermEnumCounts() method covered the else branch of {{if (df >= minDfFilterCache) }} throw an exception if combined with mincount=2 or more handled per field faceting type setting handled distributed environment updated/added test to cover more use cases
          Hide
          mkhludnev Mikhail Khludnev added a comment - - edited

          squashed commits, tweaked tests - especially TestRandomFaceting for facet.sort=counts with this method it's pain.
          I will work on it, mostly checking new tests and case w/o filterCache (facet.enum.cache.minDf>0).
          Colleagues, please confirm that you are happy to have a new facet.method=enumprobing, otherwise chime in!

          Show
          mkhludnev Mikhail Khludnev added a comment - - edited squashed commits, tweaked tests - especially TestRandomFaceting for facet.sort=counts with this method it's pain. I will work on it, mostly checking new tests and case w/o filterCache ( facet.enum.cache.minDf>0 ). Colleagues , please confirm that you are happy to have a new facet.method=enumprobing , otherwise chime in!
          Hide
          mkhludnev Mikhail Khludnev added a comment -

          new patch with better distributed search coverage.
          however it's worth to be considered for moving to SolrCloudTestCase like it's suggested at SOLR-9065. TBC

          Show
          mkhludnev Mikhail Khludnev added a comment - new patch with better distributed search coverage. however it's worth to be considered for moving to SolrCloudTestCase like it's suggested at SOLR-9065 . TBC
          Hide
          mkhludnev Mikhail Khludnev added a comment -

          better distributed support with tests. I'm still considering SolrCloudTestCase.

          Opinions? Do you think it's ready to go?

          Show
          mkhludnev Mikhail Khludnev added a comment - better distributed support with tests. I'm still considering SolrCloudTestCase. Opinions? Do you think it's ready to go?
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          enumprobing is a strange name, and doesn't convey the fact that this is the first "method" of faceting that actually doesn't calculate counts.
          Ideas for better names:

          • facet.method=nonzero
          • facet.method=exists
          • facet.method=matches

          I don't see any description of the proposed output format, but I'll guess that it's the same as normal facet.field, but every included count is set to "1"?

          Show
          yseeley@gmail.com Yonik Seeley added a comment - enumprobing is a strange name, and doesn't convey the fact that this is the first "method" of faceting that actually doesn't calculate counts. Ideas for better names: facet.method=nonzero facet.method=exists facet.method=matches I don't see any description of the proposed output format, but I'll guess that it's the same as normal facet.field, but every included count is set to "1"?
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          Thinking about how we would expose this in the JSON Facet API, I suppose it would be
          method=exists (or "matches", or whatever name we pick) in a facet field block... and the output format would be the same (but counts would not be accurate).
          It would only be an execution hint and would be ignored if not applicable (e.g. if one is sorting by something else and the optimization no longer makes sense).

          Show
          yseeley@gmail.com Yonik Seeley added a comment - Thinking about how we would expose this in the JSON Facet API, I suppose it would be method=exists (or "matches", or whatever name we pick) in a facet field block... and the output format would be the same (but counts would not be accurate). It would only be an execution hint and would be ignored if not applicable (e.g. if one is sorting by something else and the optimization no longer makes sense).
          Hide
          dsmiley David Smiley added a comment -

          +1 to facet.method=nonzero. Yonik's other suggestions are fine to me too. I hate "enumprobing".

          But I think there should be some parameter other than facet.method that enables this. facet.count=false? facet.exists=true? I see facet.method generally as a hint. Indeed it is a hint in JSON Faceting API (except for method=stream and I'm fixing that in SOLR-9142). I don't think it should control what the output is in any way.

          Show
          dsmiley David Smiley added a comment - +1 to facet.method=nonzero. Yonik's other suggestions are fine to me too. I hate "enumprobing". But I think there should be some parameter other than facet.method that enables this. facet.count=false? facet.exists=true? I see facet.method generally as a hint . Indeed it is a hint in JSON Faceting API (except for method=stream and I'm fixing that in SOLR-9142 ). I don't think it should control what the output is in any way.
          Hide
          mkhludnev Mikhail Khludnev added a comment -

          Right. There is no change in format every included count is set to "1" or "0" (on mincount=0)

          Show
          mkhludnev Mikhail Khludnev added a comment - Right. There is no change in format every included count is set to "1" or "0" (on mincount=0)
          Hide
          mkhludnev Mikhail Khludnev added a comment -

          yep. JSON Facets, is the next aim for sure (as well as query facet). fwiw, for now this optimization works for both facet.sort options.

          Show
          mkhludnev Mikhail Khludnev added a comment - yep. JSON Facets, is the next aim for sure (as well as query facet). fwiw, for now this optimization works for both facet.sort options.
          Hide
          mkhludnev Mikhail Khludnev added a comment -
          • enumprobing is not the best name. Got it. facet.method=nonzero is good one. Don't you afraid, that user will confuse it with legacy facet.zeros?

            there should be some parameter other than facet.method that enables this.

            I'm afraid users will try to combine it with fc,fcs,uif, that's why I prefer to switch it as method.
            Thus, I like facet.method=exists from proposed ones. Btw, initially this name was intersects. WDYT? My point was that this method works like enum distinguishing from other ones, but this one is faster. What about fastenum or enumbit?

          Show
          mkhludnev Mikhail Khludnev added a comment - enumprobing is not the best name. Got it. facet.method=nonzero is good one. Don't you afraid, that user will confuse it with legacy facet.zeros ? there should be some parameter other than facet.method that enables this. I'm afraid users will try to combine it with fc , fcs , uif , that's why I prefer to switch it as method. Thus, I like facet.method=exists from proposed ones. Btw, initially this name was intersects . WDYT? My point was that this method works like enum distinguishing from other ones, but this one is faster. What about fastenum or enumbit ?
          Hide
          dsmiley David Smiley added a comment -

          I've never heard of facet.zeros, so probably not a big concern.

          I'm afraid users will try to combine it with fc,fcs,uif, that's why I prefer to switch it as method.

          I think it's okay to return an error when a particular facet.method's requirements aren't met, even though we don't normally do that. But I think it's bad to have facet.method otherwise have a visible effect on the response. We already overrule what the user specifies in facet.method when we have to. This can be no different... set a proposed facet.count=false and then facet.method is ignored because we're going to effectively do that exactly one way.

          Show
          dsmiley David Smiley added a comment - I've never heard of facet.zeros , so probably not a big concern. I'm afraid users will try to combine it with fc,fcs,uif, that's why I prefer to switch it as method. I think it's okay to return an error when a particular facet.method's requirements aren't met, even though we don't normally do that. But I think it's bad to have facet.method otherwise have a visible effect on the response. We already overrule what the user specifies in facet.method when we have to. This can be no different... set a proposed facet.count=false and then facet.method is ignored because we're going to effectively do that exactly one way.
          Hide
          mkhludnev Mikhail Khludnev added a comment -

          But I think it's bad to have facet.method otherwise have a visible effect on the response.

          Aghhh. Got this point. This option doesn't remove counts but capping them to 1. What about facet.maxcount=1 which will only valid with enum and allow only this value (1) for a while?

          Show
          mkhludnev Mikhail Khludnev added a comment - But I think it's bad to have facet.method otherwise have a visible effect on the response. Aghhh. Got this point. This option doesn't remove counts but capping them to 1. What about facet.maxcount=1 which will only valid with enum and allow only this value (1) for a while?
          Hide
          dsmiley David Smiley added a comment -

          Refining what I said before... we don't even need a new facet.method for this, although it's fine to have one.

          code review:

          • What's with the rb.req.getParams() at the top of modifyRequestForFieldFacets?
          • for the loop label in SimpleFacets.getFacetTermEnumCounts, can you please use all-caps – SEGMENTS or SEGMENTS_LOOP.
          • In TestRandomFaceting.getExpectationForSortByCount: why are offset & limit multiplied by 2?

          (not introduced by this patch):

          • Hmmm, I'm looking at the algorithm in SimpleFacets.getFacetTermEnumCounts for when the df is < minDfFilterCache – which loops over docs in the PostingsEnum and checks fastForRandomSet. Wouldn't it be better to leap-frog (and then don't need a "fast-for-random-set" but do need it to be sorted)? What do you think Yonik Seeley?
          Show
          dsmiley David Smiley added a comment - Refining what I said before... we don't even need a new facet.method for this, although it's fine to have one. code review: What's with the rb.req.getParams() at the top of modifyRequestForFieldFacets? for the loop label in SimpleFacets.getFacetTermEnumCounts, can you please use all-caps – SEGMENTS or SEGMENTS_LOOP . In TestRandomFaceting.getExpectationForSortByCount: why are offset & limit multiplied by 2? (not introduced by this patch): Hmmm, I'm looking at the algorithm in SimpleFacets.getFacetTermEnumCounts for when the df is < minDfFilterCache – which loops over docs in the PostingsEnum and checks fastForRandomSet. Wouldn't it be better to leap-frog (and then don't need a "fast-for-random-set" but do need it to be sorted)? What do you think Yonik Seeley ?
          Hide
          dsmiley David Smiley added a comment -

          facet.maxcount=1

          I don't like that suggestion for two reasons: (1) the maxcount name might suggest we don't return values that would have a higher count – and that's not true, and (2) This feature is for using just '1'... and this parameter suggests you can set it to some other number and we don't support that.

          Show
          dsmiley David Smiley added a comment - facet.maxcount=1 I don't like that suggestion for two reasons: (1) the maxcount name might suggest we don't return values that would have a higher count – and that's not true, and (2) This feature is for using just '1'... and this parameter suggests you can set it to some other number and we don't support that.
          Hide
          mkhludnev Mikhail Khludnev added a comment -

          What's with the rb.req.getParams() at the top of modifyRequestForFieldFacets?

          ops.. I'm sorry. I started type there, but then all magic went to .FacetComponent.EnumProbingDistribFacetField. Fixed in attached patch.

          for the loop label in SimpleFacets.getFacetTermEnumCounts, can you please use all-caps

          Sure. Fixed in patch.

          In TestRandomFaceting.getExpectationForSortByCount: why are offset & limit multiplied by 2?

          facets in json response come as a flat list like ["a", 1, "c", 1, "e", 4], thus every tuple takes a pair of elems in array.

          Ok. Let's confirm the spec:

          facet.exists=true is valid only for facet.method=enum, i.e. it throws exceptions on other methods. It caps facet counts by 1, hopefully does it faster than usual enum.

          Does it work for everyone?

          Show
          mkhludnev Mikhail Khludnev added a comment - What's with the rb.req.getParams() at the top of modifyRequestForFieldFacets? ops.. I'm sorry. I started type there, but then all magic went to .FacetComponent.EnumProbingDistribFacetField . Fixed in attached patch. for the loop label in SimpleFacets.getFacetTermEnumCounts, can you please use all-caps Sure. Fixed in patch. In TestRandomFaceting.getExpectationForSortByCount: why are offset & limit multiplied by 2? facets in json response come as a flat list like ["a", 1, "c", 1, "e", 4], thus every tuple takes a pair of elems in array. Ok. Let's confirm the spec: facet.exists=true is valid only for facet.method=enum , i.e. it throws exceptions on other methods. It caps facet counts by 1, hopefully does it faster than usual enum . Does it work for everyone?
          Hide
          dsmiley David Smiley added a comment -

          facet.exists=true looks good to me... but it should be okay to not specify facet.method – it will be enum to support this feature. I'm cool with it being an error if for some reason you set it to something other than enum.

          BTW this feature might theoretically be brought into the other facet types in the future. e.g.

          facet.query={! facet.exists=true}field:val

          or facet.range.exists=true

          Show
          dsmiley David Smiley added a comment - facet.exists=true looks good to me... but it should be okay to not specify facet.method – it will be enum to support this feature. I'm cool with it being an error if for some reason you set it to something other than enum. BTW this feature might theoretically be brought into the other facet types in the future. e.g. facet.query={! facet.exists=true}field:val or facet.range.exists=true
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          Hmmm, I'm looking at the algorithm in SimpleFacets.getFacetTermEnumCounts for when the df is < minDfFilterCache – which loops over docs in the PostingsEnum and checks fastForRandomSet. Wouldn't it be better to leap-frog (and then don't need a "fast-for-random-set" but do need it to be sorted)?

          I had the exact same thought!
          This code was written a long time ago, when skipping on postings was slower. The more sparse a docset is, the better performing skipping should be.
          In the case of a base docset that matches many documents and a term that matches few, it may still be faster to not skip. Also, a low minDfFilterCache would likely always be fine not skipping. The potentially big performance improvement would be when minDfFilterCache is large (and we encounter terms with large df as well). We'd need benchmarking to try and determine where the crossover point is today.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - Hmmm, I'm looking at the algorithm in SimpleFacets.getFacetTermEnumCounts for when the df is < minDfFilterCache – which loops over docs in the PostingsEnum and checks fastForRandomSet. Wouldn't it be better to leap-frog (and then don't need a "fast-for-random-set" but do need it to be sorted)? I had the exact same thought! This code was written a long time ago, when skipping on postings was slower. The more sparse a docset is, the better performing skipping should be. In the case of a base docset that matches many documents and a term that matches few, it may still be faster to not skip. Also, a low minDfFilterCache would likely always be fine not skipping. The potentially big performance improvement would be when minDfFilterCache is large (and we encounter terms with large df as well). We'd need benchmarking to try and determine where the crossover point is today.
          Hide
          mkhludnev Mikhail Khludnev added a comment -

          Here is how I caught the logic for facet.exists=true. Please have a look! Alan Woodward you are also invited to look at it.

             /**
              * @param existsRequested facet.exists=true is passed for the given field
              * */
            static FacetMethod selectFacetMethod(String fieldName, 
                                                 SchemaField field, FacetMethod method, Integer mincount,
                                                 boolean existsRequested) {
              if (existsRequested) {
                //checkMincountOnEnumprobing(fieldName, mincount);
                if (mincount > 1) {
                  throw new SolrException (ErrorCode.BAD_REQUEST,
                      FacetParams.FACET_MINCOUNT + "=" + mincount + " exceed 1 that's not supported with " + 
                          FacetParams.FACET_EXISTS + "=true for " + fieldName
                  );
                }
                // suggesting default if absent
                if (method == null) {
                  method = FacetMethod.ENUM;
                }
              }
              final FacetMethod facetMethod = selectFacetMethod(field, method, mincount);
              
              if (existsRequested && facetMethod != FacetMethod.ENUM) {
                throw new SolrException (ErrorCode.BAD_REQUEST, 
                    FacetParams.FACET_EXISTS + "=true is requested, but " +
                    FacetParams.FACET_METHOD + "=" + FacetParams.FACET_METHOD_enum + " can't be used with " + fieldName
                );
              }
              return facetMethod;
            }
          

          I'm waiting for confirmation before shaking up the tests.

          Show
          mkhludnev Mikhail Khludnev added a comment - Here is how I caught the logic for facet.exists=true . Please have a look! Alan Woodward you are also invited to look at it. /** * @param existsRequested facet.exists= true is passed for the given field * */ static FacetMethod selectFacetMethod( String fieldName, SchemaField field, FacetMethod method, Integer mincount, boolean existsRequested) { if (existsRequested) { //checkMincountOnEnumprobing(fieldName, mincount); if (mincount > 1) { throw new SolrException (ErrorCode.BAD_REQUEST, FacetParams.FACET_MINCOUNT + "=" + mincount + " exceed 1 that's not supported with " + FacetParams.FACET_EXISTS + "= true for " + fieldName ); } // suggesting default if absent if (method == null ) { method = FacetMethod.ENUM; } } final FacetMethod facetMethod = selectFacetMethod(field, method, mincount); if (existsRequested && facetMethod != FacetMethod.ENUM) { throw new SolrException (ErrorCode.BAD_REQUEST, FacetParams.FACET_EXISTS + "= true is requested, but " + FacetParams.FACET_METHOD + "=" + FacetParams.FACET_METHOD_enum + " can't be used with " + fieldName ); } return facetMethod; } I'm waiting for confirmation before shaking up the tests.
          Hide
          mkhludnev Mikhail Khludnev added a comment -

          Attached reworked patch for facet.exists=true. Please review!

          Show
          mkhludnev Mikhail Khludnev added a comment - Attached reworked patch for facet.exists=true . Please review!
          Hide
          dsmiley David Smiley added a comment -

          +1 – Nice. I'm relieved to see no mention of "probing" anymore

          One small bit of improvement IMO is the short description describing this setting. Right now on the constant you have:

          Boolean parameters to indicate that Solr should check terms docs for intersection with result docset without calculating exact facet counts

          IMO that's too much implementation detail and isn't immediately obvious what effect it has. I suggest: A boolean parameter that caps the facet counts at 1. With this set, a returned count will only be 0 or 1. For apps that don't need the count, this should be an optimization.

          Show
          dsmiley David Smiley added a comment - +1 – Nice. I'm relieved to see no mention of "probing" anymore One small bit of improvement IMO is the short description describing this setting. Right now on the constant you have: Boolean parameters to indicate that Solr should check terms docs for intersection with result docset without calculating exact facet counts IMO that's too much implementation detail and isn't immediately obvious what effect it has. I suggest: A boolean parameter that caps the facet counts at 1. With this set, a returned count will only be 0 or 1. For apps that don't need the count, this should be an optimization.
          Hide
          mkhludnev Mikhail Khludnev added a comment -

          ok. This line is better, I put it in.
          Is it worth to wait for more review feedback, or it's ready to be committed?

          Show
          mkhludnev Mikhail Khludnev added a comment - ok. This line is better, I put it in. Is it worth to wait for more review feedback, or it's ready to be committed?
          Hide
          dsmiley David Smiley added a comment -

          it's ready to be committed?

          I think so. You got at least one review of the final patch which is pretty good – it seems most patches get none. And the previous similar patch has existed for a bit longer and got someone else's eyes on it. If someone jumps in for more feedback after your commit; it's not set in stone for more changes.

          Show
          dsmiley David Smiley added a comment - it's ready to be committed? I think so. You got at least one review of the final patch which is pretty good – it seems most patches get none. And the previous similar patch has existed for a bit longer and got someone else's eyes on it. If someone jumps in for more feedback after your commit; it's not set in stone for more changes.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit ff69d14868555c43708823df23c90abd58a14d86 in lucene-solr's branch refs/heads/master from Mikhail Khludnev
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ff69d14 ]

          SOLR-5725: facet.exists=true caps counts by 1 to make facet.method=enum
          faster.

          Show
          jira-bot ASF subversion and git services added a comment - Commit ff69d14868555c43708823df23c90abd58a14d86 in lucene-solr's branch refs/heads/master from Mikhail Khludnev [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ff69d14 ] SOLR-5725 : facet.exists=true caps counts by 1 to make facet.method=enum faster.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit ba380500f928859584581d11f2d9b431c07b996d in lucene-solr's branch refs/heads/branch_6x from Mikhail Khludnev
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ba38050 ]

          SOLR-5725: facet.exists=true caps counts by 1 to make facet.method=enum
          faster.

          Show
          jira-bot ASF subversion and git services added a comment - Commit ba380500f928859584581d11f2d9b431c07b996d in lucene-solr's branch refs/heads/branch_6x from Mikhail Khludnev [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ba38050 ] SOLR-5725 : facet.exists=true caps counts by 1 to make facet.method=enum faster.
          Hide
          mkhludnev Mikhail Khludnev added a comment - - edited

          you know what??? http://jenkins.thetaphi.de/job/Lucene-Solr-master-MacOSX/3523/

          junit.framework.AssertionFailedError: .facet_counts.facet_fields.t_s.null:0!=null
          at __randomizedtesting.SeedInfo.seed([1CFBD6041AD904D3:94AFE9DEB425692B]:0)
          at junit.framework.Assert.fail(Assert.java:50)
          at org.apache.solr.BaseDistributedSearchTestCase.compareSolrResponses(BaseDistributedSearchTestCase.java:913)
          at org.apache.solr.BaseDistributedSearchTestCase.compareResponses(BaseDistributedSearchTestCase.java:932)
          at org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:607)
          at org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:574)
          at org.apache.solr.handler.component.DistributedFacetExistsSmallTest.checkRandomParams(DistributedFacetExistsSmallTest.java:136)

          NOTE: reproduce with: ant test -Dtestcase=DistributedFacetExistsSmallTest -Dtests.method=test -Dtests.seed=1CFBD6041AD904D3 -Dtests.slow=true -Dtests.locale=es-SV -Dtests.timezone=Pacific/Guadalcanal -Dtests.asserts=true -Dtests.file.encoding=US-ASCII

          It's perfectly reproduced with the seed. The reason is: with distrib=false facet.limit=0 bypasses facet.missing processing. It's AS-WAS pre SOLR-5725 behavior. However, distrib=true calculates missing count disregards facet.limit=0. I'm going to commit test fix. And think what to do then.

          Show
          mkhludnev Mikhail Khludnev added a comment - - edited you know what??? http://jenkins.thetaphi.de/job/Lucene-Solr-master-MacOSX/3523/ junit.framework.AssertionFailedError: .facet_counts.facet_fields.t_s.null:0!=null at __randomizedtesting.SeedInfo.seed( [1CFBD6041AD904D3:94AFE9DEB425692B] :0) at junit.framework.Assert.fail(Assert.java:50) at org.apache.solr.BaseDistributedSearchTestCase.compareSolrResponses(BaseDistributedSearchTestCase.java:913) at org.apache.solr.BaseDistributedSearchTestCase.compareResponses(BaseDistributedSearchTestCase.java:932) at org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:607) at org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:574) at org.apache.solr.handler.component.DistributedFacetExistsSmallTest.checkRandomParams(DistributedFacetExistsSmallTest.java:136) NOTE: reproduce with: ant test -Dtestcase=DistributedFacetExistsSmallTest -Dtests.method=test -Dtests.seed=1CFBD6041AD904D3 -Dtests.slow=true -Dtests.locale=es-SV -Dtests.timezone=Pacific/Guadalcanal -Dtests.asserts=true -Dtests.file.encoding=US-ASCII It's perfectly reproduced with the seed. The reason is: with distrib=false facet.limit=0 bypasses facet.missing processing. It's AS-WAS pre SOLR-5725 behavior. However, distrib=true calculates missing count disregards facet.limit=0 . I'm going to commit test fix. And think what to do then.
          Show
          mkhludnev Mikhail Khludnev added a comment - test fix for master and branch_6x https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=commitdiff;h=9ac5c1cf149fdd393209795226dd7ee792b767b2 https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=commitdiff;h=c61ee3346cc6a5b83ec2f00b14ead49ee9d50edf
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit ff69d14868555c43708823df23c90abd58a14d86 in lucene-solr's branch refs/heads/apiv2 from Mikhail Khludnev
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ff69d14 ]

          SOLR-5725: facet.exists=true caps counts by 1 to make facet.method=enum
          faster.

          Show
          jira-bot ASF subversion and git services added a comment - Commit ff69d14868555c43708823df23c90abd58a14d86 in lucene-solr's branch refs/heads/apiv2 from Mikhail Khludnev [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ff69d14 ] SOLR-5725 : facet.exists=true caps counts by 1 to make facet.method=enum faster.
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -

          Closing after 6.3.0 release.

          Show
          shalinmangar Shalin Shekhar Mangar added a comment - Closing after 6.3.0 release.

            People

            • Assignee:
              mkhludnev Mikhail Khludnev
              Reporter:
              alexey_kozhemiakin Alexey Kozhemiakin
            • Votes:
              3 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development