Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      The goal here is basically to flip the notion of "stats.facet" on its head, so that instead of asking the stats component to also do some faceting (something that has never worked well with the variety of field types and has never worked in distributed mode), we instead ask the PivotFacet code to compute some stats X for each leaf in a pivot. We'll do this with the existing stats.field params, but we'll leverage the tag local param of the stats.field instances to be able to associate which stats we want hanging off of which facet.pivot

      Example...

      facet.pivot={!stats=s1}category,manufacturer
      stats.field={!key=avg_price tag=s1 mean=true}price
      stats.field={!tag=s1 min=true max=true}user_rating
      

      ...with the request above, in addition to computing the min/max user_rating and the mean price (labeled "avg_price") over the entire result set, the PivotFacet component will also include those stats for every node of the tree it builds up when generating a pivot of the fields "category,manufacturer".
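
      For illustration, an abbreviated sketch of the response shape this request would produce (the "electronics" constraint and all numbers are made-up example values; the nesting mirrors the pivot responses shown later in this issue's comments):

        "facet_pivot": {
          "category,manufacturer": [{
              "field": "category",
              "value": "electronics",
              "count": 12,
              "stats": {
                "stats_fields": {
                  "avg_price":   { "mean": 252.02 },
                  "user_rating": { "min": 1.0, "max": 5.0 }
                }
              },
              "pivot": [ ... nested manufacturer entries, each with its own "stats" block ... ]
          }]
        }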

      1. SOLR-6351.patch
        131 kB
        Hoss Man
      2. SOLR-6351.patch
        132 kB
        Hoss Man
      3. SOLR-6351.patch
        129 kB
        Vitaliy Zhovtyuk
      4. SOLR-6351.patch
        122 kB
        Hoss Man
      5. SOLR-6351.patch
        120 kB
        Hoss Man
      6. SOLR-6351.patch
        117 kB
        Hoss Man
      7. SOLR-6351.patch
        107 kB
        Vitaliy Zhovtyuk
      8. SOLR-6351.patch
        67 kB
        Hoss Man
      9. SOLR-6351.patch
        73 kB
        Hoss Man
      10. SOLR-6351.patch
        67 kB
        Hoss Man
      11. SOLR-6351.patch
        66 kB
        Hoss Man
      12. SOLR-6351.patch
        78 kB
        Vitaliy Zhovtyuk
      13. SOLR-6351.patch
        78 kB
        Vitaliy Zhovtyuk
      14. SOLR-6351.patch
        76 kB
        Vitaliy Zhovtyuk
      15. SOLR-6351.patch
        69 kB
        Steve Molloy
      16. SOLR-6351.patch
        70 kB
        Steve Molloy
      17. SOLR-6351.patch
        66 kB
        Vitaliy Zhovtyuk
      18. SOLR-6351.patch
        59 kB
        Steve Molloy
      19. SOLR-6351.patch
        56 kB
        Vitaliy Zhovtyuk

        Issue Links

          Activity

          Hoss Man added a comment -

          Proposed implementation...

          • Modify the Pivot facet code to check for a "stats" local param
          • if a stats tag name is specified, each time a NamedList<Integer> of "top" terms is computed at a given level of the pivot tree (either in single node mode, on an initial shard request, or as part of a refinement request), each of those terms should be applied as a filter on the main doc set - the result should be used to construct a SimpleStats
          • the SimpleStats class should then be asked to compute the necessary stats based on the tags (a rough sketch of this flow follows the list)
            • SimpleStats will need to be refactored: probably with a new StatsField class for modeling each stats.field param (and the StatsValues that hang off of it) and new accessor methods to get the list of all {{StatsField}}s so the pivot facet class can find the ones it needs by tag
            • in single node mode, just ask each StatsField to compute the stats
            • in cloud mode:
              • in the shard requests, ask each StatsField to compute its distributed stats based on the params it knows about
              • in the coordinator request, once the pivot constraints are merged & refined, the refinement phase should merge all of the StatsValues at the same time it's summing up the facet counts.
          • we need to think carefully about "exclusions" because they might be different
            • ie: facet.pivot={!ex=some_fq stats=s1}foo,bar&stats.field={!ex=other_fq}price
            • i think what we want in general is for the "ex" localparam of the stats.field to be ignored when hanging off of a facet.pivot
            • OR: we union the exclusions (so the computed stats are over more docs than what the facet count returns)
            • OR: we fail if they both specify "ex" and they aren't identical
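
          Purely as an illustration of the per-term filtering described above (SimpleStats and the NamedList of top terms come from the comment; the query construction, the getDocSet call, and the computeStatsFor hook are assumptions, not actual patch code):

            // for each "top" term at this level of the pivot tree, filter the main DocSet
            // by that term and hand the resulting subset to the stats code
            for (Map.Entry<String,Integer> term : topTerms) {         // NamedList<Integer> of top terms
              Query filter = new TermQuery(new Term(field, term.getKey()));
              DocSet subset = searcher.getDocSet(filter, baseDocSet); // docs matching the term within the base set
              computeStatsFor(subset);  // hypothetical hook asking SimpleStats/StatsField for the tagged stats
            }
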
          Hoss Man added a comment -

          SimpleStats will need to be refactored: ...

          FWIW: I've started incorporating some of these ideas into the bug fix i'm doing in SOLR-6507 since it addresses the root problem in that bug and gives us a better place to build on for issues like this.

          Hoss Man added a comment -

          I'm headed out of town for a week, but before i go – while it's fresh in my head – i wanted to post some notes on what the next steps need to be for this based on the current state of trunk (after the refactoring & cleanup done in recent issues like SOLR-6354 & SOLR-6507)

          next steps

          Single Node Pivot Tests...

          1. apparently we've managed to go this far w/o any simple single-node pivot tests other than SolrExampleTests – which requires solrj support. Since it would be nice to start with some simple proof that single node pivots+stats work, we need to start with some tests
          2. we should add a simple FacetPivotSmallTest that uses the same basic data and assertions as DistributedFacetPivotSmallTest but with a single solr node and using xpath (instead of depending on solrj).
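
          To make the shape of such a test concrete, a minimal hypothetical sketch (the schema fields, xpaths, and config names are illustrative assumptions, not the actual test):

            import org.apache.solr.SolrTestCaseJ4;
            import org.junit.BeforeClass;
            import org.junit.Test;

            public class FacetPivotSmallTest extends SolrTestCaseJ4 {

              @BeforeClass
              public static void beforeTests() throws Exception {
                initCore("solrconfig.xml", "schema.xml");   // assumed config/schema names
              }

              @Test
              public void testPivotWithStats() throws Exception {
                assertU(adoc("id", "1", "place_s", "dublin", "company_t", "microsoft", "pay_i", "1000"));
                assertU(commit());

                assertQ(req("q", "*:*", "rows", "0",
                            "facet", "true", "stats", "true",
                            "facet.pivot", "{!stats=s1}place_s,company_t",
                            "stats.field", "{!tag=s1}pay_i"),
                    // assumed XML layout: each pivot value carries a nested 'stats' section
                    "//lst[@name='facet_pivot']/arr[@name='place_s,company_t']/lst[1]/int[@name='count'][.='1']",
                    "//lst[@name='facet_pivot']/arr[@name='place_s,company_t']/lst[1]/lst[@name='stats']");
              }
            }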

          Local Pivots + stats...

          1. add some logic & a getter to StatsField to make public a list of the "tags" in its local params
          2. add a Map<String,List<StatsField>> to StatsInfo to support a new method for looking up the List<StatsField> corresponding to a given tag string.
          3. Modify the Pivot facet code to check for a "stats" local param:
            • the value of which may be a comma separated list of "tags" to look up with the StatsInfo instance of the current ResponseBuilder to get the StatsField instances we want to hang off of our pivots.
            • if there are some StatsFields to hang off of our pivot, then any code in PivotFacetProcessor which currently calls getSubsetSize() should call getSubset(); and after any call (existing or new) to getSubset() the code should (in addition to adding the set size to the response) pass that DocSet to StatsField.computeLocalStatsValues and include the resulting StatsValues in the response (see the sketch after this list).
          4. update the previously created FacetPivotSmallTest to also test hanging some stats off of pivots
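
          A rough sketch of how steps 1-3 might hang together (getStatsFieldsByTag and computeLocalStatsValues are the names used in these notes; the surrounding glue, variable names, and the exact ResponseBuilder accessor are assumptions):

            // resolve the {!stats=...} local param (comma separated tags) to StatsField instances
            List<StatsField> statsFields = new ArrayList<>();
            String statsTags = (localParams == null) ? null : localParams.get("stats");
            if (statsTags != null) {
              for (String tag : StrUtils.splitSmart(statsTags, ',')) {
                List<StatsField> forTag = statsInfo.getStatsFieldsByTag(tag); // StatsInfo from the ResponseBuilder (accessor assumed)
                if (forTag != null) statsFields.addAll(forTag);
              }
            }

            // then, for each pivot constraint's DocSet, compute the requested stats
            NamedList<Object> statsFieldsOut = new SimpleOrderedMap<>();
            for (StatsField statsField : statsFields) {
              StatsValues sv = statsField.computeLocalStatsValues(subset);    // subset = getSubset(...) result
              statsFieldsOut.add(statsField.getOutputKey(), sv.getStatsValues());
            }
            // statsFieldsOut would then be nested under a "stats" entry of the current pivot value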

          SolrJ

          1. update the SolrJ PivotField to support having a List<FieldStatsInfo> in it
          2. update the solrj codecs to know how to populate those if/when the data exists in the response
          3. add some unit tests for this in solrj (no existing unit tests of the pivot or stats object creation from responses???)
          4. update SolrExampleTests to do some pivots+stats and verify that they can be parsed correctly by solrj
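
          For the SolrJ side, usage might end up looking roughly like this (the getFieldStatsInfo() accessor on PivotField is an assumption taken from step 1; 'server' is an existing SolrServer instance and exception handling is omitted):

            SolrQuery q = new SolrQuery("*:*");
            q.setRows(0);
            q.setFacet(true);
            q.set("facet.pivot", "{!stats=s1}category,manufacturer");
            q.set("stats", "true");
            q.set("stats.field", "{!tag=s1}price");

            QueryResponse rsp = server.query(q);
            for (PivotField pf : rsp.getFacetPivot().get("category,manufacturer")) {
              List<FieldStatsInfo> stats = pf.getFieldStatsInfo();  // assumed new accessor per step 1
              System.out.println(pf.getValue() + " count=" + pf.getCount() + " stats=" + stats);
            }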

          Distributed Pivot + Stats

          1. PivotFacetValue needs to know if/when it should have one or more StatsValues in it and get an empty instance from StatsValuesFactory for each of the applicable StatsField instances.
          2. PivotFacetValue.createFromNamedLists needs to recognize when a shard is including a sub-NamedList of stats data, and merge each of those children via the appropriate StatsValues.accumulate(NamedList) (based on StatsField.getKey()) – see the sketch after this list.
          3. at this point we should be able to update DistributedFacetPivotSmallTest to include the same types of pivot+stats additions that were made to FacetPivotSmallTest for checking the single node case, and see distributed pivot+stats working.
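
          As an illustration of that merge step (the statsValues map on PivotFacetValue and the exact NamedList layout are assumptions; StatsValues.accumulate(NamedList) is the existing API named above):

            // 'shardStats' is the NamedList a shard returned under this pivot value's "stats" key (layout assumed);
            // 'statsValues' is the Map<String,StatsValues> this PivotFacetValue accumulates into (assumed field)
            @SuppressWarnings("unchecked")
            NamedList<NamedList<?>> shardStatsFields =
                (NamedList<NamedList<?>>) shardStats.get("stats_fields");
            if (shardStatsFields != null) {
              for (Map.Entry<String, NamedList<?>> entry : shardStatsFields) {
                StatsValues target = statsValues.get(entry.getKey());  // keyed by StatsField.getKey()
                if (target != null) {
                  target.accumulate(entry.getValue());                 // StatsValues.accumulate(NamedList)
                }
              }
            }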

          Test, Test, Test

          1. at this point we should be able to update the other distributed pivot tests with pivot + stats cases to make sure we don't find new bugs
          2. adding in stats params & assertions to DistributedFacetPivotLargeTest and DistributedFacetPivotLongTailTest should be straightforward
          3. TestCloudPivotFacet will be more interesting due to the randomization...
            • adding new randomized stats.field params is trivial given all the interesting fields already included in the docs
            • with a little record keeping of what stats.field params we add, we can easily tweak the facet.pivot params to include a {{stats=...}} local param to ask for them
            • we'll want a trace param to know if/when to expect stats in the response (so we don't overlook bugs where stats are never computed/returned)
            • in assertPivotCountsAreCorrect, if stats are expected, then instead of a simple assertNumFound on each of the pivot values, we can actually assert that when filtering on that pivot value, the stats we get back for the entire (filtered) request match the stats we got back as part of the pivot (ie: don't just check pivot[count]==numFound, also check pivot[stats][foo][min]==stats[foo][min] and pivot[stats][foo][max]==stats[foo][max], etc...)
              • the merge logic should be exact for the min/max/missing/count stats, but we may need some leniency here for the comparisons of some of the computed stats values like sum/mean/stddev/etc... since the order of operations involved in the merge may cause intermediate precision loss (a small comparison helper sketch follows)
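
          A tolerant comparison along those lines might look like this (purely illustrative; the relative tolerance is an arbitrary assumption):

            // exact equality for count-like stats, a small relative tolerance for derived ones
            static void assertStatsRoughlyEqual(String msg, double expected, double actual) {
              double tolerance = Math.max(Math.abs(expected), Math.abs(actual)) * 1e-6;
              org.junit.Assert.assertEquals(msg, expected, actual, tolerance);
            }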

          One final note...

          we need to think carefully about "exclusions" because they might be different

          My current thinking (reflected in the steps i've outlined above) is that we should go this route...

          i think what we want in general is for the "ex" localparam of the stats.field to be ignored when hanging off of a facet.pivot

          Of the 2 alternatives i proposed before:

          • "union the exclusions" – extremely impractical.
          • "fail if they both specify 'ex' and they aren't identical" – very possible/easy to do if people think it's less confusing, it just requires a bit more code. we can easily go this route if we run into problems and decide it makes the API cleaner.
          Vitaliy Zhovtyuk added a comment -

          Intermediate results:

          1. Added pivot facet test to SolrExampleTests, extended org.apache.solr.client.solrj.SolrQuery to provide multiple stats.field parameters
          2. Added FacetPivotSmallTest and moved all asserts from DistributedFacetPivotSmallTest to XPath assertions, separated it into a few test methods
          3. "tag" local parameter parsing added
          4. added org.apache.solr.handler.component.StatsInfo#tagToStatsFields and org.apache.solr.handler.component.StatsInfo#getStatsFieldsByTag to look up the list of stats fields by tag
          5. Modified PivotFacetProcessor to collect and put StatsValues for every pivot field, added tests to assert the stats values of pivots
          6. Updated PivotField and org.apache.solr.client.solrj.response.QueryResponse to read stats values on pivots
          6. Updated PivotField and org.apache.solr.client.solrj.response.QueryResponse to read stats values on pivots

          Hoss Man added a comment -

          Vitaliy: i haven't had a chance to look in depth at your patch, but when i tried to run the tests all of the pivot code seemed to fail with an NPE?

             [junit4] ERROR   30.8s J2 | DistributedFacetPivotSmallTest.testDistribSearch <<<
             [junit4]    > Throwable #1: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: java.lang.NullPointerException
             [junit4]    > 	at org.apache.solr.handler.component.PivotFacetProcessor.getStatsFields(PivotFacetProcessor.java:158)
             [junit4]    > 	at org.apache.solr.handler.component.PivotFacetProcessor.processSingle(PivotFacetProcessor.java:121)
             [junit4]    > 	at org.apache.solr.handler.component.PivotFacetProcessor.process(PivotFacetProcessor.java:97)
             [junit4]    > 	at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:112)
             [junit4]    > 	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:226)
          
          ...
          
             [junit4] ERROR   0.07s | FacetPivotSmallTest.testPivotFacetIndexSortMincountLimitAndOffsetPermutations <<<
             [junit4]    > Throwable #1: java.lang.RuntimeException: Exception during query
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([79673644714434B5:6E2351EE52D41611]:0)
             [junit4]    > 	at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:723)
             [junit4]    > 	at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:690)
             [junit4]    > 	at org.apache.solr.handler.component.FacetPivotSmallTest.testPivotFacetIndexSortMincountLimitAndOffsetPermutations(FacetPivotSmallTest.java:425)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
             [junit4]    > Caused by: java.lang.NullPointerException
             [junit4]    > 	at org.apache.solr.handler.component.PivotFacetProcessor.getStatsFields(PivotFacetProcessor.java:158)
             [junit4]    > 	at org.apache.solr.handler.component.PivotFacetProcessor.processSingle(PivotFacetProcessor.java:121)
          
          ...
          
          
          Steve Molloy added a comment -

          Adapted the patch a bit to avoid the NPE when no stats are requested, modified PivotFacetValue to propagate the info, and made other small tweaks. Tests seem to be happy and I got some requests to work, so I am too...

          Vitaliy Zhovtyuk added a comment -

          Combined with previous patch.
          1. Added more solrj tests for stats on pivots
          2. Fixed stats result
          3. Minor tweaks

          Steve Molloy added a comment -

          Vitaliy Zhovtyuk Does your patch contain changes from mine? There were some NPEs, as Hoss mentioned, which I think I got fixed. I like the addition of tests though, so I hope to get the best of both. I don't mind providing a patch combining both patches, just want to avoid us posting at about the same time again.

          Steve Molloy added a comment -

          Ok, applied locally and see that most is combined. One thing though, what is expected behavior for pivots where count is 0? Currently, you'll get the full entry with NaN, infinity and such in it. Should it be null or empty instead? Or should it even show up at all?

          Hoss Man added a comment -

          One thing though, what is expected behavior for pivots where count is 0? Currently, you'll get the full entry with NaN, infinity and such in it. Should it be null or empty instead? Or should it even show up at all?

          Great question.

          I think in this case we should leave out the "stats" subsection completely for brevity – similar to how the "pivot" subsection is left out whenever there are no sub-pivots to report counts for. it's also mostly consistent with the common case of sub-pivots when the parent count is 0 in distributed pivots since mincount=0 isn't really viable (SOLR-6329).

          But i could be persuaded that we should leave in an empty 'stats' section ... a stats section full of fields reporting a bunch of NaN and Infinity counts seems like it's just asking to cause user error though.

          Steve Molloy added a comment -

          Augmented previous 3 patches, added logic to not include stats entry if it's empty, fixed distributed logic by actually merging stats from shards. Currently have unit tests failing in solrj that I need to look at.

          Hoss Man added a comment -

          i haven't done an extensive review, but here are some quick comments/questions based on a skim of the latest patch...

          1) this block of new code in PivotFacetProcessor (which pops up twice at diff points in the patch?) doesn't make any sense to me ... the StatsValues returned from computeLocalStatsValues() is totally ignored ?

          +      for(StatsField statsField : statsFields) {
          +         statsField.computeLocalStatsValues(docSet);
          +      }
          

          2) i don't think StatsValues.hasValues() really makes sense ... we shouldn't be computing StatsValues against the subset and then adding them to the response if and only if they hasValues() – we should instead be skipping the computation of the StatsValues completely unless the pivot subset is non-empty.

          this isn't just a question of optimizing away the stats computation – it's a very real difference in the fundamental logic. there could be a non-empty set of documents (ie: pivot count > 0), but the stats we've been asked to compute (ie: over some field X) might result in a stats count that's 0 (if none of the docs in that set have a value in field X) in which case we should still include the stats in the response.

          3) why is a CommonParams.STATS constant being added? isn't this what StatsParams.STATS is for?

          4) i'm not really understanding the point of the two new SolrExampleTests.testPivotFacetsStatsNotSupported* methods ... what's the goal behind having these tests?
          If nothing else, as things stand right now, these seem like they make really brittle assumptions about the exact error message they expect – we should change them to use substring/regex to sanity check just the key pieces of information we care about finding in the error message.
          We should also assert that the error code on these exceptions is definitely a 4xx error and not a 5xx
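
          As an illustration of the assertion style suggested in point 4 (everything here is hypothetical: the request params, the message fragment, and the 'server' variable are assumptions, not code from the patch; assumes a test method declared with throws Exception):

            try {
              SolrQuery q = new SolrQuery("*:*");
              q.setFacet(true);
              q.set("stats", "true");
              q.set("facet.pivot", "{!stats=s1}cat_s");
              q.set("stats.field", "{!tag=s1}some_unsupported_field");
              server.query(q);
              fail("expected a client error for stats over an unsupported field type");
            } catch (SolrException e) {
              assertEquals(400, e.code());                            // a 4xx, not a 5xx
              assertTrue(e.getMessage().contains("not supported"));   // substring check, not an exact message
            }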

          Steve Molloy added a comment -

          Addressing some comments. Removed the unused for-loop and CommonParams.STATS. Didn't touch the notSupported test methods; will give Vitaliy a chance to speak to their usefulness. Also reverted the hasValues logic to replace it with checking if current pivot has positive count. Although it does produce some stats entries with Infinity minimum/maximum and NaN mean. This is what I was asking about before, I think I misunderstood the answer, but it still seems error-prone to have such entries...

          Finally, I updated some of the outputs to use NamedList instead of maps so that solrj binary works better. Did have to sort fields in QueryResponse to get tests to pass. Not sure this is the best way, but would sometimes get them out of order if I didn't.

          Vitaliy Zhovtyuk added a comment - edited

          1. Added stats fields pivot distribution to DistributedFacetPivotSmallTest (same data and values from org.apache.solr.handler.component.FacetPivotSmallTest used)
          2. Fixed "LinkedHashMap cannot be cast to NamedList" exception, occurring on stats distribution (changed org.apache.solr.handler.component.PivotFacetHelper#convertStatsValuesToNamedList)

          3. About testing for not-supported types, i think we need to cover/document the limitations as well. I'm not happy with asserting the error message.
          Added an http code 400 assertion and an error message substring assertion.

          Steve Molloy added a comment -

          One more question around this, which applies for SOLR-6353 and SOLR-4212 as well. Should we have a syntax to apply stats/queries/ranges only at specific levels in the pivot hierarchy? It would reduce amount of computation and size of response for cases where you only need it at a specific level (usually last level I guess). Something like:

          facet.pivot={!stats=s1,s2}field1,field2

          We could use * for all levels, or something like:

          facet.pivot={!stats=,,s3}field1,field2,field3

          to only apply at 3rd level.

          Hoss Man added a comment -

          I haven't had a chance to look at the recent patches, and i probably won't today, but i wanted to post some quick replies to a few comments/questions...


          Also reverted the hasValues logic to replace it with checking if current pivot has positive count. Although it does produce some stats entries with Infinity minimum/maximum and NaN mean. This is what I was asking about before, I think I misunderstood the answer, but it still seems error-prone to have such entries...

          You may be right ... lemme talk through my thinking on this and see where we wind up:

          Forgetting about pivots for a moment, just think about the idea of computing some arbitrary stat over a set of documents. ie: you've got N documents that match some arbitrary query, and then you want to compute stats on the "price" and "popularity" fields .. what should happen if none of those documents has a "popularity" field at all?

          my thinking, was that:

          • the behavior when hanging stats off pivots should mirror that of regular stats
          • if you ask for a stats block, you should always get that block, so the client doesn't have to conditionally check if it's there before looking at the values.
          • the included stat values matter even if no doc has the stats.field, because one of the stats is in fact "missing" and if you ask for stats, you should be able to look at that missing count. (and it should match up with your doc set size if the field is completely missing, etc...)

          but looking at an example of this now, i see that for simple field stats (w/o pivots), that's not even how it currently works – consider this URL using the example data...

          http://localhost:8983/solr/select?rows=0&q=name:utf&stats.field=foo_d&stats.field=popularity&stats=true

          • foo_d doesn't exist in the index.
          • popularity does exist in the index, but the one doc matching the query doesn't have a value in that field.

          I thought (and still think) that the "correct" behavior for this query would be to get a stats block back for those fields where things like min/max/mean are "null", count==0, and missing=1 ... but that's not how it currently works.

          so i guess the question is: is the current general stats behavior a "bug" that should be fixed, or is this the "correct" way to deal with stats when none of the documents have a value (and thus: the behavior of your "hasValue" logic was correct) ?

          i'm leaning towards "bug" ... but i'd like to think about it more and hear how others feel..


          One more question around this, which applies for SOLR-6353 and SOLR-4212 as well. Should we have a syntax to apply stats/queries/ranges only at specific levels in the pivot hierarchy? It would reduce amount of computation and size of response for cases where you only need it at a specific level (usually last level I guess).

          That's a great question ... honestly it's not something i ever really thought about.

          One quick thing i will point out: the size of the response shouldn't really be a huge factor in our decisions here, because with SOLR-6349 (which i'll hopefully have a patch for in the next day or so) the response will only need to include the stats people actually care about, and ask for, so the typical result size should be much smaller.

          But you've got a good point about the amount of computation done/returned at levels that people may not care about ... in my head, it seemed to make sense that the stats (and ranges, etc...) should be computed at every level just like the pivot count size – but at the intermediate levels that count is a "free" computation based on the size of the subset, and i suspect you are correct: many people may only care about having these new stats/ranges/query on the leaves in the common case.

          I'm not really following your suggested syntax though ... you seem to be saying that in the "stats" local param, commas would be used to delimit "levels" of the pivot (corresponding to the commas in the list of pivot fields) but then i'm not really clear what you mean about using "*" (if that means all levels, how do you know what tag name to use?)

          in the original examples i proposed, i was thinking that a comma separated list could refer to multiple tag names, similar to how the "exclusions" work – ie..

          facet.pivot={!stats=prices,ratings}category,manufacturer
          facet.pivot={!stats=prices,pop}reseller
          stats.field={!key=avg_list_price tag=prices mean=true}list_price
          stats.field={!tag=ratings min=true max=true}user_rating
          stats.field={!tag=ratings min=true max=true}editors_rating
          stats.field={!tag=prices min=true max=true}sale_price
          stats.field={!tag=pop}weekly_tweets
          stats.field={!tag=pop}weekly_page_views
          

          ...would result in the "category,manufacturer" pivot having stats on "avg_list_price, sale_price, user_rating, & editors_rating" while the "reseller" pivot would have stats on "avg_list_price, sale_price, weekly_tweets, & weekly_page_views"

          Thinking about it now though, if we support multiple tag names on stats.field, the same thing could be supported like this...

          facet.pivot={!stats=cm_s}category,manufacturer
          facet.pivot={!stats=r_s}reseller
          stats.field={!key=avg_list_price tag=cm_s,r_s mean=true}list_price
          stats.field={!tag=cm_s min=true max=true}user_rating
          stats.field={!tag=cm_s min=true max=true}editors_rating
          stats.field={!tag=cm_s,r_s min=true max=true}sale_price
          stats.field={!tag=r_s}weekly_tweets
          stats.field={!tag=r_s}weekly_page_views
          

          So ... if we did that, then we could start using "position" info in a comma separated list of tag names to refer to where in the pivot "depth" those stats/ranges/queries should be computed ... the question i have is "should we"? .. in the context of a facet.pivot param, will it be obvious to folks that there is a mapping between the commas in these local params and the commas in the body of the facet.pivot param, or will it confuse people who are used to seeing comma as just a way of delimiting multiple values in tag/ex params?

          my opinion: no freaking clue at the moment ... need to let it soak in my brain.

          Hoss Man added a comment -

          so i guess the question is: is the current general stats behavior a "bug" that should be fixed, or is this the "correct" way to deal with stats when none of the documents have a value (and thus: the behavior of your "hasValue" logic was correct) ?

          i'm leaning towards "bug" ... but i'd like to think about it more and hear how others feel..

          More on this here: https://issues.apache.org/jira/browse/SOLR-6349?focusedCommentId=14165856&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14165856

          The "right thing to do" for now is almost certainly to make the new code in SOLR-6351 for hanging stats off pivots consistent with the existing code in StatsComponent...

          if (isShard == true || (Long) stv.get("count") > 0) {
            stats_fields.add(statsField.getOutputKey(), stv);
          } else {
            stats_fields.add(statsField.getOutputKey(), null);
          }
          

          ...then later, probably in SOLR-6349 where depending on the "count" stat to return other stats is going to be a blocker problem, we can revisit the question.

          Vitaliy Zhovtyuk added a comment -

          During work on org.apache.solr.handler.component.DistributedFacetPivotLargeTest, found:
          junit.framework.AssertionFailedError: .facet_counts.facet_pivot.place_s,company_t[0].stats!=pivot (unordered or missing)
          	at __randomizedtesting.SeedInfo.seed([705F7E1C2B9679AA:F1B9F0045CC91996]:0)
          	at org.apache.solr.BaseDistributedSearchTestCase.compareSolrResponses(BaseDistributedSearchTestCase.java:842)
          	at org.apache.solr.BaseDistributedSearchTestCase.compareResponses(BaseDistributedSearchTestCase.java:861)
          	at org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:562)
          	

          This means a difference in named list order between the control shard and a random shard.
          Found the reason in org.apache.solr.handler.component.PivotFacetValue#convertToNamedList

          For this reason, also updated DistributedFacetPivotSmallTest to call org.apache.solr.BaseDistributedSearchTestCase#query(org.apache.solr.common.params.SolrParams) with response comparison.

          Test org.apache.solr.handler.component.DistributedFacetPivotLongTailTest works only on string fields; added an int field to compute stats on it.

          org.apache.solr.cloud.TestCloudPivotFacet: added buildRandomPivotStatsFields to build the stats.field list; this method skips fields of string and boolean type since they are not supported.
          Added random tag generation on the generated stats fields, and if stats are active the tags are also added to the pivot fields.

          Added handling for control stats that are not present (count=0) in TestCloudPivotFacet.
          About skipping stats if count=0: it is not really good because we lose the missing stats distribution.
          The case with count=0 but missing=1 is real.
          There are still some failures for TestCloudPivotFacet for date and double (precision?). Will check it tomorrow.

          Vitaliy Zhovtyuk added a comment -

          Fixed TestCloudPivotFacet. The reason for the previous random test failures was the facet.limit, facet.offset, facet.overrequest.count, and facet.overrequest.ratio parameters being generated randomly,
          which was leading to stats that were inconsistent with the pivot stats. Added cleanup of those parameters before the stats-on-pivots test. All tests are passing.

          Hoss Man added a comment -

          The reason for the previous random test failures was the facet.limit, facet.offset, facet.overrequest.count, and facet.overrequest.ratio parameters being generated randomly, which was leading to stats that were inconsistent with the pivot stats.

          ...that comment didn't really make sense to me based on how the code (should) work, so i asked Vitaliy about it on IRC this morning. Talking it through with him, neither one of us could explain conceptually why it should matter if those params were used – once a given pivot constraint is selected to be returned to the client, the "sausage" of why/how that constraint was selected shouldn't affect the "stats" associated with that constraint – from the point of view of the stats code, there's just a DocSet (on each shard) representing a subset of the full set of matches constrained by a term filter (the pivot constraint), and those (per-pivot-constraint) stats can then be merged on the coordinator node just like the top level stats.

          So we ended the conversation with the assumption that there must either be a bug in the test code that validates the randomly generated pivots+stats, or there must be a bug in the actual pivot+stats logic.

          After reading over the changes to TestCloudPivotFacet i couldn't spot any obvious test flaws, so i started doing some manual testing and i think i've uncovered the problem: It looks like Vitaliy's new code doesn't account for stats returned by a shard in response to refinement requests.

          This is pretty easy to reproduce if you spin up a 2 node system (ports 7777 & 8888) using the example configs, then...

          • add a few docs to each node
            curl -sS 'http://localhost:7777/solr/collection1/update?commit=true' -H 'Content-Type: application/json' --data-binary '
            [{"id": 71, "foo_s": "aaa", "bar_i": 1}, 
             {"id": 72, "foo_s": "aaa", "bar_i": 20}, 
             {"id": 73, "foo_s": "bbb", "bar_i": 300}]
            '
            curl -sS 'http://localhost:8888/solr/collection1/update?commit=true' -H 'Content-Type: application/json' --data-binary '
            [{"id": 81, "foo_s": "bbb", "bar_i": 4000}, 
             {"id": 82, "foo_s": "bbb", "bar_i": 50000}, 
             {"id": 83, "foo_s": "aaa", "bar_i": 600000}]
            '
            

            ...note that "aaa" is the dominant term in the 7777 node, but "bbb" is the dominant term in the 8888 node.

          • do a simple pivot query + stats – the default overrequest is more than enough to resolve pivots fully in a single pass...
            curl -sS 'http://localhost:8888/solr/collection1/select?q=*:*&shards=localhost:8888/solr,localhost:7777/solr&facet.pivot=\{!stats=sss\}foo_s&stats.field=\{!tag=sss\}bar_i&facet=true&stats=true&wt=json&indent=true&rows=0'
            
            {
              "responseHeader":{
                "status":0,
                "QTime":182,
                "params":{
                  "facet":"true",
                  "shards":"localhost:8888/solr,localhost:7777/solr",
                  "indent":"true",
                  "stats":"true",
                  "stats.field":"{!tag=sss}bar_i",
                  "q":"*:*",
                  "wt":"json",
                  "facet.pivot":"{!stats=sss}foo_s",
                  "rows":"0"}},
              "response":{"numFound":6,"start":0,"maxScore":1.0,"docs":[]
              },
              "facet_counts":{
                "facet_queries":{},
                "facet_fields":{},
                "facet_dates":{},
                "facet_ranges":{},
                "facet_intervals":{},
                "facet_pivot":{
                  "foo_s":[{
                      "field":"foo_s",
                      "value":"aaa",
                      "count":3,
                      "stats":{
                        "stats_fields":{
                          "bar_i":{
                            "min":1.0,
                            "max":600000.0,
                            "count":3,
                            "missing":0,
                            "sum":600021.0,
                            "sumOfSquares":3.60000000401E11,
                            "mean":200007.0,
                            "stddev":346404.0994662159,
                            "facets":{}}}}},
                    {
                      "field":"foo_s",
                      "value":"bbb",
                      "count":3,
                      "stats":{
                        "stats_fields":{
                          "bar_i":{
                            "min":300.0,
                            "max":50000.0,
                            "count":3,
                            "missing":0,
                            "sum":54300.0,
                            "sumOfSquares":2.51609E9,
                            "mean":18100.0,
                            "stddev":27688.08407961808,
                            "facets":{}}}}}]}},
                ....
            

            ...note that the stats for "bar_i" under pivot constraints "aaa" and "bbb" look correct (at least min/max/count/sum do ... i'm assuming the rest are as well; a hand-computed breakdown of the "aaa" numbers is included at the end of this comment)

          • do the same query again, but set facet.limit=1 – note the stats for 'aaa' are exactly the same as before due to the default overrequest values still resulting in no refinement queries needed...
            curl -sS 'http://localhost:8888/solr/collection1/select?q=*:*&shards=localhost:8888/solr,localhost:7777/solr&facet.pivot=\{!stats=sss\}foo_s&stats.field=\{!tag=sss\}bar_i&facet=true&stats=true&wt=json&indent=true&rows=0&facet.limit=1'
            
            {
              "responseHeader":{
                "status":0,
                "QTime":14,
                "params":{
                  "facet":"true",
                  "shards":"localhost:8888/solr,localhost:7777/solr",
                  "indent":"true",
                  "stats":"true",
                  "stats.field":"{!tag=sss}bar_i",
                  "q":"*:*",
                  "facet.limit":"1",
                  "wt":"json",
                  "facet.pivot":"{!stats=sss}foo_s",
                  "rows":"0"}},
              "response":{"numFound":6,"start":0,"maxScore":1.0,"docs":[]
              },
              "facet_counts":{
                "facet_queries":{},
                "facet_fields":{},
                "facet_dates":{},
                "facet_ranges":{},
                "facet_intervals":{},
                "facet_pivot":{
                  "foo_s":[{
                      "field":"foo_s",
                      "value":"aaa",
                      "count":3,
                      "stats":{
                        "stats_fields":{
                          "bar_i":{
                            "min":1.0,
                            "max":600000.0,
                            "count":3,
                            "missing":0,
                            "sum":600021.0,
                            "sumOfSquares":3.60000000401E11,
                            "mean":200007.0,
                            "stddev":346404.0994662159,
                            "facets":{}}}}}]}},
                ....
            
          • now disable overrequesting, so that a refinement request must happen – now the numbers are wrong. it's obvious from the min/max/sum that the stats are only coming from the node '7777' that returned 'aaa' as a pivot constraint on the initial request, and no new stats were merged in after the refinement request to localhost:8888...
            curl -sS 'http://localhost:8888/solr/collection1/select?q=*:*&shards=localhost:8888/solr,localhost:7777/solr&facet.pivot=\{!stats=sss\}foo_s&stats.field=\{!tag=sss\}bar_i&facet=true&stats=true&wt=json&indent=true&rows=0&facet.limit=1&facet.overrequest.count=0&facet.overrequest.ratio=0'
            
            {
              "responseHeader":{
                "status":0,
                "QTime":29,
                "params":{
                  "facet.overrequest.count":"0",
                  "facet":"true",
                  "shards":"localhost:8888/solr,localhost:7777/solr",
                  "indent":"true",
                  "stats":"true",
                  "stats.field":"{!tag=sss}bar_i",
                  "q":"*:*",
                  "facet.limit":"1",
                  "facet.overrequest.ratio":"0",
                  "wt":"json",
                  "facet.pivot":"{!stats=sss}foo_s",
                  "rows":"0"}},
              "response":{"numFound":6,"start":0,"maxScore":1.0,"docs":[]
              },
              "facet_counts":{
                "facet_queries":{},
                "facet_fields":{},
                "facet_dates":{},
                "facet_ranges":{},
                "facet_intervals":{},
                "facet_pivot":{
                  "foo_s":[{
                      "field":"foo_s",
                      "value":"aaa",
                      "count":3,
                      "stats":{
                        "stats_fields":{
                          "bar_i":{
                            "min":1.0,
                            "max":20.0,
                            "count":2,
                            "missing":0,
                            "sum":21.0,
                            "sumOfSquares":401.0,
                            "mean":10.5,
                            "stddev":13.435028842544403,
                            "facets":{}}}}}]}},
                ....
            

          I know some of the existing pivot tests (like DistributedFacetPivotLongTailTest & DistributedFacetPivotLargeTest) have sections that ensure refinement is working – if we update those sections to also include some stats & assertions on those stats we should be able to reliably reproduce this bug and then work on fixing it.
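
          For reference, the expected merged stats for the "aaa" constraint in the examples above can be recomputed by hand from the per-shard docs:

            shard 7777, foo_s=aaa : bar_i values {1, 20}  -> count=2, sum=21,     min=1,      max=20
            shard 8888, foo_s=aaa : bar_i values {600000} -> count=1, sum=600000, min=600000, max=600000
            merged                                        -> count=3, sum=600021, min=1,      max=600000

          The 7777-only row is exactly what the broken refinement response above reports (count=2, sum=21, max=20), which is what makes the missing merge so easy to spot in the min/max/sum values.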

          Hoss Man added a comment -

          It looks like Vitaliy's new code doesn't account for stats returned by a shard in response to refinement requests.

          I updated DistributedFacetPivotLongTailTest to check stats on a request where refinement was required for correct results and was able to reliably reproduce the bug.

          Looking at the logs i realized that in the refinement requests "stats=false" was explicitly set, and traced this to an optimization in StatsComponent – it was assuming that only the initial (PURPOSE_GET_TOP_IDS) shard request needed stats computed, so i modified that to recognize refinement requests (PURPOSE_REFINE_PIVOT_FACETS) as another situation where we need to leave stats=true.
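
          To make that interim change concrete, here is a rough sketch (not the literal patch) of the effect on the shard-request params – keep stats enabled for the initial top-ids request and for pivot refinement requests, and suppress it everywhere else:

            // rough sketch only - the actual change lives in StatsComponent.modifyRequest()
            import org.apache.solr.common.params.StatsParams;
            import org.apache.solr.handler.component.ShardRequest;

            class StatsShardParamSketch {
              static void adjustStatsParam(ShardRequest sreq) {
                // keep stats on the initial request and on pivot refinement requests,
                // suppress it on every other type of shard request
                boolean keepStats =
                    (sreq.purpose & ShardRequest.PURPOSE_GET_TOP_IDS) != 0
                    || (sreq.purpose & ShardRequest.PURPOSE_REFINE_PIVOT_FACETS) != 0;
                sreq.params.set(StatsParams.STATS, keepStats ? "true" : "false");
              }
            }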

          This gets the tests to pass (still hammering on TestCloudPivot, but so far looks good) but i'm not really liking this solution, for reasons i noted in a StatsComponent nocommit comment...

          // nocommit: PURPOSE_REFINE_PIVOT_FACETS by itself shouldn't be enough for this...
          //
          // we need to check if the pivots actually have stats hanging off of them,
          // if they don't then we still should supress the stats param
          // (no need to compute the top level stats over and over)
          //
          // actually ... even if we do have stats hanging off of pivots,
          // we need to make stats component smart enough not to waste time re-computing
          // top level stats on every refinement request.
          //
          // so maybe StatsCOmponent should be left alone, and FacetComponent's prepare method 
          // should be modified so that *if* isShard && there are pivot refine params && those 
          // pivots have stats, then set some variable so that stats logic happens even if stats=false?
          

          I also made a few other changes as i was reviewing the test (noted below).

          My plan is to move forward and continue reviewing more of the patch, starting with the other tests, and then dig into the code changes – writing additional test cases if/when i notice things that look like they may not be adequately covered – and then come back and revisit the question of the "stats=false" during refinement requests later (i'm certainly open to suggestions)

          changes in this iteration of the patch
          • TestCloudPivots
            • removed the bogus param "cleanup" vitaliy mentioned
            • added some nocommits as reminders for the future
          • DistributedFacetPivotLongTailTest
            • refactored query + assertFieldStats into "doTestDeepPivotStats"
              • only called once, and wasn't general in any way - only usable for checking one specific query
            • renamed "foo_i" to "stat_i" so it's a bit more obvious why that field is there
            • added stat_i to some long tail docs & updated the existing stats assertions
            • modified existing query that required refinement to show stats aren't correct
              • "bbb0" on shard2 only gets included with refinement, but the "min" stat trivially demonstrates that the "-1" from shard2 isn't included
          • StatsComponent
            • check for PURPOSE_REFINE_PIVOT_FACETS in modifyRequest.

          NOTE: This patch is significantly smaller than the last one because i generated it using "svn diff -x --ignore-all-space" to suppress a bunch of small formatting changes from the previous patches.

          Hoss Man added a comment -

          ...we need to make stats component smart enough not to waste time re-computing top level stats on every refinement request.

          actually, thinking about it a bit more - with my change to StatsComponent, it's probably even worse than wasting work on the shards – since i've set the purpose to include PURPOSE_GET_STATS, StatsComponent on the coordinator node is going to merge those top-level stats in again every time a pivot refinement response is returned to the coordinator, so the top level stats will be totally wrong. ... gotta make sure our "final" fix accounts for that as well.
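
          To put rough numbers on that: using the two-shard reproduction from earlier in this issue, the correct top level bar_i stats are count=6, sum=654321 (7777 contributes 1+20+300=321, 8888 contributes 4000+50000+600000=654000). If one shard's top level stats get merged a second time when its refinement response comes back, the coordinator would report something like count=9, sum=1308321 – obviously wrong.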

          Hoss Man added a comment -

          small bits of progress on the patch review...

          • PivotField
            • fixed backcompat with constructor
            • simplified write method & fixed NPE risk + indenting
          • DistributedFacetPivotLargeTest
            • refactored query + assertFieldStats into "doTestDeepPivotStats"
              • only called once, and wasn't general in any way - only usable for checking one specific query
            • made the epsilon used in assertEqual(double) a tiny float (0.1E-7) instead of a large float, so we don't overlook any bugs
          • DistributedFacetPivotSmallTest
            • simplified ComparablePivotField
              • it isn't used with stat assertions, and didn't check stats when it was used, so it doesn't need to know about stats in its constructor
            • refactored query + assertFieldStats into "doTestDeepPivotStats"
              • only called once, and wasn't general in any way - only usable for checking one specific query
              • simplified params (no need to muck with setDistributedParams)
            • made the epsilon used in assertEqual(double) a tiny float (0.1E-7) instead of a large float, so we don't overlook any bugs
          • DistributedFacetPivotLongTailTest
            • did the math to fill in the expected values of the remaining nocommits
            • made the epsilon used in assertEqual(double) a tiny float (0.1E-7) instead of a large float, so we don't overlook any bugs
          Hoss Man added a comment -

          quick question for Vitaliy Zhovtyuk: in an earlier comment, you mentioned creating a new "FacetPivotSmallTest.java" test based on my earlier outline of how to move forward – but in later patches that test wasn't included. was there a reason for removing it? or was that just an oversight when generating the later patches? Can we resurrect that test from some of your earlier patches?


          Been focusing on TestCloudPivotFacet. One particularly large change to note...

          After doing a bit of refactoring of some stuff i've mentioned in previous comments, i realized that buildRandomPivotStatsFields & buildStatsTagString didn't really make a lot of sense – notably due to some leftover cut/paste comment cruft. digging in a bit, i realized that the way buildStatsTagString was being called, we were only ever using the first stats tag w/ the first facet.pivot – and obliterating the second pivot from the params if there was one.

          So i ripped those methods out, and re-vamped the way the random stats.field params were generated, and how the tags were associated with the pivots using some slightly diff helper methods.

          changes since last patch
          • TestCloudPivotFacet
            • moved stats & stats.field params from pivotP to baseP
              • this simplified a lot of request logic in assertPivotStats
            • buildRandomPivotStatsFields & buildStatsTagString
              • replaced with simpler logic via pickRandomStatsFields & buildPivotParamValue
                • this uncovered an NPE in StatsInfo.getStatsFieldsByTag (see below)
              • fixed pickRandomStatsFields to include strings (not sure why they were excluded - string stats work fine in StatsComponent)
            • assertPivotStats
              • switched from one verification query per stat field, to a single verification query that loops over each stat
              • shortened some variable names & simplified assert msgs
              • added some hack-ish sanity checks on which stats were found for each pivot
            • assertPivotData
              • wrapped up both assertNumFound & assertPivotStats so that a single query is executed and then each of those methods validates the data in the response that they care about
            • added "assertDoubles()" and "sanityCheckAssertDoubles()"
              • hammering on the test led to a situation where stddevs were off due to double rounding because of the order that the sumOfSquares were accumulated from each shard (expected:<2.3005390038169265E9> but was:<2.300539003816927E9>)
              • so i added a helper method to compare these types of stats with a "small" epsilon relative to the size of the expected value, and a simple sanity checker to test-the-test.
          • QueryResponse
            • refactored & tightened up the pivot case statement a bit to assert on unexpected keys or value types
          • StatsComponent
            • fixed NPE in StatsInfo.getStatsFieldsByTag - if someone asks for "facet.pivot={!stats=bogus}foo" (where 'bogus' is not a valid tag on a stats.field) that was causing an NPE - should be ignored just like ex=bogus
          Hoss Man added a comment -

          ...Should we have a syntax to apply stats/queries/ranges only at specific levels in the pivot hierarchy?...

          I spun this off into its own issue: SOLR-6663

          For now, i think we should focus on requiring exactly one tag value in the "stats" local param – i incorporated that into some of the code/tests as i was reviewing...
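
          For example, reusing the (hypothetical) field and tag names from the reproduction earlier in this issue, this is the supported form:

            facet.pivot={!stats=sss}foo_s
            stats.field={!tag=sss}bar_i

          ...while something like facet.pivot={!stats=sss,ttt}foo_s (multiple comma separated tags) now triggers an error, so that syntax stays reserved for SOLR-6663.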

          changes since last patch
          • PivotFacetProcessor
            • brought back getSubsetSize - it gives us a nice optimization in the cases where the actual subset isn't needed.
            • getStatsFields
              • renamed to getTaggedStatsFields so it's a little more clear it's a subset relative to this pivot
              • made static so its use of localparams wasn't so obtuse (i hate how many "stateful" variables there are in SimpleFacets)
              • added error throwing if/when the user specifies multiple tags separated by commas, since we want to reserve that syntax for now and may use it for controlling where in the pivot tree stats are computed (SOLR-6663)
              • javadocs
            • doPivots
              • refactored the new stats logic to all be in one place, and not to do anything (or allocate any extra objects) unless there are docs to compute stats over, and stats to compute for those docs
          • SolrExampleTests
            • merged testPivotFacetsStatsParsed into testPivotFacetsStats
              • testPivotFacetsStats didn't need any distinct setup from testPivotFacetsStatsParsed, so i just moved the queries/assertions done by testPivotFacetsStats into testPivotFacetsStatsParsed and simplified the name
              • removed the SOLR-6349 style local params from the stats.fields – not supported yet, and if/when it is, this test asserts that more stats are there than what those params were asking for, so we don't want it to suddenly break in the future.
              • enhanced the test a bit to sanity check these assertions still pass even when extra levels of pivots are requested.
            • merge testPivotFacetsStatsNotSupportedBoolean + testPivotFacetsStatsNotSupportedString => testPivotFacetsStatsNotSupported
              • "String" was not accurate, TextField is more specific
              • also simplified the setup - no need for so many docs, and no need for it to be different between the two checks
              • added ignoreException – the junit logs shouldn't mislead the user when an exception is expected
              • added some additional assertions about the error messages
              • added another assertion for multiple tags separated by commas (SOLR-6663)
          • TestCloudPivots
            • replaced nocommit comment with a comment referring to SOLR-6663
          Vitaliy Zhovtyuk added a comment -

          Restored FacetPivotSmallTest, which was lost between patches.
          Added distributed test org.apache.solr.handler.component.DistributedFacetPivotSmallAdvancedTest covering 3 additional cases:
          1. Getting pivot stats on a string stats field
          2. Getting top level stats on pivot stats
          3. Pivot stats on each shard are not the same

          Added a getter, org.apache.solr.handler.component.PivotFacetValue#getStatsValues, to check for the presence of stats values.
          Whitebox test assertions are not yet completed. Still working on it.

          Hoss Man added a comment -

          My main focus the last day or so has been reviewing PivotFacetHelper & PivotFacetValue with an eye towards simplifying the amount of redundant code between them and StatsComponent. Some details posted below but one key thing i wanted to point out...

          As (relatively) familiar as i am with the existing Pivot code, it took me a long time to understand how PivotFacetHelper.getStats + PivotListEntry.STATS were working in the case of leaf level pivot values – short answer: PivotFacetHelper.getStats totally ignores the Enum value of PivotListEntry.STATS and uses "0" (something PivotFacetHelper.getPivots also does that i've never noticed before).

          Given that we plan to add more data to pivots in issues like SOLR-4212 & SOLR-6353, i really wanted to come up with a pattern for dealing with this that was less likely to trip people up when looking at the code.
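
          To illustrate the kind of pattern i'm aiming for (this is a loose sketch, not the actual PivotListEntry class – the real changes are listed below), the idea is that mandatory entries (field, value, count) always sit at fixed positions in the pivot NamedList, while optional entries such as stats are looked up by name starting from a minimum index:

            // loose illustration only - see the PivotListEntry changes described below
            import java.util.Locale;
            import org.apache.solr.common.util.NamedList;

            enum PivotEntrySketch {
              FIELD(0), VALUE(1), COUNT(2), PIVOT(3), STATS(3); // optional entries share a minimum index

              private final int minIndex;
              PivotEntrySketch(int minIndex) { this.minIndex = minIndex; }

              /** key used for this entry in the pivot NamedList */
              String key() { return name().toLowerCase(Locale.ROOT); }

              /** mandatory entries are read at their exact index; optional ones are searched for by name */
              Object extract(NamedList<Object> pivot) {
                if (this == FIELD || this == VALUE || this == COUNT) {
                  return pivot.getVal(minIndex);
                }
                for (int i = minIndex; i < pivot.size(); i++) {
                  if (key().equals(pivot.getName(i))) {
                    return pivot.getVal(i);
                  }
                }
                return null;
              }
            }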

          Changes in this patch
          • StatsComponent
            • refactored out tiny little reusable "unwrapStats" utility
            • refactored out reusable "convertToResponse" utility
              • i was hoping this would help encapsulate & simplify the way the count==0 rules are applied, to make top level consistent with pivots, but that led me down a rabbit hole of pain as far as testing and backcompat and solrj - so i just captured it in a 'force' method param.
              • But at least now, the method is consistently called everywhere that outputs stats, so if/when we change the rules for how "empty" stats are returned (see comments in SOLR-6349) we won't need to audit/change multiple pieces of code, we can just focus on callers of this method
            • Added a StatsInfo.getStatsField(key) method for use by PivotFacetHelper.mergeStats so it wouldn't need to constantly loop over every possible stats.field
          • PivotFacetValue
            • removed an unnecessary level of wrapping around the Map<String,StatsValues>
            • switched to using StatsComponent.convertToResponse directly instead of PivotFacetHelper.convertStatsValuesToNamedList
          • PivotListEntry
            • renamed "index" to "minIndex"
            • added an extract method that knows how to correctly deal with the diff between "optional" entries that may exist starting at the minIndex, and mandatory entries (field,value,count) that must exist at the expected index.
          • PivotFacetHelper
            • changed the various "getFoo" methods to use PivotListEntry.FOO.extract
              • these methods now exist mainly just for convenience with the Object casting
              • this also meant the "retrieve" method could be removed
            • simplified mergeStats via:
              • StatsComponent.unwrapStats
              • StatsInfo.getStatsField
            • mergeStats javadocs
            • removed convertStatsValuesToNamedList
          • PivotFacetProcessor
            • switched to using StatsComponent.convertToResponse
          • TestCloudPivots
            • updated nocommit comment regarding 'null' actualStats based on pain encountered working on StatsComponent.convertToResponse
              • added some more sanity check assertions in this case as well
          • DistributedFacetPivotSmallTest
            • added doTestPivotStatsFromOneShard to account for an edge case in merging that occurred to me while reviewing PivotFacetHelper.mergeStats
              • this fails because of how +/-Infinity are treated as the min/max - i'll work on fixing this next
              • currently commented out + has some nocommits to beef up this test w/other types
          • merged my working changes with Vitaliy's additions (but have not yet actually reviewed the new tests)...
            • FacetPivotSmallTest
            • DistributedFacetPivotSmallAdvancedTest
            • PivotFacetValue.getStatsValues ... although it's not clear to me yet what purpose/value this adds?
          Hoss Man added a comment -

          I started out focusing on fixing the +/-Infinity issue i mentioned yesterday, but while working on beefing up the asserts for this in DistributedFacetPivotSmallTest i realized the PivotField.getStatsInfos() method was returning List<FieldStatsInfo> which didn't really make a lot of sense - so i changed that to return Map<String,FieldStatsInfo> and tweaked the method name (consistent with how the top level stats method works/is named) and updated all of the affected code/tests as well.

          summary of patch changes
          • StatsValuesFactory
            • fixed NumericStatsValues so that min/max are 'null' when no values accumulated
          • changed PivotField.getStatsInfos() to getFieldStatsInfos() and made it return a map – updated the following classes...
            • PivotField
            • QueryResponse
            • SolrExampleTests
            • TestCloudPivotFacet
            • DistributedFacetPivotSmallTest
            • DistributedFacetPivotLargeTest
            • DistributedFacetPivotLongTailTest
            • DistributedFacetPivotSmallAdvancedTest
          • DistributedFacetPivotSmallTest
            • fixed some +/-Infinity assumptions based on NumericStatsValues fixes
            • beefed up testing of diff data types in doTestPivotStatsFromOneShard
          • FacetPivotSmallTest
            • fixed some +/-Infinity assumptions based on NumericStatsValues fixes
          Hoss Man added a comment -

          review & some updates to the latest tests Vitaliy added.

          On Monday i'll dig into how we're dealing with stats in pivot refinements (the last remaining "nocommit") ... i can't understand how/why
          DistributedFacetPivotSmallAdvancedTest.doTestTopStatsWithRefinement passes in this patch, and that scares me (the refinement call should pollute the top level stats, double counting at least one shard) so i really want to get to the bottom of it.

          patch changes
          • FacetPivotSmallTest
            • new testBogusStatsTag: check that a bogus stats tag is ignored
            • new testStatsTagHasComma: check that comma causes error for now (SOLR-6663)
          • SolrExampleTests
            • swap two comments that were on the wrong asserts
          • DistributedFacetPivotSmallAdvancedTest
            • cleaned up some formatting
            • doTestTopStats
              • beefed up the assertions
              • added sanity checks that refinement really was happening in these requests
              • surprisingly this test still passes ... based on the "nocommit" hack i put in StatsComponent, I was fairly certain this was going to fail ... i need to dig in more and make sure i understand why it doesn't fail.
              • renamed to doTestTopStatsWithRefinement to make it a bit more clear what it's doing
            • doTestDeepPivotStatsOnOverrequest
              • removed this test - i wasn't really clear on the purpose, and in asking Vitaliy about it on IRC we realized he had misunderstood some advice i'd given him on IRC a few days ago about how to "sanity check" that refinement was happening (ie: what i just added to doTestTopStatsWithRefinement) so it wasn't giving us any added value.
          Vitaliy Zhovtyuk added a comment -

          Added "whitebox" DistributedFacetPivotWhiteBoxTest test simulating pivot stats shard requests in cases: get top level pivots and refinement requests. Both contains stats on pivots.

          Hoss Man added a comment -

          ...i can't understand how/why DistributedFacetPivotSmallAdvancedTest.doTestTopStatsWithRefinement passes in this patch, and that scares me (the refinement call should pollute the top level stats, double counting at least one shard) so i really want to get to the bottom of it.

          After doing a bunch of manual testing to confirm that the bug really did exist, i started poring over the test logs and confirmed there was a stupid bug in the test itself...

          // i had...
          ModifiableSolrParams facetForceRefineParams = new ModifiableSolrParams(coreParams);
          
          // should have been...
          ModifiableSolrParams facetForceRefineParams = new ModifiableSolrParams(facetParams);
          

          ...so in the request where i was trying to force facet refinement by adding FacetParams.FACET_OVERREQUEST_COUNT=0 to the params, i wasn't even faceting at all.

          Once that silly mistake was fixed, the test started failing as expected.

          I then beefed up Vitaliy's new DistributedFacetPivotWhiteBoxTest to include some asserts on the top level stats, and confirmed those also failed because we're re-computing them on both the initial request, and the subsequent refinement request.

          Once i had those tests failing in a satisfactory way, finding a solution was fairly straightforward: stop including stats=true in pivot refinement requests, and instead make PivotFacetProcessor recognise when it needs to build up a StatsInfo object to handle refinement requests (rather than relying on StatsComponent to do it for us).

          I think this is basically ready to commit - i'm going to give the patch & issue comments another going over tomorrow to see if anything pops out at me while my laptop hammers away at the tests, but if no one has any other feedback i'll commit to trunk & start backporting.

          summary of changes
          • DistributedFacetPivotSmallAdvancedTest
            • fixed broken test of top level stats
            • added some assertion msg strings
          • StatsComponent
            • removed the last remaining "nocommit" hack that set stats=true on pivot refinements
          • PivotFacetProcessor
            • changed process() method:
              • init a StatsInfo object on demand in cases where it's a refinement request & there is a stats local param
              • call getTaggedStatsFields() to build up the List<StatsField> once per facet.pivot param (was being redundantly called once per refinement value in the old code)
            • changed processSingle() to take in the List<StatsField> as a method param
            • tweaked getTaggedStatsFields() method to take in just the stats local param instead of the full set of localparams – simplified the parsing & error handling
          • DistributedFacetPivotWhiteBoxTest
            • beefed up test to check top level stats
            • tweaked refine params to set stats=false to match what the code now does
          • TestCloudPivotFacet
            • fixed javadoc (lint error)
          • PivotField
            • fixed javadoc (lint error)
          Hoss Man added a comment -

          small API tweak...

          • renamed PivotField.getFieldStatsInfos() to PivotField.getFieldStatsInfo()
            • this makes it consistent with the same method name in QueryResponse
            • updated all callers

          ...planning to commit to trunk a little later today.
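
          For SolrJ users, here is a minimal sketch of reading the nested stats through the renamed accessor; the pivot key and stats key are whatever names the request used, so the ones below are purely illustrative:

            import java.util.List;
            import java.util.Map;
            import org.apache.solr.client.solrj.response.FieldStatsInfo;
            import org.apache.solr.client.solrj.response.PivotField;
            import org.apache.solr.client.solrj.response.QueryResponse;
            import org.apache.solr.common.util.NamedList;

            public class PivotStatsReader {
              /** Print the per-pivot-value stats for one stats key, e.g. a price field. */
              static void printPivotStats(QueryResponse rsp, String pivotKey, String statsKey) {
                NamedList<List<PivotField>> pivots = rsp.getFacetPivot();
                for (PivotField pf : pivots.get(pivotKey)) {
                  // getFieldStatsInfo() (post-rename) exposes the stats hanging off this pivot value
                  Map<String, FieldStatsInfo> stats = pf.getFieldStatsInfo();
                  FieldStatsInfo fsi = (stats == null) ? null : stats.get(statsKey);
                  System.out.println(pf.getField() + "=" + pf.getValue()
                      + " count=" + pf.getCount()
                      + " mean=" + (fsi == null ? "n/a" : fsi.getMean()));
                }
              }
            }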

          ASF subversion and git services added a comment -

          Commit 1636772 from hossman@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1636772 ]

          SOLR-6351: Stats can now be nested under pivot values by adding a 'stats' local param

          Hoss Man added a comment -

          FWIW: backport to 5x was completely painless – no problems from "ant precommit" and all tests pass.

          Assuming Jenkins doesn't find any problems on trunk, I'll plan to commit the backport sometime on Thursday.

          ASF subversion and git services added a comment -

          Commit 1637204 from hossman@apache.org in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1637204 ]

          SOLR-6351: Stats can now be nested under pivot values by adding a 'stats' local param (merge r1636772)

          ASF subversion and git services added a comment -

          Commit 1637248 from hossman@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1637248 ]

          SOLR-6351: fix CHANGES, forgot to credit Steve Molloy

          ASF subversion and git services added a comment -

          Commit 1637249 from hossman@apache.org in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1637249 ]

          SOLR-6351: fix CHANGES, forgot to credit Steve Molloy (merge r1637248)

          Hoss Man added a comment -

          backported to 5x earlier today, and updated the ref guide...

          https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-CombiningStatsComponentWithPivots
          https://cwiki.apache.org/confluence/display/solr/The+Stats+Component#TheStatsComponent-TheStatsComponentandFaceting

          ...calling this done.

          Big thanks to Vitaliy & Steve for their contributions on this.

          Ashish Shrowty added a comment -

          Hi all,

          Does this allow for requesting only specific stats such as a mean or countDistinct? For my use case, I don't really care about the actual distinct values. Is this what SOLR-6349 is about?

          Also, any idea when 5.0 is going to go out? This patch would save a lot of code I would need to write myself.

          Thanks for your help!

          -Ashish

          Xu Zhang added a comment -

          SOLR-6349 is what you're asking for.
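
          (For later readers: with SOLR-6349, individual stats can be switched on per stats.field via local params, roughly along these lines; the field name is hypothetical and the exact flag names should be checked against the current ref guide:

            stats.field={!mean=true countDistinct=true}popularity
          )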

          Anshum Gupta added a comment -

          Bulk close after 5.0 release.

          Pablo Anzorena added a comment -

          Is there any way of requesting the facets ordered by, e.g., the sum of price?

          Pablo Anzorena added a comment -

          Is there any way of requesting a limit of 10 ordered by, e.g., the sum of price within pivots?

          I know that I can do this with json.facet, but it has a consistency problem when querying across multiple shards.
          And given that pivot facets now support distributed searching, I tried to make a similar request, but couldn't find how to do it.
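
          For context, the json.facet request referred to above is roughly of this shape (field and facet names are illustrative):

            json.facet={
              top_categories : {
                type  : terms,
                field : category,
                limit : 10,
                sort  : "total_price desc",
                facet : { total_price : "sum(price)" }
              }
            }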

          Thanks in advance!

          Erick Erickson added a comment -

          Please raise questions like this on the user's list rather than in JIRAs; we try to reserve JIRA entries for code issues rather than usage questions.


            People

            • Assignee:
              Hoss Man
              Reporter:
              Hoss Man
            • Votes:
              0
              Watchers:
              13
