Solr / SOLR-2894

Implement distributed pivot faceting

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10, Trunk
    • Component/s: None
    • Labels: None

      Description

      Following up on SOLR-792, pivot faceting currently only supports undistributed mode. Distributed pivot faceting needs to be implemented.

      Attachments

      1. SOLR-2894.patch
        94 kB
        Dan Cooper
      2. SOLR-2894.patch
        94 kB
        Erik Hatcher
      3. SOLR-2894-reworked.patch
        47 kB
        Dzmitry Zhemchuhou
      4. SOLR-2894.patch
        49 kB
        Chris Russell
      5. SOLR-2894.patch
        51 kB
        Chris Russell
      6. SOLR-2894.patch
        60 kB
        Chris Russell
      7. SOLR-2894.patch
        60 kB
        Chris Russell
      8. SOLR-2894.patch
        141 kB
        Andrew Muldowney
      9. SOLR-2894.patch
        101 kB
        Andrew Muldowney
      10. SOLR-2894.patch
        98 kB
        Andrew Muldowney
      11. SOLR-2894.patch
        106 kB
        Andrew Muldowney
      12. SOLR-2894.patch
        109 kB
        Andrew Muldowney
      13. SOLR-2894.patch
        113 kB
        Andrew Muldowney
      14. SOLR-2894.patch
        116 kB
        Andrew Muldowney
      15. SOLR-2894.patch
        135 kB
        Andrew Muldowney
      16. SOLR-2894.patch
        127 kB
        Brett Lucey
      17. SOLR-2894.patch
        94 kB
        Brett Lucey
      18. SOLR-2894.patch
        101 kB
        Brett Lucey
      19. dateToObject.patch
        2 kB
        Elran Dvir
      20. SOLR-2894.patch
        102 kB
        Brett Lucey
      21. SOLR-2894.patch
        102 kB
        Brett Lucey
      22. SOLR-2894.patch
        101 kB
        Mark Miller
      23. SOLR-2894.patch
        99 kB
        Hoss Man
      24. pivot_mincount_problem.sh
        305 kB
        Hoss Man
      25. SOLR-2894_cloud_test.patch
        14 kB
        Hoss Man
      26. SOLR-2894-mincount-minification.patch
        119 kB
        Andrew Muldowney
      27. SOLR-2894.patch
        119 kB
        Andrew Muldowney
      28. SOLR-2894.patch
        121 kB
        Hoss Man
      29. SOLR-2894.patch
        149 kB
        Andrew Muldowney
      30. SOLR-2894.patch
        128 kB
        Hoss Man
      31. SOLR-2894.patch
        138 kB
        Hoss Man
      32. SOLR-2894.patch
        154 kB
        Hoss Man
      33. SOLR-2894.patch
        157 kB
        Hoss Man
      34. SOLR-2894.patch
        158 kB
        Hoss Man
      35. SOLR-2894.patch
        160 kB
        Hoss Man
      36. SOLR-2894.patch
        167 kB
        Hoss Man
      37. SOLR-2894.patch
        175 kB
        Hoss Man
      38. SOLR-2894.patch
        182 kB
        Andrew Muldowney
      39. SOLR-2894.patch
        183 kB
        Andrew Muldowney
      40. SOLR-2894.patch
        214 kB
        Andrew Muldowney
      41. SOLR-2894.patch
        211 kB
        Hoss Man
      42. SOLR-2894.patch
        216 kB
        Hoss Man
      43. SOLR-2894.patch
        217 kB
        Hoss Man
      44. SOLR-2894.patch
        218 kB
        Hoss Man
      45. SOLR-2894.patch
        227 kB
        Hoss Man
      46. SOLR-2894.patch
        231 kB
        Hoss Man
      47. SOLR-2894.patch
        233 kB
        Hoss Man
      48. SOLR-2894.patch
        233 kB
        Hoss Man
      49. SOLR-2894.patch
        233 kB
        Hoss Man
      50. SOLR-2894.patch
        233 kB
        Hoss Man
      51. pivotfail.log
        531 kB
        Mark Miller
      52. 48.pivotfails.log.bz2
        6.28 MB
        Steve Rowe

          Activity

          Ben Roubicek added a comment -

          Based on SOLR-792, it looked like there was some traction in getting distributed pivoting into the trunk codebase beyond the functional prototype. This feature has a lot of value within my company, where we currently perform 50 separate queries in cases where one would suffice if we had distributed pivot support.

          Antoine Le Floc'h added a comment -

          Do you think that this will be available for Solr 4.0? I would think that this is very similar to distributing regular facets?

          Dan Cooper added a comment -

          Added a patch to provide distributed pivot faceting. We've been running this code for a while now and it seems to work OK; I've also created a unit test that exercises distributed pivot faceting on a small set of data.

          The patch was created against Solr trunk revision 1297102.

          It should perform in much the same way as single-shard pivot faceting. It only sorts by count if you specify that option; otherwise it returns results in the order they were generated (which may be useful if performance is important but ordering is not). Most will want to specify facet.sort=count. This patch also supports limiting results using facet.limit.

          To do the merge I'm converting the NamedList objects that get returned by each shard into a single large map (which should be more efficient for merging the results) and then converting back into a NamedList when the merge is complete. The merge should support pivots of arbitrary depth, but I've only properly tested a depth of 2.
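
          A rough sketch of that merge idea (illustrative only, not the actual patch code) might look like the following, assuming each pivot entry is a NamedList carrying "value", "count" and an optional "pivot" child list:

              import java.util.List;
              import java.util.Map;
              import org.apache.solr.common.util.NamedList;

              // Hypothetical sketch: fold one shard's pivot entries into a map keyed by
              // field value so counts can be summed; the map is converted back into a
              // NamedList-based structure once every shard has been merged.
              class PivotMergeSketch {
                static void mergeShard(Map<Object, NamedList<Object>> merged,
                                       List<NamedList<Object>> shardPivots) {
                  for (NamedList<Object> entry : shardPivots) {
                    Object value = entry.get("value");
                    NamedList<Object> existing = merged.get(value);
                    if (existing == null) {
                      merged.put(value, entry); // first shard to report this value
                    } else {
                      int sum = (Integer) existing.get("count") + (Integer) entry.get("count");
                      existing.setVal(existing.indexOf("count", 0), sum); // accumulate counts
                      // a full implementation would also recurse into the "pivot" children here
                    }
                  }
                }
              }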

          I've added some new parameters to support the features we require from pivot faceting and thought they may as well go in the patch in case others find them useful.

          • facet.pivot.limit.method
            • Set to 'combined' if you want only the top N results to be returned across all pivots, where N is set by facet.limit. E.g. if you pivoted by country,manufacturer and limited to 5, the top 5 countries would be returned, but only the top 5 manufacturers by combined total would be returned as well, so each country would return the same 5 manufacturers (or fewer if there are no results).
          • facet.pivot.limit.ignore
            • Excludes the specified field from the limiting operations. E.g. if you pivoted by country,manufacturer, limited to 5, and set facet.pivot.limit.ignore=country, then all available countries would be returned (not limited) but only 5 manufacturers for each country.
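
          To make the interplay of these parameters concrete, a request exercising the options described above might look like this (host and field names are purely illustrative):

              http://localhost:8983/solr/select?q=*:*&rows=0
                  &facet=true
                  &facet.pivot=country,manufacturer
                  &facet.limit=5
                  &facet.sort=count
                  &facet.pivot.limit.method=combined

          Swapping facet.pivot.limit.method=combined for facet.pivot.limit.ignore=country would instead return every country, but still only 5 manufacturers per country.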

          Can someone test the patch and give some feedback?

          Chris Russell added a comment - edited

          Hi Dan.
          I have been working with your SOLR-2894 patch and I am having some issues with unit testing.

          First of all the patch doesn't seem to apply cleanly:
          crussell@WAT-CRUSSELL /cygdrive/d/matrixdev/solr_1297102/CBSolr/SolrLucene
          $ patch -p0 -i SOLR-2894.patch
          patching file solr/core/src/java/org/apache/solr/handler/component/EntryCountComparator.java
          patching file solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java
          Hunk #10 FAILED at 797.
          1 out of 16 hunks FAILED – saving rejects to file solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java.rej
          patching file solr/core/src/java/org/apache/solr/handler/component/PivotFacetHelper.java
          Hunk #2 FAILED at 106.
          1 out of 2 hunks FAILED – saving rejects to file solr/core/src/java/org/apache/solr/handler/component/PivotFacetHelper.java.rej
          patching file solr/core/src/java/org/apache/solr/handler/component/PivotNamedListCountComparator.java
          patching file solr/core/src/java/org/apache/solr/util/NamedListHelper.java
          patching file solr/core/src/java/org/apache/solr/util/PivotListEntry.java
          patching file solr/core/src/test/org/apache/solr/handler/component/DistributedFacetPivotTest.java
          patching file solr/solrj/src/java/org/apache/solr/common/params/FacetParams.java
          $ patch --version
          patch 2.5.8
          Copyright (C) 1988 Larry Wall
          Copyright (C) 2002 Free Software Foundation, Inc.

          A lot of the contents of your original patch seemed to be formatting changes like where line breaks should go and how spacing should be handled.
          I was able to examine the patch and manually incorporate the changes from the failed hunks.
          One question, on this line (1119) of your patch, why did you choose not to initialize the map as the ones above it are? Couldn't that cause an NRE?
          + public SimpleOrderedMap<List<NamedList<Object>>> pivotFacets;
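
          For reference, the initialized form being suggested would presumably look something like this (illustrative only; pre-Java-7 syntax, so no diamond operator):
          public SimpleOrderedMap<List<NamedList<Object>>> pivotFacets = new SimpleOrderedMap<List<NamedList<Object>>>();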

          While looking at DistributedFacetPivotTest.java I noticed an error on line 42: your "q" should probably be "*:*" instead of "*".

          I've attached the patch file I came up with. I added the initialization and test correction I mentioned.

          When I ran the solr/lucene unit tests after I patched, there are some unit test failures like this one:
          [junit] Testsuite: org.apache.solr.TestDistributedGrouping
          [junit] Testcase: testDistribSearch(org.apache.solr.TestDistributedGrouping): FAILED
          [junit] .facet_counts.size()==5,4skipped=0,0
          [junit] junit.framework.AssertionFailedError: .facet_counts.size()==5,4skipped=0,0
          [junit] at org.apache.solr.BaseDistributedSearchTestCase.compareResponses(BaseDistributedSearchTestCase.java:656)
          [junit] at org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:383)
          [junit] at org.apache.solr.TestDistributedGrouping.doTest(TestDistributedGrouping.java:51)
          [junit] at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:671)
          [junit] at org.apache.lucene.util.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:20)
          [junit] at org.apache.lucene.util.LuceneTestCase$SubclassSetupTeardownRule$1.evaluate(LuceneTestCase.java:736)
          [junit] at org.apache.lucene.util.LuceneTestCase$InternalSetupTeardownRule$1.evaluate(LuceneTestCase.java:632)
          [junit] at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:22)
          [junit] at org.apache.lucene.util.LuceneTestCase$TestResultInterceptorRule$1.evaluate(LuceneTestCase.java:531)
          [junit] at org.apache.lucene.util.LuceneTestCase$RememberThreadRule$1.evaluate(LuceneTestCase.java:593)
          [junit] at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
          [junit] at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
          [junit] at org.apache.lucene.util.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:20)
          [junit] at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:22)
          [junit]
          [junit]
          [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 2.685 sec
          [junit]
          [junit] ------------- Standard Error -----------------
          [junit] 2520 T1 oas.BaseDistributedSearchTestCase.compareResponses SEVERE Mismatched responses:
          [junit] {responseHeader={status=0,QTime=19},grouped={a_si={matches=0,groups=[]}},facet_counts={facet_queries={},facet_fields={a_t={}},facet_dates={},facet_ranges={},facet_pivot={}}}
          [junit] {responseHeader={status=0,QTime=18},grouped={a_si={matches=0,groups=[]}},facet_counts={facet_queries={},facet_fields={a_t={}},facet_dates={},facet_ranges={}}}
          [junit] NOTE: reproduce with: ant test -Dtestcase=TestDistributedGrouping -Dtestmethod=testDistribSearch -Dtests.seed=-c7cfa73dbca93c9:-4751af558bf6f59:3e523b50870b3b1b -Dargs="-Dfile.encoding=Cp1252"

          It looks like the facet_pivot is being included in the results of one query, and not the other. I'm trying to figure out why this is occurring.
          Any insight would be appreciated.

          Chris Russell added a comment -

          Some modifications to SOLR-2894.patch that I made while trying to get it to patch on rev 1297102.

          Chris Russell added a comment -

          I figured out the unit test failure. It's because facet_pivot is different from the other facet_* sections in that it only comes back when you request pivots, whereas the others always come back. In Dan's patch, facet_pivot came back even when it was empty or not requested, and this did not match the behavior of pivots in a non-distributed setting. I am working on an update to my patch.

          Chris Russell added a comment -

          facet_pivot will not show up in distributed search if it has no contents; reversed the sorting behavior to comply with the Solr standard for facet.sort.

          Hoss Man added a comment -

          Erik: Can you triage this for 4.0? commit if you think it's ready, otherwise remove the fix version?

          Trey Grainger added a comment -

          For what it's worth, we're actively using the April 25th version of this patch in production at CareerBuilder (with an older version of trunk) with no issues.

          Erik Hatcher added a comment -

          Trey - thanks for the positive feedback. I'll apply the patch, run the tests, review the code, and so on. Might be a couple of weeks, unless I can get to this today.

          Erik Hatcher added a comment -

          Patch updated to 4x branch.

          Simon, just for you, I removed NamedListHelper as well (folded its one method into PivotFacetHelper)

          Tests pass.

          Erik Hatcher added a comment -

          Trey - would you be in a position to test out the latest patch? I built my latest one by starting with the March 5, 2012 SOLR-2894.patch file.

          Chris Russell added a comment -

          Erik, what revision of solr did you apply the patch to? Did you not encounter the issues I encountered?

          Chris Russell added a comment -

          Erik, I can't get your patch to apply cleanly to solr 1350445

          $ patch -p0 -i SOLR-2894.patch
          patching file solr/core/src/test/org/apache/solr/handler/component/DistributedFacetPivotTest.java
          patching file solr/core/src/java/org/apache/solr/handler/component/EntryCountComparator.java
          patching file solr/core/src/java/org/apache/solr/handler/component/PivotNamedListCountComparator.java
          patching file solr/core/src/java/org/apache/solr/handler/component/PivotFacetHelper.java
          Hunk #2 FAILED at 103.
          1 out of 2 hunks FAILED – saving rejects to file solr/core/src/java/org/apache/solr/handler/component/PivotFacetHelper.java.rej
          patching file solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java
          Hunk #11 FAILED at 799.
          1 out of 17 hunks FAILED – saving rejects to file solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java.rej
          patching file solr/core/src/java/org/apache/solr/util/PivotListEntry.java
          patching file solr/solrj/src/java/org/apache/solr/common/params/FacetParams.java
          patching file solr/test-framework/src/java/org/apache/solr/BaseDistributedSearchTestCase.java

          Erik Hatcher added a comment -

          Chris - I generated the patch from an r1350348 checkout (which is not branch_4x as I mentioned above). It might be a couple of weeks before I can get back to this and sort it out though, unfortunately. Sorry.

          Chris Russell added a comment -

          I have posted an enhancement to this patch as SOLR-3583. It is based on the Apr 25th version.

          Trey Grainger added a comment -

          Hi Erik,

          Sorry, I missed your original message asking me if I could test out the latest patch - I'd be happy to help. I just tried both your patch and the April 25th patch against the Solr 4.0 Alpha revision and neither applied immediately. I'll see if I can find some time on Sunday to try to get a revision sorted out which will work with the current version.

          I think there are some changes in the April 24th patch which may need to be re-applied if your changes were based upon the earlier patch. I'll know more once I've had a chance to dig in later this weekend.

          Thanks,

          -Trey

          Hoss Man added a comment -

          Bulk-fixing the version info for 4.0-ALPHA and 4.0; all affected issues have "hoss20120711-bulk-40-change" in a comment.

          Erik Hatcher added a comment -

          No way I'm getting to this any time soon, sorry. Trey - feel free to prod this one forward with more testing/feedback.

          Dzmitry Zhemchuhou added a comment -

          I have reapplied the SOLR-2894 patch from Jun 14th to the trunk while removing most of the code formatting changes that were in it. On top of that, I changed the FacetComponent.refineFacets() method to add the facet_pivot key-value only when there are values in the pivotFacets map, which fixes the distributed search unit tests.
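
          A rough illustration of the kind of guard described (hypothetical names, not the exact patch code):

              import java.util.List;
              import org.apache.solr.common.util.NamedList;
              import org.apache.solr.common.util.SimpleOrderedMap;

              // Only emit a facet_pivot section when the merged pivot map is non-empty,
              // so the distributed response matches non-distributed behaviour.
              class FacetPivotGuardSketch {
                static void addPivotSection(NamedList<Object> facetCounts,
                                            SimpleOrderedMap<List<NamedList<Object>>> pivotFacets) {
                  if (pivotFacets != null && pivotFacets.size() > 0) {
                    facetCounts.add("facet_pivot", pivotFacets);
                  }
                }
              }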

          Robert Muir added a comment -

          rmuir20120906-bulk-40-change

          Robert Muir added a comment -

          moving all 4.0 issues not touched in a month to 4.1

          Chris Russell added a comment - edited

          Regarding facet.pivot.limit.method and facet.limit, it looks like these are not checked on a per-field basis?
          So, if a user sets different limits for different fields and wants 'combined' limiting, that is not possible?
          For example a user might set:

          f.field1.facet.limit=10
          f.field1.facet.pivot.limit.method=combined
          f.field2.facet.limit=20

          And the combined method will not be used...
          If the user sets facet.pivot.limit.method=combined it looks like the same limit will be used for all fields? Whatever the global facet.limit is set to?

          Chris Russell added a comment -

          In my experience using this patch, it seems that it does not over-request when enforcing a limit?
          This is problematic because, for example, in a situation where you have many slaves and you are pivoting on a fairly evenly distributed field and setting your facet limit to X, the Xth distinct value for that field by document count on each slave is likely to be different. The result is that some facet values close to your limit boundary will not get reported for aggregation, which will make your ultimate results somewhat inaccurate.

          It was my impression that other facet-based features of solr over-request when there is a limit to combat this situation? For example if you specify limit 10, the distributed query might have limit 100 or 1000, and then during aggregation it would be limited to the top 10.
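
          For comparison - and this is an assumption about how Solr's regular distributed field faceting behaves, not something stated in this issue - the per-shard limit for a count-sorted field facet is inflated roughly as follows before the merged result is trimmed back to the requested size:

              shardLimit = (int)(facet.limit * 1.5) + 10   // e.g. facet.limit=10 -> 25 values requested per shard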

          I am working on similar functionality for this patch.

          Chris Russell added a comment -

          Regarding my comment above, I have determined that this happens because, if you specify a limit for a field that you are not requesting facet counts for, Solr will not automatically over-request on that field.
          i.e.
          facet.pivot=somefield
          f.somefield.facet.limit=10

          This will make your pivots weird because the limit of 10 will not be over requested unless you add this line:
          facet.field=somefield

          Since solr does not do distributed pivoting yet, this has not been an issue yet.
          I am working on an update to the patch that will correct this issue.

          Chris Russell added a comment - edited

          Updated to apply to trunk 1404975. (Based on Dzmitry's update)

          Added ability to limit on individual fields. (f.fieldname.facet.limit)
          Added pivot fields to fields being over-requested during distributed queries.
          Made it so you can pivot on a single field.

          Shahar Davidson added a comment -

          Just thought I'd add some feedback on this valuable patch.

          I ran some tests with the latest patch (Nov. 12) and the limit-per-field feature seems to be working alright. (Nice job Chris!)

          I did, however, encounter 2 other issues (which I guess are related):
          (1) There's no default sorting method. i.e. if no facet.sort is specified then results are not sorted. (this is a deviation from the current non-distributed pivot faceting behavior)
          (2) Sorting per-field does not work. (i.e. f.<field>.facet.sort=<method> does not work)

          Chris Russell added a comment -

          Thanks Shahar.

          In regard to #1, there is a default (count), and there are tests that cover it in DistributedFacetPivotTest.java. Are you sure the patch applied correctly to your version of Solr?

          In regard to #2, that is correct. Does per field sorting work with non-distributed pivots? I guess it was never implemented by the original author.

          Shahar Davidson added a comment -

          Hi Chris,

          #1 Yes, I believe the patch applied correctly. Once more, default sorting method is not defined and I didn't see where DistributedFacetPivotTest is testing the default sort. (I did however see where facet.sort is tested after facet.sort=count was explicitly specified).
          #2 Yes, Per field sorting works with non-distributed pivots. We were working in a non-distributed configuration up to some point and, until then, per field sorting worked properly (that was on Solr 4.0)

          We also encountered another issue when sorting by count:
          There might be a case where 2 of the returned values have the same count. In such a case, the PivotNamedListCountComparator attempts to sort by value. The problem is that when facet.missing=true is specified then one of the 2 values which have the same count may be null (missing) and in such cases the get("value").toString() operation will fail (NullPointerException).
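
          A null-safe tie-break along those lines might look like the following sketch (illustrative only, not the patch's actual PivotNamedListCountComparator):

              import java.util.Comparator;
              import org.apache.solr.common.util.NamedList;

              // Sort by descending count; on ties compare by value, but treat a missing
              // (null) value produced by facet.missing=true as sorting after any non-null
              // value instead of calling toString() on it.
              class NullSafePivotComparatorSketch implements Comparator<NamedList<Object>> {
                public int compare(NamedList<Object> a, NamedList<Object> b) {
                  int byCount = ((Integer) b.get("count")).compareTo((Integer) a.get("count"));
                  if (byCount != 0) return byCount;
                  Object va = a.get("value");
                  Object vb = b.get("value");
                  if (va == null && vb == null) return 0;
                  if (va == null) return 1;  // nulls sort last
                  if (vb == null) return -1;
                  return va.toString().compareTo(vb.toString());
                }
              }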

          Chris Russell added a comment -

          Implemented default pivot facet sort.
          Implemented per-field pivot facet sorting.
          Fixed NRE with sorting when facet.missing is on.

          Chris Russell added a comment -

          Shahar, I have corrected the issues that you mentioned, I believe.
          Default sorting is now count if a facet limit exists, otherwise index.
          I fixed the facet.missing stuff, and I went with the sorting convention that the rest of Solr seems to have, which is that null values always go to the bottom of the list.
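
          Expressed as code, that default-selection rule is roughly the following (a sketch using the standard FacetParams constants, not the exact patch code):

              import org.apache.solr.common.params.FacetParams;

              class DefaultPivotSortSketch {
                // count when a facet limit is in effect, otherwise index order
                static String defaultSort(int facetLimit) {
                  return facetLimit > 0 ? FacetParams.FACET_SORT_COUNT : FacetParams.FACET_SORT_INDEX;
                }
              }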

          Try the new patch.

          Shahar Davidson added a comment -

          Hi Chris,

          I appreciate your efforts on this.
          I tried the new patch and ran into numerous NPEs, so I didn't get to verify the sorting.

          Here's what I'm getting:

          SEVERE: null:java.lang.NullPointerException
                  at java.util.TreeMap.compare(TreeMap.java:1188)
                  at java.util.TreeMap.put(TreeMap.java:531)
                  at org.apache.solr.handler.component.PivotFacetHelper.convertPivotsToMaps(PivotFacetHelper.java:317)
                  at org.apache.solr.handler.component.FacetComponent.countFacets(FacetComponent.java:542)
                  at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:336)
                  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:309)
                  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
                  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
                  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
                  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
                  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337) 
                  ....
          

          And also:

          SEVERE: null:java.lang.NullPointerException
                  at java.util.TreeMap.getEntry(TreeMap.java:342)
                  at java.util.TreeMap.get(TreeMap.java:273)
                  at org.apache.solr.handler.component.FacetComponent.mergePivotFacet(FacetComponent.java:692)
                  at org.apache.solr.handler.component.FacetComponent.countFacets(FacetComponent.java:552)
                  at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:336)
                  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:309)
                  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
                  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
                  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
                  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
                  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
                  ....
          

          Since I couldn't get to test the sorting fix, I peeked at the PivotNamedListCountComparator fix, and I believe it will sort the "null" values on top (due to the return values of handleSortWhenOneValueIsNull) regardless of the count (because the null values are checked before the counts are compared) - correct me if I'm wrong.

          Appreciate your help with this,

          Shahar.

          Chris Russell added a comment -

          That is odd, it worked when I tested it on my box. I will take another look.

          Chris Russell added a comment - edited

          Fixed NRE when using facet.missing.
          Added test for over-requesting / refinement.
          Fixed issue that broke over-requesting when local params were present in facet.field.

          In correcting the issue with over-requesting I noticed that no refinement is being done for distributed pivot facets.
          This is an issue because it means that your pivot facet counts may not be correct.
          I am working on an enhancement to add refinement for pivot facets.

          Chris Russell added a comment -

          Shahar, I think I've corrected the issues you reported. Take a look.
          And yes, the sorting will always put the nulls on one side, although I believe it is the bottom of the list based on my testing. This matches current single-core behavior from solr, as far as I can tell.

          Shahar Davidson added a comment -

          Hi Chris,

          Thanks for the updated patch.

          I started testing this latest patch and encountered a few problems when NULL values are present:
          (1) If, for example, FIELD_A contains null values and FIELD_B does not, then "facet.pivot=FIELD_A,FIELD_B" will return more than one NULL entry for FIELD_A. To be exact, it returns a NULL entry per shard. (In non-distributed search, there is a single NULL entry for FIELD_A.)
          (2) If facet.pivot=FIELD_B,FIELD_A, then for a given value of FIELD_B one may see more than one entry of FIELD_A with a NULL value. (In non-distributed search, there is only one NULL-value entry under a given FIELD_B.)

          It seems as if there's a problem when merging null values from cross-shard pivots.
          Does the relevant test-case include some sort of check on data with null values?

          As far as sorting is concerned, results seem to be sorted properly (per field as well).

          Shahar.

          Chris Russell added a comment -

          Corrected null aggregation issues when docs contain null values for the fields being pivoted on. Added logic to remove local params from pivot query-string parameters when determining over-request.

          Ken Ip added a comment -

          Hi Chris,

          Thanks for the patch. Any chance this can be applied to the 4_0 or 4_1 branch? We have no problem applying it to trunk, but it can't be applied to either 4_0 or 4_1. Appreciated.

          ➜ lucene_solr_4_1 patch -p0 -i SOLR-2894.patch --dry-run
          patching file solr/core/src/test/org/apache/solr/handler/component/DistributedFacetPivotTest.java
          patching file solr/core/src/test/org/apache/solr/SingleDocShardFeeder.java
          patching file solr/core/src/test/org/apache/solr/TestRefinementAndOverrequestingForFieldFacetCounts.java
          patching file solr/core/src/java/org/apache/solr/handler/component/EntryCountComparator.java
          patching file solr/core/src/java/org/apache/solr/handler/component/PivotNamedListCountComparator.java
          patching file solr/core/src/java/org/apache/solr/handler/component/PivotFacetHelper.java
          Hunk #1 FAILED at 16.
          Hunk #2 FAILED at 35.
          Hunk #5 succeeded at 287 with fuzz 2.
          2 out of 5 hunks FAILED – saving rejects to file solr/core/src/java/org/apache/solr/handler/component/PivotFacetHelper.java.rej
          patching file solr/core/src/java/org/apache/solr/handler/component/NullGoesLastComparator.java
          patching file solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java
          Hunk #1 FAILED at 17.
          Hunk #2 FAILED at 43.
          2 out of 11 hunks FAILED – saving rejects to file solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java.rej
          patching file solr/core/src/java/org/apache/solr/util/PivotListEntry.java
          patching file solr/solrj/src/java/org/apache/solr/common/params/FacetParams.java

          Shahar Davidson added a comment -

          Hi Chris,

          Once again, your efforts are much appreciated!!

          I tested the latest patch (dated Jan. 14th) and the null aggregation issues were indeed resolved - thanks!

          Any chance this could be integrated into the upcoming release? (4.2?)

          Thanks,

          Shahar.

          Chris Russell added a comment -

          Ken and Shahar, thank you both for your interest in getting this patch usable with the current and upcoming releases.
          At this point in time, since refinement is not present in distributed pivoting, I would be hesitant to integrate this with release versions of solr. Indeed, where I work we have put features related to this functionality on hold until refinement can be implemented.
          I started working on implementing refinement for distributed pivots a couple of weeks ago and have handed it off to a teammate who will be seeing it through to completion.
          Currently we expect this will take another two to four weeks.
          At that time we will update what is here and look into backporting and getting this committed.
          Thanks again.

          Andrew Muldowney added a comment - edited

          As Chris said above, I've been working on distributed pivot faceting. The model had to be reworked to support this change. The new workflow is: each shard's result goes into a shard-request map, those maps are combined and converted into a list, and that list is what gets examined. The combined results are compared to each shard's results and refinement requests are made. Once all the refinement requests are made, another round of refinement is queued up, this time going one level deeper than the previous refinement requests. Because of the nature of the system, distributedProcess(rb) needed to be called from refinePivotFacets to actually take the enqueued refinement requests and turn them into fully fledged refinement queries. This caused some issues where other refinement types would get re-run because their refinement identifier was never cleared; field facets are slightly modified to have a boolean for "needsRefinements" and one for "wasRefined" to help distinguish.
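
          To make the refinement step more concrete, here is a much-simplified sketch of the "compare the combined result to each shard and ask for whatever is missing" idea (hypothetical structures, not the patch code; per-shard pivot data is reduced to value -> count maps):

              import java.util.HashMap;
              import java.util.HashSet;
              import java.util.Map;
              import java.util.Set;

              // After merging every shard's counts into one candidate set, any candidate
              // value that a given shard never reported becomes a refinement request back
              // to that shard; the real patch repeats this one pivot level deeper per pass.
              class PivotRefinementSketch {
                static Map<String, Set<Object>> findRefinements(
                    Map<String, Map<Object, Integer>> countsByShard) {
                  Set<Object> candidates = new HashSet<Object>();
                  for (Map<Object, Integer> shardCounts : countsByShard.values()) {
                    candidates.addAll(shardCounts.keySet());
                  }
                  Map<String, Set<Object>> refinements = new HashMap<String, Set<Object>>();
                  for (Map.Entry<String, Map<Object, Integer>> e : countsByShard.entrySet()) {
                    Set<Object> missing = new HashSet<Object>(candidates);
                    missing.removeAll(e.getValue().keySet());
                    if (!missing.isEmpty()) {
                      refinements.put(e.getKey(), missing);
                    }
                  }
                  return refinements;
                }
              }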

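          Very roughly, the merge-then-refine idea looks like the following standalone sketch (made-up types and values for illustration, not the patch's actual classes):

            import java.util.ArrayList;
            import java.util.HashMap;
            import java.util.List;
            import java.util.Map;

            public class PivotMergeSketch {
              public static void main(String[] args) {
                // Per-shard top-level pivot counts, e.g. value of field_A -> count.
                List<Map<String, Integer>> shardCounts = new ArrayList<Map<String, Integer>>();
                shardCounts.add(counts("red", 10, "blue", 7));
                shardCounts.add(counts("red", 4, "green", 9));

                // Combine every shard's map into one view.
                Map<String, Integer> combined = new HashMap<String, Integer>();
                for (Map<String, Integer> shard : shardCounts) {
                  for (Map.Entry<String, Integer> e : shard.entrySet()) {
                    Integer current = combined.get(e.getKey());
                    combined.put(e.getKey(), (current == null ? 0 : current) + e.getValue());
                  }
                }

                // Any value a shard did not report needs a refinement request to that shard
                // before the combined count can be trusted.
                for (int i = 0; i < shardCounts.size(); i++) {
                  for (String value : combined.keySet()) {
                    if (!shardCounts.get(i).containsKey(value)) {
                      System.out.println("refine shard " + i + " for value '" + value + "'");
                    }
                  }
                }
              }

              private static Map<String, Integer> counts(Object... kv) {
                Map<String, Integer> m = new HashMap<String, Integer>();
                for (int i = 0; i < kv.length; i += 2) {
                  m.put((String) kv[i], (Integer) kv[i + 1]);
                }
                return m;
              }
            }

          In the patch itself this happens per pivot level, which is why a second, deeper round of refinement gets queued once the first round of requests has been answered.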
          Andrew Muldowney added a comment - - edited

          An update to my previous patch. After more consideration, I've pulled the pivot facet logic out of DistributedProcess and call those specific parts when doing the iterative refinement process to avoid any side effects of other people putting code into DistributedProcess. I've also added better support for null values and added tests to ensure that you can send refinement requests for null values and not blow up the stack.

          Shawn Heisey added a comment -

          Andrew, someone on IRC is trying to apply your patch, but we can't find a revision that it will apply to successfully, and there's no revision information in the patch. Also, whatever diff utility you used has put backslashes into the paths and that has to be manually fixed before it'll find the files on Linux. Can you update your source tree to the current version and use 'svn diff' (or 'git diff' if you've checked out with git) to create the patch and re-upload?

          Andrew Muldowney added a comment -

          New patch file for trunk. It's a git patch against trunk Solr pulled today; let me know if there are any issues applying it.

          Mark Miller added a comment -

          Thanks Andrew - this isn't my area of expertise, but hopefully someone will get to this eventually. Lots of votes and watchers.

          Monica Skidmore added a comment -

          Thanks, Andrew! We're upgrading our version of Solr, and this will make our users very happy. I'm hoping it will be committed now...

          William Harris added a comment - - edited

          Hey, Andrew. I'm that someone from IRC.
          I managed to apply your patch (even though a single hunk failed), but I get a NullPointerException when I attempt a query with 3 pivots.

          SEVERE: null:java.lang.NullPointerException
                  at org.apache.solr.handler.component.PivotFacetHelper.getCountFromPath(PivotFacetHelper.java:373)
                  at org.apache.solr.handler.component.FacetComponent.processTopElement(FacetComponent.java:779)
                  ...
          
          Andrew Muldowney added a comment -

          This patch applies cleanly to trunk for me; apologies, as the last one turned out to be flawed upon review.

          Chris Russell added a comment -

          I have created a unit test that causes an NPE on a 3 pivot request, I am taking a look now.

          Andrew Muldowney added a comment -

          Fixes the NPEs in the three-pivot case, solves a hidden issue with null values when facet.missing was false, and deals with an issue where facet.mincount differs from facet.pivot.mincount (which is true for the defaults). The mincount issue only showed itself during index sorting.

          William Harris added a comment -

          I get the following error with the latest patch.

          org.apache.solr.search.SyntaxError: Expected identifier at pos 28 str='{!terms=xxxx xxxx 2}field1,field2,field3'
          

          As far as I can tell it seems to occur whenever it encounters values in field1 that end with an integer, or contain non-alphanumeric characters.

          Andrew Muldowney added a comment -

          The issue lies in how the refinement requests were formatted and how they were parsed on the shard side. I've made changes that should alleviate this issue, and I'll push out a patch soon.

          Shahar Davidson added a comment -

          Hi Andrew (and Chris), I just wanted to report a problem that we found in the patch from Jan 14th.

          In short, the problem seems to be related to facet.limit, and the symptom is that a distributed pivot returns fewer terms than expected.

          Here's a simple scenario:
          If I run a (non-distributed) pivot such as:

          http://myHost:8999/solr/core-A/select?q=*:*&wt=xml&facet=true&facet.pivot=field_A,field_B&rows=0&facet.limit=-1&facet.sort=count

          then I would get N terms for field_A. (where, in my case, N is in the thousands)

          BUT, if I run a distributed pivot such as:

          http://myHost:8999/solr/core-B/select?shards=myHost:8999/solr/core-A&q=*:*&wt=xml&facet=true&facet.pivot=field_A,field_B&rows=0&facet.limit=-1&facet.sort=count

          then I would get at most 160 terms for field_A.
          (Why exactly 160?? I have no idea)

          On the other hand, if I use f.<field_name>.facet.limit=-1 then things work as expected. For example:

          http://myHost:8999/solr/core-B/select?shards=myHost:8999/solr/core-A&q=*:*&wt=xml&facet=true&facet.pivot=field_A,field_B&rows=0&f.field_A.facet.limit=-1&f.field_B.facet.limit=-1&facet.sort=count

          This will return exactly N terms for field_A as expected.

          I'll appreciate your help with this.

          Shahar.

          Andrew Muldowney added a comment -

          The Jan 14th patch is totally missing the refinement step, so it's hard for me to say what is causing your issue. Please download the newest patch and let me know if the problem continues.

          Vishal Deshmukh added a comment -

          Hi, we are looking for this feature desperately. Can anyone give an ETA on this? Thanks in advance.

          Andrew Muldowney added a comment -

          This was built for 4_1_0 but git thinks it'll apply to trunk no problem.

          This solves a myriad of issues surrounding the formatting of the refinement requests, should support all field types and deals with jagged pivot facet result sets due to nulls or empty data on pivoted fields.

          Stein J. Gran added a comment -

          Andrew, which version does the latest patch apply to? I've tried applying it to trunk, branch_4x and 4.2.1 without any luck so far. I'm planning on testing this patch in a SolrCloud environment with lots of pivot facet queries.

          For trunk I get this:
          patching file `solr/core/src/java/org/apache/solr/request/SimpleFacets.java'
          Hunk #1 succeeded at 323 with fuzz 2 (offset 51 lines).
          Hunk #2 FAILED at 374.
          1 out of 2 hunks FAILED – saving rejects to solr/core/src/java/org/apache/solr/
          request/SimpleFacets.java.rej

          The rej file seems similar for trunk and the 4.2.1 tag

          Sviatoslav Lisenkin added a comment -

          Hello, everyone.
          I applied the latest patch two weeks ago (rev. 1465879), ran into merge issues in the SimpleFacets class near the 'incomingMinCount' variable, and fixed them manually (just renaming). Simple pivot faceting via the web UI on a sample Solr installation with two nodes worked fine. I would really appreciate it if someone had a chance to test it under load etc.
          Hopefully this patch (and feature) will be included in the upcoming release.

          Andrew Muldowney added a comment -

          It was built for 4.1. There are a few patch-application failures against 4_2, but nothing major from what I've seen.

          Stein J. Gran added a comment -

          Sviatoslav and Andrew: Thank you, only small changes to the SimpleFacets.java file were necessary to get the patch applied to the 4.2.1 tag.

          I have now been testing the patch in a small SolrCloud environment with two shards (-DnumShards=2), and I have found the following:
          1. Distributed pivot facets work great on string fields
          2. No values are returned if one of the facet.pivot fields is a date field

          For scenario 2:
          a) There are no error messages in the Solr log file

          b) The URL I use is http://localhost:8983/solr/coxitocollection/select?facet=true&facet.sort=true&q=*:*&facet.limit=1000&facet.pivot=dateday_datetime,firstreplytime

          c) If I add "&distrib=false" to the URL, I get values back

          d) The fields used are defined like this in schema.xml:
          <field name="dateday_datetime" type="date" indexed="true" stored="true" multiValued="false" />
          <field name="firstreplytime" type="int" stored="true" multiValued="false" />

          e) I tried using the tdate field instead of date, but this had no effect

          f) The date and tdate fields are defined like this in schema.xml:
          <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
          <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>

          g) If I run with -DnumShards=1 this scenario works great, both with and without "distrib=false"

          h) This was tested with 4.2.1 with the patch from March 21st with the following change: The non-existing variable "mincount" was replaced with "incomingMinCount" in SimpleFacets.java

          Elran Dvir added a comment -

          Hi,

          I want to report a problem that we found in the patch of March 21st.
          It seems that the problem Shahar reported is now solved, but there is another similar problem.
          In short, the problem seems to be related to the per-field facet.limit definition, and the symptom is that a distributed pivot returns fewer terms than expected.
          Here's a simple scenario:

          if I run a distributed pivot such as:
          http://myHost:8999/solr/core-B/select?shards=myHost:8999/solr/core-A&q=*:*&wt=xml&facet=true&facet.pivot=field_A&rows=0&facet.limit=-1&facet.sort=index

          it will return exactly the expected number of terms for field_A.

          On the other hand, if I use f.<field_name>.facet.limit=-1:
          http://myHost:8999/solr/core-B/select?shards=myHost:8999/solr/core-A&q=*:*&wt=xml&facet=true&facet.pivot=field_A&rows=0&f.field_A.facet.limit=-1&facet.sort=index

          then it will return at most 100 terms for field_A.

          I'll appreciate your help with this.

          Thanks.

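          In case it helps with reproduction, here are the same two scenarios expressed through SolrJ (host, core and field names are just the placeholders from the URLs above):

            import org.apache.solr.client.solrj.SolrQuery;
            import org.apache.solr.client.solrj.impl.HttpSolrServer;
            import org.apache.solr.client.solrj.response.QueryResponse;

            public class PivotLimitRepro {
              public static void main(String[] args) throws Exception {
                // Placeholder host/core taken from the example URLs above.
                HttpSolrServer solr = new HttpSolrServer("http://myHost:8999/solr/core-B");

                SolrQuery q = new SolrQuery("*:*");
                q.setRows(0);
                q.setFacet(true);
                q.set("shards", "myHost:8999/solr/core-A");
                q.set("facet.pivot", "field_A");
                q.set("facet.sort", "index");

                // Global limit: returns the full term list as expected.
                q.set("facet.limit", -1);
                // Per-field limit: returns at most 100 terms (the problem described above).
                // q.set("f.field_A.facet.limit", -1);

                QueryResponse rsp = solr.query(q);
                System.out.println(rsp.getFacetPivot());
              }
            }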
          Chris Russell added a comment -

          I will take a look.

          Otis Gospodnetic added a comment -

          Tons of votes and watchers on this issue!
          Chris Russell, Andrew Muldowney, and Trey Grainger - any luck with this by any chance? It would be great to get this in 4.4!

          Andrew Muldowney added a comment -

          I'm working on this patch again, looking into the limit issue and the fact that exclusion tags aren't being respected. They both boil down to improperly formatted refinement requests, so I'm going through and cleaning those up to look more and more like the distributed field facet code. I should also have time to get to the datetime problem, where you cannot refine on datetimes because the datetime format returned by the shards is not queryable when refining.

          Trey Grainger added a comment -

          @Otis Gospodnetic, we have this patch live in production for several use cases (as a pre-requisite for SOLR-3583, which we've also worked on @CareerBuilder), but the currently known issues which would prevent this from being committed include:
          1) Tags and Excludes are not being respected beyond the first level
          2) The facet.limit=-1 issue (not returning all values)
          3) The lack of support for datetimes

          We need #1 and Andrew is working on a project currently to fix this. He's also looking to fix #3 and find a reasonably scalable solution to #2. I'm not sure when the Solr 4.4 vote is going to be, but it'll probably be a few more weeks until this patch is all wrapped up.

          Meanwhile, if anyone else finds any issues with the patch, please let us know so they can be looked into. Thanks!

          Andrew Muldowney added a comment - - edited

          Built on 4_4
          This version fixes the following:

          1) Indecisive faceting not being respected on refinement queries
          2) Key not being respected
          3) Facet.offset not being respected
          4) datetimes breaking when trying to refine

          One point of contention is this:
          The SolrExampleTests.java (for the SolrJ stuff) had a check that required pivot facet boolean results to be a strict Boolean.TRUE as opposed to the string "true".

          This came about from the change that was required to fix datetime.

          I can't find anywhere else where we require a boolean field's value to equal Boolean.TRUE, so I think this test was just an artifact of how the original pivot faceting code was written.

          As it stands now the SolrExampleTests.doPivotFacetTest:1151 has been changed to "true" instead of Boolean.TRUE

          Elran Dvir added a comment -

          Andrew, Thank you very much for the fix!

          Does this version fix the issue of f.field.facet.limit not being respected?

          Thanks.

          Andrew Muldowney added a comment - - edited

          Yes, it should

          Elran Dvir added a comment -

          Hi Andrew,

          I have tried applying the latest patch to 4.2.1 and there were a few problems.
          Which version does it apply to?

          Thanks.

          Otis Gospodnetic added a comment -

          Elran Dvir - didn't try applying it yet, but 99.9% sure it is/was the trunk.

          Andrew Muldowney added a comment -

          I built it for 4_4.

          But I didn't have trouble patching it to 4_2_1. Did you pull from git or svn?

          Elran Dvir added a comment -

          I downloaded the source code from Solr's website.
          Then I opened it with my IDE, IntelliJ.
          When I tried applying the patch, IntelliJ reported there were problems with some files.

          Thanks.

          Andrew Muldowney added a comment -

          Fixed an issue where commas in string fields would cause infinite refinement loops.

          Stein J. Gran added a comment -

          I have now re-tested the scenarios I used on April 10th (see my comment above from that date), and all of the issues I found then are now resolved. I applied the July 25th patch to the lucene_solr_4_4 branch (GitHub) and performed the tests on this version.

          Well done, Andrew. Thumbs up from me.

          Andrew Muldowney added a comment -

          I've found a small error which causes largely sharded (30+) data to spiral out of control on refinement requests.

          I've fixed the error on a previous version of Solr and I'll be forward-porting it to my 4_4 build by tomorrow.

          If you are having issues with complex string fields this should help.

          Otis Gospodnetic added a comment -

          Andrew Muldowney - I didn't have time to link issues, but Joel Bernstein is working on at least one issue that is, if I recall correctly, an alternative implementation of this....

          Andrew Muldowney added a comment -

          Fixed the runaway-but-eventually-coalescing refinement query issue.

          At this point all known issues have been resolved.

          William Harris added a comment - - edited

          Hey, Andrew. Really appreciate the effort here!
          I am seeing this error with the latest patch.

          "error": {
              "msg": "java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format",
              "trace": "org.apache.solr.common.SolrException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format\n\tat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:302)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:1850)\n\tat org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)\n\tat org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)\n\tat org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)\n\tat org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)\n\tat org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)\n\tat org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)\n\tat org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)\n\tat org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)\n\tat org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)\n\tat org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)\n\tat java.lang.Thread.run(Thread.java:724)\nCaused by: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format\n\tat org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:110)\n\tat org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)\n\tat org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:407)\n\tat org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)\n\tat org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:155)\n\tat org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:118)\n\tat java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:166)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)\n\tat java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:166)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\n\t... 1 more\n",
              "code": 500
            }
          

          Trying it with 2 pivots across 8 shards with a total of ~3 million docs.
          Not sure what's causing it, but let me know if I can do anything to help!

          Andrew Muldowney added a comment -

          What are your exact field types? What is your query?
          From what I'm seeing online that error is because a shard failed for some reason. Knowing why a shard failed will yield a much better error message.

          William Harris added a comment - - edited

          It's a pretty simple query: q=*:*&facet=on&facet.pivot=fieldA,fieldB; both fields are regular single-valued solr.TextFields with solr.LowerCaseFilterFactory filters. All shards work well individually.
          I'm looking at the logs but unfortunately I'm not seeing any other error messages there.

          It works as long as I use less than 6 shards. With 6 or more it fails with that error, regardless of which shards I use.

          Andrew Muldowney added a comment -

          It shouldn't be a shard-count issue; we're running this patch on a 50-shard cluster over several servers with solid results.
          Try the shards.tolerant=true parameter on your distributed search? It supposedly includes error information if available.

          William Harris added a comment - - edited

          shards.tolerant=true did indeed yield a more descriptive error:

          ERROR - 2013-08-21 12:54:17.392; org.apache.solr.common.SolrException; null:java.lang.NullPointerException
                  at org.apache.solr.handler.component.FacetComponent.refinePivotFacets(FacetComponent.java:882)
                  at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:411)
                  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
                  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
                  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1850)
                  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
                  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
                  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
                  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
                  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
                  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
                  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
                  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
                  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
                  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
                  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
                  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
                  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
                  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
                  at java.lang.Thread.run(Thread.java:724
          

          I also reindexed everything replacing the values of all string fields with their corresponding hashes in order to see if the error could be caused by some odd strings, but the same error occurs.
          I am also seeing this error after I switched to MD5 hashes for document IDs:

          ERROR - 2013-08-22 14:28:25.248; org.apache.solr.common.SolrException; null:java.lang.NullPointerException
                  at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:903)
                  at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:649)
                  at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:628)
                  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
                  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
                  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1850)
                  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
                  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
                  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
                  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
                  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
                  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
                  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
                  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
                  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
                  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
                  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
                  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
                  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
                  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
                  at java.lang.Thread.run(Thread.java:724
          
          Andrew Muldowney added a comment -

          The first error occurs on a line where the refinement response is being mined for its information. The line asks for the value and gets an NPE. Does your data contain nulls? I have code in place to deal with that situation, but it's possible I'm missing an edge case. Do you have any suggestions for a test case that would create this error?

          The second error never reaches anything I've changed, so I think MD5ing your docIDs is causing all sorts of other issues unrelated to this patch.

          William Harris added a comment - - edited

          I thought the issue might be related to me not assigning those fields values in every document, but I tried reindexing giving them all values and the error still occurs.
          I tampered with the source a bit, and managed to trace the error to srsp.getSolrResponse().getResponse(), meaning getSolrResponse() is returning null. Hope that helps.

          Elran Dvir added a comment -

          Hi Andrew,

          In class PivotFacetProcessor, in function doPivots, I have noticed a change in the code.
          The current implementation is:
              SimpleOrderedMap<Object> pivot = new SimpleOrderedMap<Object>();
              pivot.add( "field", field );
              if (null == fieldValue) {
                pivot.add( "value", null );
              } else {
                termval = new BytesRef();
                ftype.readableToIndexed(fieldValue, termval);
                pivot.add( "value", fieldValue );
              }
              pivot.add( "count", kv.getValue() );

          It means we are getting the values as strings in pivots.

          In the past the implementation was "pivot.add( "value", ftype.toObject(sfield, termval))" - meaning we were getting the values as objects.
          I am using SolrJ and it's very important to me to get the values as objects.

          Is there a reason for not returning the values as objects?

          Thank you very much.

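          For context, this is roughly how I consume the pivots from SolrJ; with toObject() a date value comes back as a java.util.Date, while the current patched code hands back plain Strings (host, core and field names below are only placeholders):

            import java.util.Date;
            import java.util.List;
            import java.util.Map;

            import org.apache.solr.client.solrj.SolrQuery;
            import org.apache.solr.client.solrj.impl.HttpSolrServer;
            import org.apache.solr.client.solrj.response.PivotField;
            import org.apache.solr.client.solrj.response.QueryResponse;

            public class PivotValueTypes {
              public static void main(String[] args) throws Exception {
                HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

                SolrQuery q = new SolrQuery("*:*");
                q.setRows(0);
                q.setFacet(true);
                q.set("facet.pivot", "dateday_datetime,firstreplytime");

                QueryResponse rsp = solr.query(q);
                for (Map.Entry<String, List<PivotField>> pivot : rsp.getFacetPivot()) {
                  for (PivotField pf : pivot.getValue()) {
                    Object value = pf.getValue();
                    // With the old toObject() behaviour this is a java.util.Date for date fields;
                    // with the readableToIndexed change it is just the String form.
                    String type = (value instanceof Date) ? "Date" : "String";
                    System.out.println(pf.getField() + " = " + value + " (" + type + ", count " + pf.getCount() + ")");
                  }
                }
              }
            }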
          Andrew Muldowney added a comment -

          I wrote about that on 22/Jul/13 22:15.

          By having it use .toObject it was taking an internal datetime reference and giving me the pretty format, which was not convertible when trying to do refinement on that value. I'm not sure where the .toObject call should go now, as the datetimes in facets never seem to come out in that pretty format.

          Elran Dvir added a comment -

          Hi Andrew,

          Sorry for the long delay.
          I am still seeing the issue I reported on 20/May/13 12:27 (f.field_A.facet.limit=-1 returns at most 100 terms for field_A).

          Also, can you please direct me to the line of code where datetimes break when trying to refine (caused by "pivot.add( "value", ftype.toObject(sfield, termval) )")?
          I need pivot to return values as objects.

          Thank you very much.

          Andrew Muldowney added a comment -

          By returning an object you get a pretty date format that does not work when doing:
          getListedTermCounts(field,firstFieldsValues); -PivotFacetProcessor::104

          When it attempts to convert the pretty date it has into the datatype of the field, it fails to do so.

          Elran Dvir added a comment -

          I examined the code.
          It seems that the refinement process occurs before doPivots (where the call "ftype.toObject(sfield, termval)" was).
          So it seems toObject shouldn't affect the refinement process.
          What am I missing?

          Thanks.

          Andrew Muldowney added a comment -

          DoPivots is called on every shard once to get each shard's response. It has support for refinement built into it, but this first run through has no refinement information yet.

          When the master box sends out its refinement requests, DoPivots is run again on the shards receiving requests. At this point it uses the refinement steps DoPivots has, and blows up on dates, because instead of getting something like "1995-12-31T23:59:59Z" it gets "Tuesday December 12th, 1995 at 23:59". The latter does not convert to the former, so the entire thing blows up.

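          To make the mismatch concrete, here is a JDK-only sketch (the exact "pretty" string is illustrative; the point is that only the ISO form round-trips):

            import java.text.SimpleDateFormat;
            import java.util.Date;
            import java.util.Locale;
            import java.util.TimeZone;

            public class DateRefinementFormats {
              public static void main(String[] args) throws Exception {
                // The canonical form Solr indexes and can re-parse during refinement.
                SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
                iso.setTimeZone(TimeZone.getTimeZone("UTC"));
                Date d = iso.parse("1995-12-31T23:59:59Z");

                // A "pretty" rendering (here just Date.toString(), in the JVM's default time zone)
                // cannot be fed back through readableToIndexed, so refinement blows up.
                System.out.println(d);              // e.g. "Sun Dec 31 23:59:59 UTC 1995"
                System.out.println(iso.format(d));  // "1995-12-31T23:59:59Z" - the form refinement needs
              }
            }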
          Elran Dvir added a comment -

          I didn't manage to make a distributed pivot on a date field blow up with toObject.
          Can you please attach an example query that blows Solr up, and I'll adjust it to my environment?

          Thanks.

          Trey Grainger added a comment -

          FYI, the last distributed pivot facet patch functionally works, but there are some sub-optimal data structures being used and some unnecessary duplicate processing of values. As a result, we found that for certain worst-case scenarios (i.e. data is not randomly distributed across Solr cores and requires significant refinement) pivot facets with multiple levels could take over a minute to aggregate and process results. This was using a dataset of several hundred million documents and dozens of pivot facets across 120 Solr cores distributed over 20 servers, so it is a more extreme use-case than most will encounter.

          Nevertheless, we've refactored the code and data structures and brought the processing time from over a minute down to less than a second using the above configuration. We plan to post the patch within the next week.

          Yonik Seeley added a comment -

          Sweet, nice work Trey!

          Trey Grainger added a comment -

          Thanks, Yonik. I worked on the architecture and design, but it's really been a team effort by several of us at CB. Chris worked with the initial patch, Andrew hardened it, and Brett (who will post the next version) focused on the soon-to-be-posted performance optimizations. We're deploying the new version to production right now to sanity check it before posting the patch, but I think the upcoming version will finally be ready for review for committing.

          Brett Lucey added a comment -

          This is the updated version of our implementation of Pivot Facets, as mentioned by Trey. We have significantly improved performance for cases which involve a large number of shards through changing the underlying data structure and the way that data from the shards is merged together.

          Elran Dvir added a comment -

          Brett (and your team), thank you very much for your hard work. We've been having a lot of performance issues with the previous version of the patch (which we love), and initial testing shows they are now resolved.

          We did also notice a few issues:

          1) f.field.facet.limit=-1 is not being respected (as reported by me on 20/May/13 10:27)

          2) pivot queries are returning String instead of Object (as reported by me on 25/Aug/13 07:38), except for boolean fields.
          I know the reason is that there was a problem with datetime fields. I changed it back to toObject and can't reproduce any issues running the unit tests locally via Maven.
          I'd be glad to help fix this problem if anyone can create a simple test case that fails.

          3) The following query throws an exception:

          q=*:*&rows=0&f.fieldA.facet.sort=index&f.fieldA.facet.limit=-1&f.fieldA.facet.missing=true&f.fieldA.facet.mincount=1&f.fieldB.facet.sort=index&f.fieldB.facet.limit=-1&f.fieldB.facet.missing=true&f.fieldB.facet.mincount=1&facet=true&facet.pivot=fieldA,fieldB&shards=127.0.0.1:8983/solr/shardA,127.0.0.1:8983/solr/shardB

          java.lang.IllegalArgumentException: fromIndex(0) > toIndex(-1)
              at java.util.ArrayList.subListRangeCheck(ArrayList.java:975)
              at java.util.ArrayList.subList(ArrayList.java:965)
              at org.apache.solr.handler.component.PivotFacetField.refineNextLevelOfFacets(PivotFacetField.java:276)
              at org.apache.solr.handler.component.PivotFacetField.queuePivotRefinementRequests(PivotFacetField.java:231)
              at org.apache.solr.handler.component.PivotFacet.queuePivotRefinementRequests(PivotFacet.java:86)
              at org.apache.solr.handler.component.FacetComponent.countFacets(FacetComponent.java:565)
              at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:413)
              at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
              at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
              at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
              at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
              at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
              at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
              at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1474)
              at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499)
              at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
              at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
              at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
              at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
              at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
              at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
              at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
              at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
              at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
              at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
              at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
              at org.eclipse.jetty.server.Server.handle(Server.java:370)
              at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
              at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
              at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:949)
              at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1011)
              at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
              at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
              at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
              at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
              at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
              at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
              at java.lang.Thread.run(Thread.java:804)

          Thank you again for this great patch!

          Chris Russell added a comment -

          We are at DevNexus today and tomorrow, but Brett, Andrew and I will look at it on Weds. Thanks for the feedback.

          Brett Lucey added a comment -

          Elran – Thanks for the feedback. I'm glad to hear the performance issues have been fixed. The exception is being thrown because the facet limit is -1, which causes an attempt to allocate a negatively sized sublist. If you need a workaround for the short term, just avoid using -1 for the facet limit. I will try to fix that this week.

          Brett Lucey added a comment -

          This is an update to the previous patch I uploaded which excludes whitespace changes and eliminates dead code. This does not yet include a fix for the -1 facet limit.

          Elran Dvir added a comment -

          By the way, when I profiled our very slow distributed pivot, I noticed most of the time is spent in val.get in trimExcessValuesBasedUponFacetLimitAndOffset in PivotFacetHelper.java.
          The following change to the first two lines of the function has shown a significant improvement, from:
              List<NamedList<Object>> newVal = new LinkedList<NamedList<Object>>();
              if (val == null) return val;
          to:
              if (val == null) return val;
              List<NamedList<Object>> newVal = new ArrayList<NamedList<Object>>(val.size());

          Thanks.
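          A minimal, self-contained illustration (not Solr code; names invented for the example) of the cost difference described above: positional get(i) on a LinkedList walks the list from the head, so index-based iteration is quadratic overall, while a pre-sized ArrayList keeps it linear and also avoids repeated resizing.

              import java.util.ArrayList;
              import java.util.LinkedList;
              import java.util.List;

              public class ListAccessCostSketch {
                  // Walk a list by index - the access pattern the profiler flagged.
                  static long sumByIndex(List<Integer> list) {
                      long sum = 0;
                      for (int i = 0; i < list.size(); i++) {
                          sum += list.get(i); // O(1) on ArrayList, O(i) on LinkedList
                      }
                      return sum;
                  }

                  public static void main(String[] args) {
                      int n = 100_000;
                      List<Integer> linked = new LinkedList<Integer>();
                      List<Integer> array = new ArrayList<Integer>(n); // pre-sized, as in the suggested change
                      for (int i = 0; i < n; i++) { linked.add(i); array.add(i); }

                      long t0 = System.nanoTime(); sumByIndex(linked);
                      long t1 = System.nanoTime(); sumByIndex(array);
                      long t2 = System.nanoTime();
                      System.out.printf("LinkedList: %d ms, ArrayList: %d ms%n",
                          (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
                  }
              }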

          Brett Lucey added a comment -

          Hi Elran,

          Regarding the string/object issue: We have not been able to revert back to toObject for all data types because doing so results in the following exception being thrown by the testDistribSearch case of DistributedFacetPivotTest.
          org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Invalid Date String:'Sat Sep 01 08:30:00 BOT 2012

          This is part of the test case which demonstrates the issue:
          //datetime
          this.query( "q", "*:*",
          "rows", "0",
          "facet","true",
          "facet.pivot","hiredate_dt,place_s,company_t",
          "f.hiredate_dt.facet.limit","2",
          "f.hiredate_dt.facet.offset","1",
          FacetParams.FACET_LIMIT, "4"); //test default sort (count)

          I am producing that error by running:
          ant -Dtests.class="org.apache.solr.handler.component.DistributedFacetPivotTest" clean test

          Brett Lucey added a comment -

          We have been trying to finish up the distributed pivot facet patch to submit it for committing, but we've run into one issue raised in the JIRA issue for the patch.

          Distributed field facets appear to convert all values to strings, and refine on those strings when needed. The solrj test cases around the distributed field facets also expect all values to be represented as strings. Please correct me if I'm wrong.

          The most recent revision of the patch for SOLR-2894 converts all values to strings as well, except in the case of booleans. If we don't have this exception, the solrj test case of distributed pivot facets will fail because there is an assert verifying that the value returned is Boolean.TRUE rather than "true". If we do not convert object values to strings, we are unable to refine on datetimes because the format will be incorrect and the datetime class is unable to convert it back.

          Our feeling is that this assert is incorrect and that it should be looking for the string true. Changing this test case would allow us to bring the behavior of distributed pivot facets into consistency with the behavior of distributed field facets. We didn't want to just go ahead and make that change because it would change the behavior of solrj.

          It appears our options are:
          1. Convert all values to strings. Modify the solrj test case to expect this. We feel this makes things consistent and is the ideal option.
          2. Convert all values except booleans to strings.
          3. Leave all values except datetimes as is.

          We are just looking for some input from the community prior to adding this change to our patch.

          Elran Dvir added a comment -

          I think we should preserve object values in distributed pivot, as in regular pivot.
          I want to help fix the toObject problem with datetime fields.
          I tried to apply the most recent patch to the latest Solr trunk revision. There were some problems applying it.
          Can you please create a new patch against the latest Solr trunk revision, or indicate which revision the patch was created against?

          Thanks!

          Brett Lucey added a comment -

          This is an updated patch which should apply cleanly against trunk. I've used this against revision 885cdea13918fee0c49d5ac0c5fa1fd286d5b466.

          This should include a fix for the unlimited facet that Elran brought up. It does not address the toObject issue being discussed.

          Does anyone have additional input or thoughts as to which route to go with the toObject/string issue?

          Elran Dvir added a comment -

          Thanks for the patch.
          I will take a look at the toObject problem with datetime fields.
          Does the patch fix issues 1 and 3 I reported on February 24?

          Thanks.

          Brett Lucey added a comment -

          Elran - A facet limit of -1 in distributed pivot facets is not a use case we use in our environment, but we did go ahead and make the fixes in order to support the community. I've tested the changes locally on a box with success and added unit tests around it, but we have not yet deployed those changes to a production cluster. The exception you were seeing was directly related to the facet limit being negative, and that has been fixed in the patch I uploaded yesterday.

          Elran Dvir added a comment -

          I think I solved the toObject problem with datetime fields.
          Please see the patch attached.
          All tests pass now.
          Let me know what you think.
          Thanks.

          Elran Dvir added a comment -

          I have checked the latest patch.
          Problem 3 (field with negative limit threw exception) is now solved. Thanks!
          But I still see problem 1 (f.field.facet.limit=-1 is not being respected).

          Thank you very much.

          Brett Lucey added a comment -

          Elran - Can you give me an example test case or query for which the -1 facet limit fails? I'll be glad to take a look and fix it if I can reproduce an issue with it.

          Elran Dvir added a comment -

          Hi,

          I don't know exactly where I should put the test in DistributedFacetPivotTest, but this is the test:
          1) index more than 100 docs (you can index docs with only an id)
          2) run the following query:
          this.query( "q", "*:*",
          "rows", "0",
          "facet","true",
          "facet.pivot","id",
          "f.id.facet.limit","-1");

          You expect to get as many ids as you indexed, but you will get only 100.

          Thanks.

          Chris Russell added a comment -

          Elran, interesting, does that happen if you use facet.limit=-1 instead of the f.fieldname.facet.limit syntax? I am wondering if some code is checking the global limit but not the per-field limit.
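          For reference, a small sketch (illustrative only, not the patch code) of Solr's per-field parameter fallback, which is exactly the kind of place such a bug could hide: f.<field>.facet.limit overrides facet.limit, so code that only reads the global parameter never sees the per-field override.

              import org.apache.solr.common.params.FacetParams;
              import org.apache.solr.common.params.ModifiableSolrParams;

              public class PerFieldLimitSketch {
                  public static void main(String[] args) {
                      ModifiableSolrParams params = new ModifiableSolrParams();
                      params.set("f.id.facet.limit", "-1");       // per-field override, as in Elran's query
                      params.set(FacetParams.FACET_LIMIT, "100"); // global value

                      // A field-aware lookup sees the override; a field-blind lookup does not.
                      System.out.println(params.getFieldInt("id", FacetParams.FACET_LIMIT, 100)); // -1
                      System.out.println(params.getInt(FacetParams.FACET_LIMIT, 100));            // 100
                  }
              }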

          Elran Dvir added a comment -

          No.
          It doesn't happen when I use facet.limit=-1 instead of the f.fieldname.facet.limit syntax.

          Thanks.

          Brett Lucey added a comment -

          I've uploaded the newest version of the patch. This includes a fix for the -1 facet limit when specified on a specific field and incorporates Elran's toObject fix.

          AJ Lemke added a comment -

          Thanks for the hard work on this.

          After applying the patch (to trunk) I noticed that the results for the pivot facet have changed from the example given here:
          http://wiki.apache.org/solr/SimpleFacetParameters#Pivot_.28ie_Decision_Tree.29_Faceting-1
          My results: http://pastebin.com/zs9rWMC5
          Is this expected behavior?

          Also I am getting "java.lang.String cannot be cast to org.apache.lucene.util.BytesRef" errors when sorting boolean fields.
          http://localhost:8983/solr/select?wt=json&q=*:*&fl=*, score&sort=inStock desc
          My results: http://pastebin.com/x9QnGfZA

          My environment: http://pastebin.com/AyMAwXD3

          Brett Lucey added a comment -

          AJ - It appears that the pivot output is different in trunk because the example data has changed. I manually checked it against the trunk example data and the counts you are seeing look to be correct. The wiki is just based on older example files.

          I tried your second query on a 4.2 box and a trunk box and neither of them had an exception thrown. Do you get that error with the same version of Solr without using this patch? If so, that might be something worth raising on the solr users list.

          AJ Lemke added a comment - - edited

          Brett,

          When looking at the output of the pivot, the data type has changed from a struct to an array.
          Is this the expected behavior, or should the data type for each of the pivot facets remain a struct and I am experiencing a bug?

          Examples:
          4.7 pre patch:
          {
              field: "cat",
              value: "electronics",
              count: 12,
              pivot: [
                  {field: "popularity",value: 7,count: 4},
                  {field: "popularity",value: 6,count: 3},
                  {field: "popularity",value: 1,count: 2},
                  {field: "popularity",value: 0,count: 1},
                  {field: "popularity",value: 5,count: 1},
                  {field: "popularity",value: 10,count: 1}
              ]
          },

          Post Patch:
          [
          "field",
          "cat",
          "value",
          "electronics",
          "count",
          12,
          "pivot",
          [
          ["field","popularity","value",7,"count",4],
          ["field","popularity","value",6,"count",3],
          ["field","popularity","value",1,"count",2],
          ["field","popularity","value",0,"count",1],
          ["field","popularity","value",5,"count",1],
          ["field","popularity","value",10,"count",1]
          ]
          ],

          Edit: formatting.

          Brett Lucey added a comment -

          Hi AJ,

          I just pulled the latest code from the repo, and when I patched the tip of the 4.7 branch with the SOLR-2894 patch, I am not able to reproduce this issue. (Revision 5981529c65d4ae671895948f43d8770daa58746b) This is the output I get using a pivot facet of cat,popularity. I used this url while running the example: http://localhost:8983/solr/select?wt=json&q=*:*&rows=0&facet=true&facet.pivot=cat,popularity

          [
              {
                  "field": "cat",
                  "value": "electronics",
                  "count": 12,
                  "pivot": [
                      { "field": "popularity", "value": 7, "count": 4 },
                      { "field": "popularity", "value": 6, "count": 3 },
                      { "field": "popularity", "value": 1, "count": 2 },
                      { "field": "popularity", "value": 0, "count": 1 },
                      { "field": "popularity", "value": 5, "count": 1 },
                      { "field": "popularity", "value": 10, "count": 1 }
                  ]
              },
              ...
          ]

          Could you try with this same revision, applying only the SOLR-2894 patch and let me know if you see something different?

          -Brett

          AJ Lemke added a comment -

          Hi Brett,

          I grabbed revision "5981529c65d4ae671895948f43d8770daa58746b" from the git repository and ran my tests.
          I am still seeing the pivots as arrays rather than JSON objects.

          If I change numShards to 1, I see the pivots as an array of objects.
          (http://localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=1&replicationFactor=2&maxShardsPerNode=4)

              field: "cat",
              value: "electronics",
              count: 12,
              pivot: [
                  {field: "popularity",value: 7,count: 4},
                  {field: "popularity",value: 6,count: 3},
                  {field: "popularity",value: 1,count: 2},
                  {field: "popularity",value: 0,count: 1},
                  {field: "popularity",value: 5,count: 1},
                  {field: "popularity",value: 10,count: 1}
              ]
          

          If I change numShards to > 1, I see the pivots as an array of arrays.
          (http://localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=2&maxShardsPerNode=4)

              "field",
              "cat",
              "value",
              "electronics",
              "count",
              12,
              "pivot",
              [
                  ["field","popularity","value",7,"count",4],
                  ["field","popularity","value",6,"count",3],
                  ["field","popularity","value",1,"count",2],
                  ["field","popularity","value",0,"count",1],
                  ["field","popularity","value",5,"count",1],
                  ["field","popularity","value",10,"count",1]
              ]
          

          I am using Windows 7 as my test environment and starting with this batch file:

          Batch File
          echo Deleting Temp Data
          rmdir /s /q D:\dev\lucene-solr\solr\node1
          rmdir /s /q D:\dev\lucene-solr\solr\node2
          
          echo Copying the example folder
          xcopy /s /e /y /i D:\dev\lucene-solr\solr\example D:\dev\lucene-solr\solr\node1
          xcopy /s /e /y /i D:\dev\lucene-solr\solr\example D:\dev\lucene-solr\solr\node2
          
          echo removing the collection1 folder
          rmdir /s /q D:\dev\lucene-solr\solr\node1\solr\collection1
          rmdir /s /q D:\dev\lucene-solr\solr\node2\solr\collection1
          
          echo Starting Solr Processes
          cd D:\dev\lucene-solr\solr\node1
          start cmd /c java -DzkRun -jar start.jar
          sleep 10
          
          cd D:\dev\lucene-solr\solr\node2
          start cmd /c java -Djetty.port=7574 -DzkRun -DzkHost=localhost:9983 -jar start.jar
          
          echo All Solr Processes started
          
          sleep 10
          
          cd D:\dev\lucene-solr\solr\example\scripts\cloud-scripts
          cmd /c zkcli.bat -zkhost localhost:9983 -cmd upconfig -confdir D:\dev\lucene-solr\solr\example\solr\collection1\conf -confname collection1
          cmd /c zkcli.bat -zkhost localhost:9983 -cmd linkconfig -collection collection1 -confname collection1
          echo Zookeeper updated to contain collection1
          
          cmd /c curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=2&maxShardsPerNode=4"
          cmd /c curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection1"
          
          sleep 10
          
          cd D:\dev\lucene-solr\solr\example\exampledocs
          cmd /c java -Dtype=text/xml -Doptimize=yes -Durl=http://localhost:8983/solr/collection1/update -jar post.jar *.xml
          cmd /c java -Dtype=text/csv -Doptimize=yes -Durl=http://localhost:8983/solr/collection1/update -jar post.jar *.csv
          cmd /c java -Dtype=application/json -Doptimize=yes -Durl=http://localhost:8983/solr/collection1/update -jar post.jar *.json
          echo Collection1 has been bootstrapped and optimized
          

          Could you show me how you are starting yours?

          AJ

          Brett Lucey added a comment - - edited

          Hi AJ,

          Thanks for all that info. It was super helpful and I've narrowed down the issue. If you'd like to fix it on your setup, you only need to make two changes to PivotFacetValue.java. At the top, add an import for org.apache.solr.common.util.SimpleOrderedMap, and in the convertToNamedList() function in that file, change this line:
          NamedList<Object> newList = new NamedList<Object>();
          to
          NamedList<Object> newList = new SimpleOrderedMap<Object>();
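          For background (a sketch, not the exact patch code): SimpleOrderedMap is a NamedList subclass that Solr's response writers render as a map/object, whereas a plain NamedList is written by the JSON writer as a flat ["name", value, ...] array - the array-of-arrays shape seen above with numShards > 1.

              import org.apache.solr.common.util.NamedList;
              import org.apache.solr.common.util.SimpleOrderedMap;

              public class PivotEntryShapeSketch {
                  public static void main(String[] args) {
                      // Illustrative only: building one pivot entry roughly the way convertToNamedList() might.
                      NamedList<Object> entry = new SimpleOrderedMap<Object>(); // was: new NamedList<Object>()
                      entry.add("field", "cat");
                      entry.add("value", "electronics");
                      entry.add("count", 12);
                      // With SimpleOrderedMap the JSON writer emits {"field":"cat",...};
                      // with a plain NamedList it emits ["field","cat",...].
                      System.out.println(entry);
                  }
              }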

          I will be posting a new SOLR-2894 patch within the next day or two, but that patch will be for trunk and will likely not apply cleanly to the 4.7 branch. If you need this working in 4.7, make the above changes. If you are using trunk, then this fix will be incorporated into the upcoming patch.

          -Brett

          Brett Lucey added a comment -

          I've uploaded a patch to include changes needed to patch against trunk. (Revision caccba783be7c9f4d7b25c992ed4c49e5a2bddf7). Additionally, this fixes the JSON output formatting issue discovered by AJ.

          Uwe Schindler added a comment -

          Move issue to Solr 4.9.

          Trey Grainger added a comment -

          After nearly 2 years of on-and-off development, I think this patch is finally ready to be committed. Brett's most recent patch includes significant performance improvements as well as fixes to all of the reported issues and edge cases mentioned by the others currently using this patch. We have just finished a large spike of work to get this ready for commit, so I'd love to get it pushed in soon unless there are any objections.

          Erik Hatcher, do you have any time to review this for suitability to be committed (since you are the reporter)? If there is anything additional that needs to be changed, I'll happily sign us up (either myself or someone on my team at CareerBuilder) to do it if it will help.

          Elran Dvir added a comment -

          We have encountered a Java heap memory problem using distributed pivots – perhaps someone can shed some light on it.

          The scenario is as follows:
          We run Solr 4.4 (with patch for SOLR-2894) with 20 cores and the maximum java heap size is 1.5 GB.
          The following query with distributed facet pivot generates an out of memory exception:
          rows=0&
          q=*:*&
          facet=true&
          facet.pivot=f1,f2,f3,f4,f5&
          f.f1.facet.sort=count&
          f.f1.facet.limit=10&
          f.f1.facet.missing=true&
          f.f1.facet.mincount=1&
          f.f2.facet.sort=index&
          f.f2.facet.limit=-1&
          f.f2.facet.missing=true&
          f.f2.facet.mincount=1&
          f.f3.facet.sort=index&
          f.f3.facet.limit=-1&
          f.f3.facet.missing=true&
          f.f3.facet.mincount=1&
          f.f4.facet.sort=index&
          f.f4.facet.limit=-1&
          f.f4.facet.missing=true&
          f.f4.facet.mincount=1&
          f.f5.facet.sort=index&
          f.f5.facet.limit=-1&
          f.f5.facet.missing=true&
          f.f5.facet.mincount=1&
          shards=127.0.0.1:8983/solr/shard1,127.0.0.1:8983/solr/shard2

          Number of docs in each shard:
          shard1: 16,234
          shard2: 169,089

          These are the fields terms' distribution:
          f1: shard1 - 16,046, shard2 - 38
          f2: all shards - 232
          f3: all shards - 53
          f4: all shards - 6
          f5: all shards - 10

          When we use a maximum Java heap size of 8GB, the query finishes. It seems about 6GB is used for pivoting.
          It doesn't seem reasonable that a facet.pivot on 2 cores with 200,000 docs requires that much memory.

          We tried looking into the code a little and it seems the sharded queries run with facet.pivot.mincount=-1 as part of the refinement process.
          We also noticed that in this scenario, the parameter skipRefinementAtThisLevel in the method queuePivotRefinementRequests in the class PivotFacetField is false.
          We think all of this is the cause of the memory consumption – but we couldn't pinpoint the underlying issue.

          Is there a way to alter the algorithm to consume less memory?
          If anyone can explain offline the way refinement works here, we would be happy to try and help resolve this.

          Thank you very much.

          Brett Lucey added a comment - - edited

          Hi Elran,

          Having a mincount of -1 for the shards is correct. The reason is that while a given shard may have a count lower than mincount for a given term, the aggregate total count for that value when combined with the other shards could exceed the mincount, so we do need to know about it. For example, consider a mincount of 10. If we have 3 shards with a count of 5 for a term of "Boston", we would still need to know about these because the total count would be 15, and would be higher than the mincount.

          I would expect the skipRefinementAtThisLevel to be false for the top level pivot facet, and true for each other level. Are you seeing otherwise?

          If you were to set a facet.limit of 10 for all levels of the pivot, what is the memory usage like?

          -Brett

          Mark Miller added a comment -

          We should get this in to get more feedback. Wish I had some time to tackle it, but I won't in the near term.

          Elran Dvir added a comment -

          Brett, thanks for your response.

          >> Having a mincount of -1 for the shards is correct. The reason is that while a given shard may have a count lower than mincount for a given term, the aggregate total count for that value when combined with the other shards could exceed the mincount, so we do need to know about it. For example, consider a mincount of 10. If we have 3 shards with a count of 5 for a term of "Boston", we would still need to know about these because the total count would be 15, and would be higher than the mincount.
          If a mincount of 1 is requested for a field, couldn't it be more efficient? Is a mincount of -1 necessary in this case?
          >>I would expect the skipRefinementAtThisLevel to be false for the top level pivot facet, and true for each other level. Are you seeing otherwise?
          No. You are right.
          >>If you were to set a facet.limit of 10 for all levels of the pivot, what is the memory usage like?
          The memory usage in this case is about 200 MB.

          Thanks again.

          Brett Lucey added a comment - - edited

          Andrew actually raised that question to me yesterday as well, and I spent a little bit of time looking into it. For the initial request to a shard, we only lower the mincount to 0 if the facet limit is set to something other than -1. If the facet limit is -1, we lower the mincount to 1. In your case, the limit would be 10 for the top-level pivot, so we know we will get back at most 15 terms from each shard. Because we are only faceting on a limited number of terms, having a mincount of 0 here gives us the benefit of potentially avoiding refinement. In refinement requests, we still need to know when a shard has responded to us with its count for a term, so the mincount is -1 in that case because we are interested in the term even if the count is zero. It allows us to mark the shard as having responded and continue on. It's possible that we might be able to change this, but at the point of refinement it's a rather targeted request, so I don't expect there to be a significant benefit to doing so. In your case, with the facet limit being -1 on f2-f5, no refinement would be performed anyway.

          When we designed this implementation, the most important factor for us was speed, and we were willing to get it at a cost of memory. By making these changes, we reduced queries which previously took around 70 seconds for us down to around 600 milliseconds. I suspect that the biggest factor in the poor memory utilization is the wide open nature of using a facet.limit of -1, especially on a pivot so deep. Keep in mind that for each level of depth you add to a pivot, memory and time required will grow exponentially.

          Don't forget that if you are querying a node and all of the shards are located within the same Java VM, you are incurring the memory cost of both shards plus the node responding to the user query all within the same heap.

          I took a quick look at the code today while waiting for some other processes to finish, and I don't see any obvious low hanging fruit to free up memory.

          Trey Grainger added a comment -

          >>Mark Miller said:
          >>We should get this in to get more feedback. Wish I had some time to tackle it, but I won't in the near term.

          Is there a committer who has interest in this issue and would be willing to look over it for (hopefully) getting it pushed into trunk? It's the top voted for and the top watched issue in Solr right now, so there's clearly a lot of community interest. Thanks!

          Otis Gospodnetic added a comment -

          If nobody volunteers, since this doesn't change existing behaviour (right?) I suggest just committing it and improving it from there once people start using it and suggesting improvements/bug fixes.

          Trey Grainger added a comment - - edited

          Hi Otis, I appreciate your interest here. That's correct: no previously working behavior was changed, and there are two things added with this patch: 1) distributed support, and 2) support for single-level pivot facets (this previously threw an exception but is now supported: facet.pivot=aSingleFieldName).

          For context on #2: we found no good reason to disallow a single-level pivot facet (it functions like a field facet but with the pivot facet output format), it made implementing distributed pivot faceting easier since a single level could be considered when refining, and work in downstream issues like SOLR-3583 (adding percentiles and other stats to pivot facets) depended on being able to easily alternate between any number of facet levels for analytics purposes, so we just added support for a single level. This also makes it easier to build analytics tools without having to arbitrarily switch between field facets and pivot facets (and their corresponding output formats) based upon the number of levels.
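          For example (the field name comes from Solr's example data; the counts are made up and the response is abbreviated), a single-level pivot request and its pivot-format output would look roughly like:

          facet=true&facet.pivot=cat

          "facet_pivot": {
            "cat": [
              { "field": "cat", "value": "electronics", "count": 12 },
              { "field": "cat", "value": "memory", "count": 3 } ] }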

          The end result is that no previously working capabilities have been modified, but distributed support for any number of pivot levels has been added, which should make this safe to commit to trunk.

          Mark Miller added a comment -

          I have not worked on faceting code in the past, so this is really not my area.

          However, here is a patch I just worked up to apply against 5x. I had to make some small changes - DateField is deprecated and there was an ndate field in the tests that could not be found. I removed it in this patch. I also fixed a few issues around licenses and formatting - this patch passes precommit except for a nocommit at the DateField change - someone should review if that change has any other ramifications. All tests pass.

          This code does touch some existing faceting code in a way that demands a deeper review I think, but until I have a lot more time, I'm not the man for that job. Perhaps Chris Hostetter?

          Hoss Man added a comment -

          I'm going to spend some time this week reviewing the state of things.

          First up, some minor tweaks to the latest patch...

          • fixed a typo in TestDistributedSearch ("facet.fiedl" -> "facet.field")
            • this is my biggest annoyance about most of our existing distributed search tests – they just assert that queries return the same thing as single node tests, but don't assert anything about the response, so mistakes in the input, or mistakes in indexing the docs (resulting in a useless test), aren't caught
            • this also relates to Mark's comment about removing "ndate" since that field no longer exists in the test configs - using tdate_a & tdate_b here should be fine
          • removed the "nocommit" Mark mentioned regarding DateField - that method moved to TrieDateField so his fix is correct.

          Some comments/questions based on what I've reviewed so far (note: many of these comments/questions come from a place of genuine ignorance since i've only reviewed about 30% of the patch so far)...

          • even at a glance, it's obvious the SimpleFacets changes are a simple refactoring and totally fine.
          • In FacetComponent - Setting aside the core pivot facet changes...
            • Most of the other changes seem like straightforward (and much needed!) variable renaming (+1) to help eliminate ambiguity between the existing field faceting refinement and the new pivot faceting refinement
            • the new "fieldsToOverRequestOn" Map confuses me in a few ways...
              • As is, i don't understand why this is a Map and not a Set.
              • Some odd conditional logic is used when iterating over this "Set" to determine the overrequest limit - i'm still trying to wrap my head around this but in particular the comment // dff has the info, the params have been scrubbed confuses me – where are these params "scrubbed"?
              • I like these new explicit overrequest count/ratio params, and i get that the end-game here is that they can be used to affect the amount of overrequest done for both facet.field and facet.pivot – but i'm not understanding the value of building up this "fieldsToOverRequestOn" set of names (for every shard request) and then iterating over it and consulting either the DFF or the params to decide which limit value to use on the shard requests, and then (conditionally?) removing the limit/offset/overrequest params from the shard requests. Wouldn't it be simpler to have modifyRequest always remove the limit/offset/overrequest params from the shard params, and then have the individual code paths (for both facet.field & facet.pivot) take responsibility for adding back the new limit params based on the overrequest calculations using the original request params (ie: rb.req.getParams())?
              • My chief concern here being that (at first glance) this change seems like it adds a small amount of overhead to the overrequest limit calculations, and makes this bit of code more confusing, w/o any obvious (to me) advantage.
          • I don't yet understand the need for the new "PURPOSE_REFINE_PIVOT_FACETS" stage of shard requests? ... can someone clarify why pivot facets can't just be refined during the existing "PURPOSE_REFINE_FACETS" stage?
          • I notice that the new DistributedFacetPivotTest directly extends BaseDistributedSearchTestCase and uses a fixed shard count, and indexes some docs directly to certain clients
            • is there something about the functionality (or about the test) that requires certain data locality (ie: certain docs on same shard) to work?
            • if not: is there any other reason we can't switch this over to a Cloud based test with a variable number of shards and comparisons against the control collection?
          Brett Lucey added a comment -

          I spoke with Andrew regarding the tests. The test does do some Query() testing to verify that the output of distributed and single node is the same; however, further down it does do explicit testing. He said it's important that we don't use a variable number of shards in this particular test because it exercises a number of specific sharding situations to ensure that we get the correct answers.

          Looking at fieldsToOverRequestOn, I think you are correct. It looks like this probably could be a set. I suspect this is an artifact of multiple pass-throughs at the implementation of this feature. I will spend some time on this on Thursday to see if I can clean this area up and make it a little more straightforward.

          Regarding PURPOSE_REFINE_PIVOT_FACETS: If we truly needed to, we could probably use a single purpose at a very slight cost. Keep in mind that the purposes are OR'd together, so they should happen in the same shard request. Having two different purposes allows us to call refineFacets and/or refinePivotFacets only as needed to avoid looping through the shard responses an extra time. If your comment is more to the effect that the loop in refineFacets() and refinePivotFacets() could be merged into a single loop – that's probably true and if you feel it's a better route to go, let me know and I can work on that change.

          Hoss Man added a comment -

          He said it's important that we don't use the variable number of shards in this particular because it exercises a number of specific sharding situations to ensure that we get the correct answers.

          that makes complete sense – i just wanted to make sure it's a test requirement and not a feature requirement. We should definitely comment the test to that effect, and add a more randomized cloud based test with some dynamic shard assignment as well to try and help catch strange edge cases. I'll try to work on that to help me better understand the feature (in general, it's been a long time since i looked at pivot faceting)

          I will spend some time on this on Thursday to see if I can clean this area up and make it a little more straight forward.

          that would be great, thanks. The Map vs Set aspect would be a trivial improvement, but in general i'm more concerned about the odd flow – it seems like something that could easily bite us in the ass later when people try to maintain it. It seems like it would be a lot more straightforward to:

          • have modifyRequest unconditionally remove all limit/offset/overrequest params from the shard requests
          • have a simple "getOverrequestLimitAmount(String fieldName)" method in the FacetInfo class (that consults the original request params)
          • have each code path (facet.field & facet.pivot & whatever down the road...) that sets params on the shard request and cares about over-requesting call sreq.params.set(pre + FACET_LIMIT, getOverrequestLimitAmount(fieldName)) – a rough sketch of that helper follows below
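          A rough sketch of what that helper might look like (the method, constant, and param names here are illustrative guesses, not taken from the patch; assumes org.apache.solr.common.params.SolrParams and FacetParams):

          // illustrative sketch only -- not from the patch; assumes overrequest is tuned by
          // per-field-overridable "facet.overrequest.ratio" / "facet.overrequest.count" style
          // params, falling back to the traditional 1.5x + 10 heuristic
          int getOverrequestLimitAmount(SolrParams original, String fieldName) {
            int limit = original.getFieldInt(fieldName, FacetParams.FACET_LIMIT, 100);
            if (limit < 0) return limit;  // unlimited: nothing to over-request
            int offset = original.getFieldInt(fieldName, FacetParams.FACET_OFFSET, 0);
            float ratio = original.getFieldFloat(fieldName, "facet.overrequest.ratio", 1.5f);
            int extra = original.getFieldInt(fieldName, "facet.overrequest.count", 10);
            return (int) ((offset + limit) * ratio) + extra;
          }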

          Having two different purposes allows us to call refineFacets and/or refinePivotFacets only as needed to avoid looping through the shard responses an extra time. If your comment is more to the effect that the loop in refineFacets() and refinePivotFacets() could be merged into a single loop...

          I don't have an opinion, i just wanted to understand the purpose of the new "PURPOSE" (heh) since we don't have a special one for range facets or query facets – but in hindsight i realize that of course we don't: those don't need refinement.

          Brett Lucey added a comment -

          I don't have an opinion, i just wanted to understand the purpose of the new "PURPOSE" (heh) since we don't have a special one for range facets or query facets – but in hindsight i realize that of course we don't: those don't need refinement.

          Primarily it's just for performance. Field facet refinement should only occur once, but pivot facet refinement may occur a number of times due to the tiering.

          Hoss Man added a comment -

          I still haven't had a chance to really dig into the implementation details of the patch, but i wanted to spend some time testing things out from a user perspective...


          One of the first things i noticed, is that the refinement requests seem to be extra verbose. For example, given this user request (using the example data, with a 2 shard cloud setup):

          http://localhost:8983/solr/select?q=*:*&sort=id+desc&rows=2&facet=true&facet.pivot=cat,inStock&facet.limit=3&facet.pivot=manu_id_s,inStock

          This is what the refinement requests in the logs of each shard looked like...

          3434041 [qtp1282186295-19] INFO  org.apache.solr.core.SolrCore  – [collection1] webapp=/solr path=/select params={manu_id_s,inStock_8__terms=samsung&facet=true&sort=id+desc&facet.limit=3&manu_id_s,inStock_9__terms=viewsonic&distrib=false&cat,inStock_1__terms=search&wt=javabin&version=2&rows=0&manu_id_s,inStock_6__terms=maxtor&manu_id_s,inStock_7__terms=nor&NOW=1399584452682&shard.url=http://127.0.1.1:8983/solr/collection1/&df=text&cat,inStock_2__terms=software&q=*:*&manu_id_s,inStock_3__terms=canon&manu_id_s,inStock_4__terms=ati&facet.pivot.mincount=-1&isShard=true&cat,inStock_0__terms=hard+drive&facet.pivot={!terms%3D$cat,inStock_0__terms}cat,inStock&facet.pivot={!terms%3D$cat,inStock_1__terms}cat,inStock&facet.pivot={!terms%3D$cat,inStock_2__terms}cat,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_3__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_4__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_5__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_6__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_7__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_8__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_9__terms}manu_id_s,inStock&manu_id_s,inStock_5__terms=eu} hits=14 status=0 QTime=3 
          
          3424918 [qtp1282186295-16] INFO  org.apache.solr.core.SolrCore  – [collection1] webapp=/solr path=/select params={cat,inStock_10__terms=memory&manu_id_s,inStock_15__terms=dell&facet=true&manu_id_s,inStock_12__terms=apple&sort=id+desc&facet.limit=3&manu_id_s,inStock_13__terms=asus&manu_id_s,inStock_16__terms=uk&distrib=false&wt=javabin&manu_id_s,inStock_14__terms=boa&version=2&rows=0&NOW=1399584452682&shard.url=http://127.0.1.1:7574/solr/collection1/&df=text&manu_id_s,inStock_11__terms=corsair&q=*:*&facet.pivot.mincount=-1&isShard=true&facet.pivot={!terms%3D$cat,inStock_10__terms}cat,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_11__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_12__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_13__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_14__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_15__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_16__terms}manu_id_s,inStock} hits=18 status=0 QTime=2 
          

          Or if we prune that down to just the interesting params (as far as pivot faceting goes)...

          shard1
          facet.pivot.mincount=-1
          cat,inStock_0__terms=hard+drive
          cat,inStock_1__terms=search
          cat,inStock_2__terms=software
          manu_id_s,inStock_3__terms=canon
          manu_id_s,inStock_4__terms=ati
          manu_id_s,inStock_5__terms=eu
          manu_id_s,inStock_6__terms=maxtor
          manu_id_s,inStock_7__terms=nor
          manu_id_s,inStock_8__terms=samsung
          manu_id_s,inStock_9__terms=viewsonic
          facet.pivot={!terms=$cat,inStock_0__terms}cat,inStock
          facet.pivot={!terms=$cat,inStock_1__terms}cat,inStock
          facet.pivot={!terms=$cat,inStock_2__terms}cat,inStock
          facet.pivot={!terms=$manu_id_s,inStock_3__terms}manu_id_s,inStock
          facet.pivot={!terms=$manu_id_s,inStock_4__terms}manu_id_s,inStock
          facet.pivot={!terms=$manu_id_s,inStock_5__terms}manu_id_s,inStock
          facet.pivot={!terms=$manu_id_s,inStock_6__terms}manu_id_s,inStock
          facet.pivot={!terms=$manu_id_s,inStock_7__terms}manu_id_s,inStock
          facet.pivot={!terms=$manu_id_s,inStock_8__terms}manu_id_s,inStock
          facet.pivot={!terms=$manu_id_s,inStock_9__terms}manu_id_s,inStock
          
          shard2
          facet.pivot.mincount=-1
          cat,inStock_10__terms=memory
          manu_id_s,inStock_11__terms=corsair
          manu_id_s,inStock_12__terms=apple
          manu_id_s,inStock_13__terms=asus
          manu_id_s,inStock_14__terms=boa
          manu_id_s,inStock_15__terms=dell
          manu_id_s,inStock_16__terms=uk
          facet.pivot={!terms=$cat,inStock_10__terms}cat,inStock
          facet.pivot={!terms=$manu_id_s,inStock_11__terms}manu_id_s,inStock
          facet.pivot={!terms=$manu_id_s,inStock_12__terms}manu_id_s,inStock
          facet.pivot={!terms=$manu_id_s,inStock_13__terms}manu_id_s,inStock
          facet.pivot={!terms=$manu_id_s,inStock_14__terms}manu_id_s,inStock
          facet.pivot={!terms=$manu_id_s,inStock_15__terms}manu_id_s,inStock
          facet.pivot={!terms=$manu_id_s,inStock_16__terms}manu_id_s,inStock
          

          I believe that what's going on here is basically:

          • top level params are being used for the individual terms that need to be refined (which is smart, helps eliminate the risk of terms needing special escaping with local params)
          • the top level param names for these terms use a per-(user)request "global" counter to ensure that they are unique (+1)
          • the top level term param names also include the facet.pivot spec they are needed for – this seems redundant since the counter is clearly global (even across multiple "facet.pivot" specs)
          • these top level term param names are then added only to the shard requests where refinement is actually needed for those terms (+1) and are referenced as variables in facet.pivot commands using the "terms" local param (which the shards evidently look for to know when this is a refinement request)
          • because many terms may need refinement, each user specified facet.pivot=X,Y param results in many shard params of facet.pivot={!terms=$N}X,Y

          I realize that local params don't play nice with multi-valued params at all, let alone make it easy to use a single variable to refer to a multi-valued param – But wouldn't it be simpler (and less verbose over the wire) to just ignore Solr's built in param variable dereferencing and instead generate 1 unique param name to use for all the terms we care about (for each unique pivot spec), and then refer to that name once in a local param for a single facet.pivot param (which the pivot facet code would then go and explicitly fetch from the top level SolrParams as a multi-value)?

          The result being, that instead of the refinement requests shown above, the refinement requests for each shard could be something much simpler like...

          shard1_proposed
          facet.pivot.mincount=-1
          _fpt_1=hard+drive
          _fpt_1=search
          _fpt_1=software
          _fpt_2=canon
          _fpt_2=ati
          _fpt_2=eu
          _fpt_2=maxtor
          _fpt_2=nor
          _fpt_2=samsung
          _fpt_2=viewsonic
          facet.pivot={!fpt=1}cat,inStock
          facet.pivot={!fpt=2}manu_id_s,inStock
          
          shard2_proposed
          facet.pivot.mincount=-1
          _fpt_1=memory
          _fpt_2=corsair
          _fpt_2=apple
          _fpt_2=asus
          _fpt_2=boa
          _fpt_2=dell
          _fpt_2=uk
          facet.pivot={!fpt=1}cat,inStock
          facet.pivot={!fpt=2}manu_id_s,inStock
          

          (where _fpt_ is just a short prefix for "facet pivot terms" that i pulled out of my ass)


          Another thing I noticed is that with my 2 shard exampledocs setup, the following URL seems to send the pivot faceting into an infinite loop of refinement requests (note the typo: there's a space embedded in a field name, manu_id_s != manu_+id_s )...

          http://localhost:8983/solr/select?q=*:*&sort=id+desc&rows=2&facet=true&facet.pivot=cat,manu_+id_s,inStock&facet.limit=3

          ...not clear what's going on there, but definitely something that needs to be fixed before committing ("garbage in -> garbage out" is one thing, "garbage in -> crash your cluster" is another)


          the multi-level refinement is sooooooo sweet.

          Hoss Man added a comment -

          Something's wonky with the way mincount is handled - if you run the attached "pivot_mincount_problem.sh" script while a 2 node cluster is running with the example configs you can see the problem by comparing these 3 URLs...

          Pivot1: http://localhost:8983/solr/select?rows=0&wt=json&indent=true&q=single_7_s:%284%205%206%29&facet=true&facet.pivot=multi_50_ss,single_100_s&facet.limit=10
          Filter: http://localhost:8983/solr/select?rows=0&wt=json&indent=true&q=single_7_s:%284%205%206%29&fq=multi_50_ss:35&fq=single_100_s:79
          Pivot2: http://localhost:8983/solr/select?rows=0&wt=json&indent=true&q=single_7_s:%284%205%206%29&facet=true&facet.pivot=multi_50_ss,single_100_s&facet.limit=10&facet.pivot.mincount=10
          According to the "Pivot1" URL, there are 4244 total docs matching the query, of those 586 match multi_50_ss:35 and of those 13 match single_100_s:79

          This all jives with what the "Filter" URL tells us (where we ignore the pivot facets and just apply those as filters)

          But if we add facet.pivot.mincount=10 to the original pivot request to get the "Pivot2" URL, no values for single_100_s make the cut as sub-facets of the 586 multi_50_ss:35 docs.

          Looking at the logs of the shard queries, it appears that facet.pivot.mincount=-1 is set only on the refinement queries, but not in the initial sub-shard queries (where the limit over-requesting happens to find the top terms). So terms that don't match above the mincount on at least one single shard won't be considered at all for the cumulative total.

          Brett Lucey added a comment -

          Hmn. Yes, both of those sound concerning. I will take a look this week and get those squared away. I did a bit of work last week to clean up the dff parameters block, but I still need to work out a few kinks there. Let me know if you encounter anything else and I will continue to address these issues.

          Brett Lucey added a comment -

          Hoss -

          I am working to prioritize the changes you've brought up. While the size of the shard parameters may not strictly be as efficient as possible, is it such that we can run with that for now and circle back to this at a later point, or are you uncomfortable with including the parameters as is in the initial commit?

          -Brett

          Brett Lucey added a comment -

          Another thing I noticed is that with my 2 shard exampledocs setup, the following URL seems to send the pivot faceting into an infinite loop of refinement requests (note the typo: there's a space embeded in a field name manu_id_s != manu_+id_s )...

          http://localhost:8983/solr/select?q=*:*&sort=id+desc&rows=2&facet=true&facet.pivot=cat,manu_+id_s,inStock&facet.limit=3

          ...not clear what's going on there, but definitely something that needs fixed before committing ("garbage in -> garbage out" is one thing, "garbage in -> crash your cluster" is another)

          I'm not able to reproduce this. Could you tell me a little more about your setup? I am trying to recreate using the example data split to the two shards. (a-m example files on shard1, n-z on shard2). I've run your script and added your data as well, and then gone to the URL you provided and added a &shards=localhost:8983/solr,localhost:7574/solr onto it. It comes back each time without locking up. I am using revision 566d6371c77fd07d11f2e1b3033a669e26692a58 with only the SOLR-2894 patch applied.

          Hoss Man added a comment -

          I'm not able to reproduce this. Could you tell me a little more about your setup?

          trunk, with patch applied, build the example and then run the Simple Two-Shard Cluster ...

          hossman@frisbee:~/lucene/dev/solr$ cp -r example node1
          hossman@frisbee:~/lucene/dev/solr$ cp -r example node2
          
          # in term1...
          hossman@frisbee:~/lucene/dev/solr/node1$ java -DzkRun -DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -jar start.jar
          
          # wait for node1 startup, then in term2...
          hossman@frisbee:~/lucene/dev/solr/node2$ java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar
          
          # wait for node2 startup, then in term3...
          hossman@frisbee:~/lucene/dev/solr/example/exampledocs$ java -jar post.jar *.xml
          SimplePostTool version 1.5
          Posting files to base url http://localhost:8983/solr/update using content-type application/xml..
          ...
          14 files indexed.
          COMMITting Solr index changes to http://localhost:8983/solr/update..
          Time spent: 0:00:01.763
          hossman@frisbee:~/lucene/dev/solr/example/exampledocs$ curl 'http://localhost:8983/solr/select?q=*:*&sort=id+desc&rows=2&facet=true&facet.pivot=cat,manu_+id_s,inStock&facet.limit=3' > /dev/null
          
          # watch the logs in term1 and term2 go spinning like mad
          

          While the size of the shard parameters may not strictly be as efficient as possible, is it such that we can run with that for now and circle back to this at a later point, or are you uncomfortable with including the parameters as is in the initial commit?

          Hmm... not sure how i feel about it w/o more testing - from what i was seeing, with non-trivial field names, term values, and facet.limit, the refinement requests were getting HUGE so I suspect it's something we're going to want to tackle before releasing – but refactoring it to be smaller definitely seems like it should be a lower priority than some of the correctness related issues we're finding, and adding more tests (so we can be confident the refactoring is correct)


          I'm attaching a "SOLR-2894_cloud_test.patch" that contains a new cloud based randomized test i've been working at off and on over the last few days (I created it as a standalone patch because i didn't want to conflict with anything Brett might be in the middle of, and it was easy to do - kept me focused on the test and not dabbling with the internals).

          The test builds up a bunch of random docs, then does a handful of random pivot facet queries. For each pivot query, it recursively walks the pivot response executing verification queries using "fq" params it builds up from the pivot constraints – so if facet.pivot=a,b,c says that "a" has a term "x" with 4 matching docs, it adds an "fq=a:x" to the original query and checks the count; then it looks at the pivot terms for field "b" under "a:x" and also executes a query for each of them with another fq added, etc...
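          That verification walk reads roughly like the SolrJ sketch below (illustrative, not the actual test code; it assumes an existing SolrServer "solr", the original SolrQuery "baseQuery", org.apache.solr.client.solrj.response.PivotField, and JUnit's assertEquals):

          // illustrative sketch of the recursive fq-based verification described above
          void verifyPivots(SolrServer solr, SolrQuery baseQuery, List<PivotField> pivots) throws Exception {
            for (PivotField p : pivots) {
              // re-run the original query with an fq built from this pivot constraint
              SolrQuery check = baseQuery.getCopy();
              check.addFilterQuery(p.getField() + ":\"" + p.getValue() + "\"");
              check.setRows(0);
              long actual = solr.query(check).getResults().getNumFound();
              assertEquals("count for " + p.getField() + ":" + p.getValue(), p.getCount(), actual);
              if (p.getPivot() != null) {
                verifyPivots(solr, check, p.getPivot());  // recurse with the extra fq applied
              }
            }
          }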

          As is, the patch currently passes, but that's only because of a few nocommits...

          • randomization of mincount is disabled due to the refinement bug i mentioned before
          • it's currently only doing pivots on 2 string fields (one multivalued and one single valued) ... any attempts at pivot faceting the numeric/date/boolean fields (already included in the docs) causes an NPE in the SolrJ QueryResponse class (i haven't investigated why yet)
          Andrew Muldowney added a comment -

          Hey Hoss, I'm working with Brett to clear up these issues and get 2894 committed.

          I've done the work on the MinCount, simplified the logic and made it a fair bit easier to read, and fixed the issue with pivot facets. The example you showed with the second query failing to return results has been rectified.

          Let's talk about the refinement terms. We can easily shorten the name of the !terms we use because, as you've correctly asserted, the number makes them unique and the rest is just filler. But I don't understand how your example functions under the covers.

          shard2_proposed
          facet.pivot.mincount=-1
          _fpt_1=memory
          _fpt_2=corsair
          _fpt_2=apple
          _fpt_2=asus
          _fpt_2=boa
          _fpt_2=dell
          _fpt_2=uk
          facet.pivot={!fpt=1}cat,inStock
          facet.pivot={!fpt=2}manu_id_s,inStock

          What under the covers is used to make _fpt_2 match up to fpt=2 ? The one-to-many relation alters my understanding of how this works.

          Hoss Man added a comment -

          I've done the work on the MinCount and simplified the logic and made it a fair bit easier to read and fixed the issue with pivot facets. The example you showed with the second query failing to return results has been rectified.

          awesome.

          What under the covers is used to make _fpt_2 match up to fpt=2 ? The one-to-many relation alters my understanding of how this works.

          I haven't been getting much sleep lately, so forgive me if i'm misunderstanding your question or if my answer is obvious gibberish: I think what i had in mind before was just a few lines of new code in the pivot-refinement logic that runs on the shards to construct a param name for the top level multi-valued param using the numeric id, which the pivot code would look up directly instead of relying on local param variable dereferencing – which as mentioned, doesn't support any sort of 1-to-many variable refs.

          (i don't have the patch in front of me in an editor, so i'll make up variable names and do this mostly in pseudo-code)...

          SolrParams reqParams = req.getParams()
          String[] allPivots = reqParams.getParams("facet.pivot")
          for (String pivot : allPivots) {
            SolrParams localParams = parseLocalParams(pivot)
            String refine_id = localParams.get("fpt")
            if (refine_id == null) {
              // TODO: not a refinement ... do full pivoting
            } else {
              String[] refinements = reqParams.getParams("_fpt_" + refine_id)
              for (String r : refinements) {
                // TODO: compute the refinement count for "r" relative to the current "pivot"
              }
            }
          }
          

          ...does that make sense?

          Andrew Muldowney added a comment - - edited

          Thank you Hoss, that explanation gave me everything I needed.

          So this patch breaks up the "modifyRequest" block into three parts: first the global removals, then the new "modifyRequestForFieldFacets" and "modifyRequestForPivotFacets" methods; it also includes the changed mincount for pivot facet fields.

          This also changes the refinement queries from
          facet.pivot={!terms=$cat,inStock_10__terms}cat,inStock
          to
          facet.pivot={!fpt=1}cat,inStock

          This caused some problems since previously each term had its own facet.pivot and thus its own context and PivotFacetProcessor.
          Now that we only have one context for all the refinement requests, we needed to manage our DocSet since it gets modified along the way. But those issues seem to be fixed.

          Andrew Muldowney added a comment -

          The latest patch upload includes Brett's change, along with the changes I outlined earlier.

          Hoss, I think we've addressed everything up to this point. I've got time to correct any other issues you find.

          Hoss Man added a comment -

          I haven't had a lot of time to review the updated patch in depth, but I did spend some time trying to improve TestCloudPivotFacet to resolve some of the nocommits – but i'm still seeing failures...

          1) I realized the "depth" check i was trying to do was bogus and commented it out (still need to purge the code - didn't want to muck with that until the rest of the test was passing more reliably)

          2) the NPE I mentioned in QueryResponse.readPivots is still happening, but i realized that it has nothing to do with the datatype of the fields being pivoted on – it only seemed that way because of the poor randomization of values getting put in the single valued string fields vs the multivalued fields in the old version of the test.

          The bug seems to pop up in some cases where a pivot constraint has no sub-pivots. Normally this results in a NamedList with 3 keys (field,value,count) – the 4th "pivot" key is only included if there is a list of at least 1 sub-pivot. But in some cases (I can't explain from looking at the code why) the server is responding back with a 4th entry using the key "pivot" but the value is "null"
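
          For reference, here is a minimal sketch (in terms of SolrJ's NamedList; the field/value names are illustrative, not taken from the failing seed) of the shape described above – a well-formed leaf constraint versus the malformed one the server is sending back...

            import org.apache.solr.common.util.NamedList;

            // well-formed leaf constraint: exactly three entries, no "pivot" key at all
            NamedList<Object> ok = new NamedList<>();
            ok.add("field", "pivot_y_s");
            ok.add("value", "someTerm");
            ok.add("count", 9);

            // what the server is sending back in the failing cases: a 4th "pivot" entry
            // whose value is null, which trips the assertion in QueryResponse.readPivots
            NamedList<Object> broken = new NamedList<>();
            broken.add("field", "pivot_y_s");
            broken.add("value", "someTerm");
            broken.add("count", 9);
            broken.add("pivot", null);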

          We need to get to the bottom of this – it's not clear if there is a bug preventing real sub-pivot constraints from being returned correctly, or if this is just a mistake in the code where it's putting "null" in the NamedList instead of not adding anything at all (in which case it might be tempting to make QueryResponse.readPivots smart enough to deal with it, but if we did that it would still be broken for older clients – best to stick with the current API semantics)

          In the attached patch update, this seed will fail showing the null sub-pivots problem...

             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestCloudPivotFacet -Dtests.method=testDistribSearch -Dtests.seed=680E68425E7CA1BA -Dtests.slow=true -Dtests.locale=es_US -Dtests.timezone=Canada/Eastern -Dtests.file.encoding=UTF-8
             [junit4] FAILURE 41.7s | TestCloudPivotFacet.testDistribSearch <<<
             [junit4]    > Throwable #1: java.lang.AssertionError: Server sent back 'null' for sub pivots?
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([680E68425E7CA1BA:E9E8E65A2923C186]:0)
             [junit4]    > 	at org.apache.solr.client.solrj.response.QueryResponse.readPivots(QueryResponse.java:383)
             [junit4]    > 	at org.apache.solr.client.solrj.response.QueryResponse.extractFacetInfo(QueryResponse.java:363)
             [junit4]    > 	at org.apache.solr.client.solrj.response.QueryResponse.setResponse(QueryResponse.java:148)
             [junit4]    > 	at org.apache.solr.client.solrj.response.QueryResponse.<init>(QueryResponse.java:91)
             [junit4]    > 	at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
             [junit4]    > 	at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
             [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:161)
             [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.doTest(TestCloudPivotFacet.java:145)
          
          

          3) Independent (i think) from the NPE issue, there is still something wonky with the refined counts when mincount is specified...

          Here for example is a seed that gets past QueryResponse.readPivots, but then fails the numFound validation queries used to check the pivot counts...

             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestCloudPivotFacet -Dtests.method=testDistribSearch -Dtests.seed=F08A107C384690FC -Dtests.slow=true -Dtests.locale=ar_LY -Dtests.timezone=Jamaica -Dtests.file.encoding=UTF-8
             [junit4] FAILURE 27.0s | TestCloudPivotFacet.testDistribSearch <<<
             [junit4]    > Throwable #1: java.lang.AssertionError: {main({main(facet.pivot.mincount=9),extra({main(facet.limit=12),extra({main(facet.pivot=pivot_y_s%2Cpivot_x_s1),extra(facet=true&facet.pivot=pivot_x_s1%2Cpivot_x_s)})})}),extra(rows=0&q=id%3A%5B*+TO+503%5D)} ==> pivot_y_s,pivot_x_s1: {params(rows=0),defaults({main(rows=0&q=id%3A%5B*+TO+503%5D),extra(fq=%7B%21term+f%3Dpivot_y_s%7D)})} expected:<9> but was:<14>
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([F08A107C384690FC:716C9E644F19F0C0]:0)
             [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:190)
             [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.doTest(TestCloudPivotFacet.java:145)
             [junit4]    > 	at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:863)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:744)
             [junit4]    > Caused by: java.lang.AssertionError: pivot_y_s,pivot_x_s1: {params(rows=0),defaults({main(rows=0&q=id%3A%5B*+TO+503%5D),extra(fq=%7B%21term+f%3Dpivot_y_s%7D)})} expected:<9> but was:<14>
             [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertNumFound(TestCloudPivotFacet.java:403)
             [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:208)
             [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:176)
             [junit4]    > 	... 42 more
          

          This is saying that while doing a request with a pivot on the "pivot_y_s,pivot_x_s1" fields, it looped over the (top level) pivot constraints in "pivot_y_s" - and for one of those term values (it just happens to be the empty string "") it got a pivot count of 9, but when it executed a query filtering the main results on that term ("fq={!term f=pivot_y_s}") the total number of results found was 14.
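
          (As a hedged illustration of that check – the SolrServer instance and the query/filter strings below are stand-ins, not copied from the test – the validation amounts to re-running the main query filtered on the pivot term and comparing numFound against the reported pivot count...)

            import org.apache.solr.client.solrj.SolrQuery;
            import org.apache.solr.client.solrj.response.QueryResponse;

            // re-run the main query, filtered on the pivot term, and compare counts
            SolrQuery check = new SolrQuery("id:[* TO 503]");
            check.setRows(0);
            check.addFilterQuery("{!term f=pivot_y_s}"); // empty-string term value from the pivot
            QueryResponse rsp = solrServer.query(check);
            long numFound = rsp.getResults().getNumFound(); // expected to equal the pivot count (9) but was 14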

          If you comment out the line of the test that sets the FACET_PIVOT_MINCOUNT param, this seed starts to pass, suggesting that it's almost certainly the mincount logic that's putting a kink in the correctness of the final refined counts.

          Andrew Muldowney added a comment - - edited

          Brett and I discovered several bugs with our mincount and the changes I made to our refinement requests that resulted in the odd behavior you were seeing. Not everything is super happy. I get what look like SolrCloud errors when running certain seeds.
          I forgot this patch also comments out the randomUsableUnicodeString to just be a simple string, BUT I've changed it back on my box and it seems to be fine.

          215 T12 oasc.SolrResourceLoader.locateSolrHome using system property solr.solr.home: ..\..\C:\Users\AMULDO~1\AppData\Local\Temp\solr.cloud.TestCloudPivotFacet-A515DED004CF1660-001\tempDir-002
          5216 T12 oasc.SolrResourceLoader.<init> new SolrResourceLoader for directory: '..\..\C:\Users\AMULDO~1\AppData\Local\Temp\solr.cloud.TestCloudPivotFacet-A515DED004CF1660-001\tempDir-002\'
          5421 T12 oasc.ConfigSolr.fromFile Loading container configuration from D:\hmm\lucene-solr\..\..\C:\Users\AMULDO~1\AppData\Local\Temp\solr.cloud.TestCloudPivotFacet-A515DED004CF1660-001\tempDir-002\solr.xml
          5422 T12 oass.SolrDispatchFilter.init ERROR Could not start Solr. Check solr/home property and the logs
          5483 T12 oasc.SolrException.log ERROR null:org.apache.solr.common.SolrException: Could not load SOLR configuration
          		at org.apache.solr.core.ConfigSolr.fromFile(ConfigSolr.java:71)
          		at org.apache.solr.core.ConfigSolr.fromSolrHome(ConfigSolr.java:96)
          		at org.apache.solr.servlet.SolrDispatchFilter.loadConfigSolr(SolrDispatchFilter.java:157)
          		at org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:188)
          		at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:137)
          		at org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:119)
          		at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
          		at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:719)
          		at org.eclipse.jetty.servlet.ServletHandler.updateMappings(ServletHandler.java:1309)
          		at org.eclipse.jetty.servlet.ServletHandler.setFilterMappings(ServletHandler.java:1345)
          		at org.eclipse.jetty.servlet.ServletHandler.addFilterMapping(ServletHandler.java:1085)
          		at org.eclipse.jetty.servlet.ServletHandler.addFilterWithMapping(ServletHandler.java:931)
          		at org.eclipse.jetty.servlet.ServletHandler.addFilterWithMapping(ServletHandler.java:888)
          		at org.eclipse.jetty.servlet.ServletContextHandler.addFilter(ServletContextHandler.java:340)
          		at org.apache.solr.client.solrj.embedded.JettySolrRunner$1.lifeCycleStarted(JettySolrRunner.java:327)
          		at org.eclipse.jetty.util.component.AbstractLifeCycle.setStarted(AbstractLifeCycle.java:174)
          		at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:65)
          		at org.apache.solr.client.solrj.embedded.JettySolrRunner.start(JettySolrRunner.java:432)
          		at org.apache.solr.client.solrj.embedded.JettySolrRunner.start(JettySolrRunner.java:405)
          		at org.apache.solr.cloud.AbstractFullDistribZkTestBase.createJetty(AbstractFullDistribZkTestBase.java:481)
          		at org.apache.solr.BaseDistributedSearchTestCase.createJetty(BaseDistributedSearchTestCase.java:351)
          		at org.apache.solr.cloud.AbstractFullDistribZkTestBase.createServers(AbstractFullDistribZkTestBase.java:282)
          		at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:863)
          		at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          		at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          		at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          		at java.lang.reflect.Method.invoke(Method.java:606)
          		at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
          		at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
          		at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
          		at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
          		at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
          		at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
          		at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
          		at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
          		at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
          		at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
          		at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
          		at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
          		at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
          		at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
          		at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
          		at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
          		at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
          		at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
          		at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
          		at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
          		at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
          		at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
          		at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
          		at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
          		at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
          		at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
          		at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
          		at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
          		at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
          		at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
          		at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
          		at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
          		at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
          		at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
          		at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
          		at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
          		at java.lang.Thread.run(Thread.java:724)
          	Caused by: org.apache.solr.common.SolrException: solr.xml does not exist in D:\hmm\lucene-solr\..\..\C:\Users\AMULDO~1\AppData\Local\Temp\solr.cloud.TestCloudPivotFacet-A515DED004CF1660-001\tempDir-002\solr.xml cannot start Solr
          		at org.apache.solr.core.ConfigSolr.fromFile(ConfigSolr.java:62)
          

          I've run your TestCloudPivotFacet test a bunch of times and I only get ZK errors, no value mismatch or null sub-pivots.

          Hoss Man added a comment -

          Started getting back into this yesterday (i should have several large blocks of time for this issue this week & next week)...

          Brett and I discovered several bugs with our mincount and the changes I made to our refinement requests that resulted in the odd behavior you were seeing.

          Awesome! ... glad to see the test was useful.

          Not everything is super happy. I get what look like SolrCloud errors when running certain seeds.

          Hmmm... that is a weird error. People sometimes see timing-related errors in solr tests that use threads, and/or assertions about things that haven't happened yet - but i don't remember ever seeing anything like this type of problem with initialization of the cores.

          do these failures reproduce for you with the same seeds? can you post the full reproduce line that you get with these failures?

          I forgot this patch also comments out the randomUsableUnicodeString to just be a simple string, BUT I've changed it back on my box and it seems to be fine.

          yep – it also still had one of my nocommits so that it was only pivoting on string fields, but even w/o that it's worked great for me on many iterations.


          Revised patch - mostly cleaning up the lingering issues in TestCloudPivotFacet but a few other minor fixes of stuff i noticed.

          Detailed changes compared to previous patch...

          • removed "TestDistributedSearch.java.orig" that seems to have been included in patch by mistake
          • cleanup TestCloudPivotFacet
            • fixed randomUsableUnicodeString()
            • fix nocommit about testing pivot on non-string fields
            • fixed the depth checking (we can assert the max depth, but that's it)
            • removed weird (unused) "int ss = 2" that got added to assertNumFound
              • was also in some dead code in PivotFacetProcessor?
            • refactored cut/paste methods from Cursor test into base class
          • I removed the NullGoesLastComparator class and replaced it with a compareWithNullLast helper method in PivotFacetField (and added a unit test for it) – a minimal sketch follows this list
            • the Comparator contract is pretty explicit about null, and this class violated that
            • it was only being used for simple method calls, not passed to anything that explicitly needed a Comparator, so there wasn't a strong need for a standalone class
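
          (A minimal sketch of the null-last comparison just described – the exact signature in the patch may differ...)

            // nulls sort after any non-null value; otherwise fall back to natural ordering
            public static <T extends Comparable<T>> int compareWithNullLast(T o1, T o2) {
              if (null == o1) {
                return (null == o2) ? 0 : 1; // both null: equal; only o1 null: o1 sorts last
              }
              if (null == o2) {
                return -1; // only o2 null: o1 sorts first
              }
              return o1.compareTo(o2);
            }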

          My "next step" plans...

          • review DistributedFacetPivotTest in depth more - add more strong assertions
            • at first glance, it looks like a lot of the test is following the example of most existing distrib tests of relying on comparisons between the controlClient and the distrib client – in my opinion that's a bad pattern, and i'd like to add some explicit assertions on the results of all the this.query(...) calls
          • re-review the new pivot code (and the changes to facet code) in general
            • it's been a while since my last skim, and i know you've tweaked a bunch based on my previous comments
            • i'll take a stab at adding more javadocs to some of the new methods as i make sense of them
            • where possible, i'm going to try to add unit tests for some of the new low level methods you've introduced – largely as a way to help ensure i understand what they do
          Hoss Man added a comment -

          review DistributedFacetPivotTest in depth more - add more strong assertions

          Attaching updated patch with progress along this line: in addition to some new explicit assertions, it also includes some refactoring & simplification of setupDistributedPivotFacetDocuments

          One thing that jumped out at me when reviewing this is that even though the test does some queries with large overrequest params as well as queries disabling overrequest, there don't seem to be any assertions about how the overrequesting affects the results – in fact, because of how the controlClient is compared with the distributed client, it seems that with this sample data disabling overrequest doesn't even change the results at all.

          I definitely want to add some test logic around that – if for no other reason than to prove that when the overrequesting is used, it can help with finding constraints in the long tail

          Hoss Man added a comment -

          I definitely want to add some test logic around that – if for no other reason than to prove that when the overrequesting is used, it can help with finding constraints in the long tail

          Updated patch...

          • new DistributedFacetPivotLongTailTest
            • crafts the shard distribution specifically to demonstrate that overrequesting is affecting things as expected
          • split DistributedFacetPivotTest into DistributedFacetPivotSmallTest and DistributedFacetPivotLargeTest
            • this was already 2 very different sets of data with two very different styles of asserting expected results – so i went ahead and split it up
            • Now there isn't the weird surprise that halfway through a test all the data is deleted and new data is added and more assertions are made.

          Brett & Andrew: would really appreciate if you guys could review my changes to your existing test as well as the new LongTail test and help sanity check that the assertions all look correct.

          Assuming you guys don't spot any problems with the tests: next up i'll move back into reviewing the code more in depth, and documenting/refactoring/unit-testing as needed to help myself understand all this awesomeness you guys have added.

          Andrew Muldowney added a comment - - edited

          I'm on it.
          I'm loving all these added asserts. I've added a few asserts that test the exact data that proves refinement worked; I'll work through the whole changeset and post up what I've got.

          I want to bring up a separate issue that we've been dealing with in this patch.

          SolrJ and its response types. I'm fairly certain that field facets always return the value of any field as a string, so "true" instead of Boolean.TRUE, and dates in the external format rather than the pretty-printed format. This isn't wholly true in pivot facets right now and it causes some weirdness in the code.
          I'd personally like to just use the string data types like field facet does, and remove the hacky functionality that sends back objects for SolrJ.
          Hoss, thoughts?

          Hoss Man added a comment -

          This isn't wholly true in pivot facets right now and it causes some weirdness in the code.

          I'd personally like to just use the string data types like field facet does, and remove the hacky functionality that sends back objects for SolrJ.

          Can you elaborate on what kind of weirdness/hacky functionality you are talking about?

          personally i'm a big fan of the fact that the facet.pivot returns the correct data types for the corresponding FieldType – i think that's a big improvement over facet.field.

          Andrew Muldowney added a comment -

          Can you elaborate on what kind of weirdness/hacky functionality you are talking about?

          In the PivotFacetProcessor (shards) we .toObject() each value. This is weird in the non-distributed mode because nothing converts those back into strings for the response – XML or JSON. This is a problem with dates, because "2012-11-01T12:30:00Z" becomes "Nov 1 4:30 EST 2012". I don't know what methods get run after process() in the non-distrib mode that we could hook into to change these values back into what they should be.

          In distributed mode this can be trouble because when trying to assign a refinement path we must get the .toExternal() value of the date field so that it can be properly looked up in the index. Elran has a fix for this, which works fine, but we'll need to extend this to PivotFacetValue's convertToNamedList for the output response to look right, and we end up having conditionals for dates in a bunch of places.

          Most other datatypes are fine, but date is the worst of this set.

          Andrew Muldowney added a comment -

          Hoss, I can't find any faults with your test additions

          I'd add

              // Microsoft will come back wrong if refinement was not done correctly
              PivotField microsoft = firstPlace.getPivot().get(1);
              assertEquals("company_t", microsoft.getField());
              assertEquals("microsoft", microsoft.getValue());
              assertEquals(56, microsoft.getCount());
          

          to the "// basic check w/ limit & default sort (count)" section in DistributedFacetPivotLargeTest.

          The microsoft pivot is the key value that comes back wrong if refinement didn't work, so we might as well sanity check it.

          Otherwise you've just included the CursorPagingTest which is probably from a different patch?

          Thanks for your help making this patch better

          Elran Dvir added a comment -

          I think it's very important to keep pivot's response values as objects.
          We should consider changing field facet's response values from string to object.
          Objects, of course, carry more information than strings.
          For example, there is no ability in Solr to sort by index descending; with objects it can be done in the client.

          Hoss Man added a comment -

          I think it's very important to keep pivot's response values as objects.

          +1

          In the PivotFacetProcessor (shards) we .toObject() each value. This is weird in the non-distributed mode because nothing converts those back into strings for the response – XML or JSON. This is a problem with dates, because "2012-11-01T12:30:00Z" becomes "Nov 1 4:30 EST 2012". I don't know what methods get run after process() in the non-distrib mode that we could hook into to change these values back into what they should be.

          I don't think that's weird – i think the toObject() call you have is exactly what it should be – i'm not really following your point about the XML or JSON responses; the response writers already know how to handle the various Object types (like Dates, Integers, etc...) that might be included.

          Based on your comment about PivotFacetValue's convertToNamedList, i think what you mean is that the main underlying problem with using the real Object representation of the values is that when you then want to build up the paths in PivotFacetValue's createFromNamedList for the purposes of the refinement queries, there is no corollary to "toObject" that can be used.

          This is very similar to the problem we encountered in SOLR-5354 – the solution there was a new FieldType method specific to marshalling and unmarshalling sort values. We can't simply re-use that new method as-is because the Objects used as Sort values don't necessarily have a 1-to-1 correspondence with the Objects that matter here.

          Ideally there should be a similar method on the FieldType for doing this, that lets you round-trip the output of FieldType.toObject() for the purposes of building up a simple query string – but that doesn't exist at the moment.

          My vote would be to leave the code the way it is right now (assuming it can toString() anything except a "Date" object) and open a new issue to improve on this for custom FieldTypes at a later date. That way people who want to go ahead and use Distributed Pivot Faceting for out of the box field types like Strings/Dates/Numbers can, and have the benefits of well structured objects in the response – w/o waiting on a more robust solution that can work with arbitrary custom field types. (which can come later)
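
          (To make that Object->String gap concrete, here's a minimal sketch for the date case – the epoch value and the hand-rolled SimpleDateFormat are illustrative only, not what the patch does...)

            import java.text.SimpleDateFormat;
            import java.util.Date;
            import java.util.TimeZone;

            // a pivot value for a date field comes back from FieldType.toObject() as a real Date
            Date pivotValue = new Date(1351773000000L); // 2012-11-01T12:30:00Z

            // naive toString() gives the "pretty printed" form ("Thu Nov 01 ... 2012"),
            // which can't be used as a term in a refinement query
            String broken = pivotValue.toString();

            // what the refinement query actually needs is the external (ISO-8601) form
            SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
            iso.setTimeZone(TimeZone.getTimeZone("UTC"));
            String usable = iso.format(pivotValue); // "2012-11-01T12:30:00Z"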

          Otherwise you've just included the CursorPagingTest which is probably from a different patch?

          CursorPagingTest is included in the patch because of methods refactored up into SolrTestCaseJ4 for use in this patch.

          I've been making my way further through the code review slowly – Attaching a revised patch...

          • updated to trunk
          • added microsoft asserts to DistributedFacetPivotLargeTest (per Andrew)
          • make FacetParams.FACET_OVERREQUEST package-private since it's not a usable param (just a base)
          • StrUtils
            • more javadocs
            • new escapeTextWithSeparator test -> TestUtils
            • refactor duplicated code with existing "join" method into new private method
          • PivotListEntry
            • more javadocs
            • kill some dead code (multiple enums with same index?)
            • refactored to leverage standard java Enum plumbing better
          • PivotFacetValue
            • added a nocommit to createFromNamedList regarding custom fieldtypes, noting that we either need a better solution to the Object->String problem, or we need to file a new issue prior to committing and update the comment
            • switched if-else-if-else-if on PivotListEntry instances to be an enum switch
          Hoss Man added a comment -

          I started reviewing again this afternoon and made a few more tweaks but then quickly encountered a troubling situation:

          There seems to be some set of circumstances that can cause pivot refinement to go into an (infinite?) ridiculously long loop.

          Here's an example log snippet from a test run that i eventually had to explicitly kill after several minutes (normally it finishes in ~40 seconds on my laptop)..

          ...
             [junit4]   2> 365476 T48 C473 P35623 oasc.SolrCore.execute [collection1] webapp= path=/select params={facet.limit=14&facet.pivot={!fpt%3D3557}pivot_y_s,pivot_l1&isShard=true&distrib=false&facet=true&shard.url=https://127.0.0.1:35623/collection1/|https://127.0.0.1:35174/collection1/&version=2&q=*:*&NOW=1403905534861&facet.pivot.mincount=-1&rows=0&fpt3557=-8197981690463795098&fpt3557=-7333481702750443698&fpt3557=-5750361150833026124&fpt3557=-1254664925684537075&fpt3557=-790491513359287891&fpt3557=-259812169693239119&fpt3557=5005&fpt3557=5023&fpt3557=434325197357513755&fpt3557=1208379606676285112&fpt3557=2157244738088160377&fpt3557=4049867752092041147&wt=javabin} hits=384 status=0 QTime=3 
             [junit4]   2> 365484 T53 C473 P35623 oasc.SolrCore.execute [collection1] webapp= path=/select params={facet.limit=14&facet.pivot={!fpt%3D3558}pivot_y_s,pivot_l1&isShard=true&distrib=false&facet=true&shard.url=https://127.0.0.1:35623/collection1/|https://127.0.0.1:35174/collection1/&version=2&q=*:*&NOW=1403905534861&facet.pivot.mincount=-1&rows=0&fpt3558=-8197981690463795098&fpt3558=-7333481702750443698&fpt3558=-5750361150833026124&fpt3558=-1254664925684537075&fpt3558=-790491513359287891&fpt3558=-259812169693239119&fpt3558=5005&fpt3558=5023&fpt3558=434325197357513755&fpt3558=1208379606676285112&fpt3558=2157244738088160377&fpt3558=4049867752092041147&wt=javabin} hits=384 status=0 QTime=3 
             [junit4]   2> 365493 T50 C473 P35623 oasc.SolrCore.execute [collection1] webapp= path=/select params={facet.limit=14&facet.pivot={!fpt%3D3559}pivot_y_s,pivot_l1&isShard=true&distrib=false&facet=true&shard.url=https://127.0.0.1:35623/collection1/|https://127.0.0.1:35174/collection1/&version=2&q=*:*&NOW=1403905534861&facet.pivot.mincount=-1&rows=0&fpt3559=-8197981690463795098&fpt3559=-7333481702750443698&fpt3559=-5750361150833026124&fpt3559=-1254664925684537075&fpt3559=-790491513359287891&fpt3559=-259812169693239119&fpt3559=5005&fpt3559=5023&fpt3559=434325197357513755&fpt3559=1208379606676285112&fpt3559=2157244738088160377&fpt3559=4049867752092041147&wt=javabin} hits=384 status=0 QTime=5 
          ...
          

          A few things to note about the log lines above:

          • with the seed used in this run there were only 740 total docs in the index
          • all three of those requests were made to the same shard/core (C473) on the same port (P35623)
          • the "pivot_l1" field being refined in these requests is a single valued long field - which means even if every random value generated for it were unique, in an index with 740 docs there can only be 740 possible long values here.
          • these requests are already up to fpt=3559 – way more refinements than should be necessary for this field
          • the shard is being asked to refine the same pivot values over and over again (but with increasing "fpt#####" keys)

          Unfortunately while trying to get to the bottom of this, i realized the way the test was picking the random pivots it used wasn't reproducible with a consistent test seed. I've fixed that, but now i need to hammer on this test some more to try and reproduce again with a reliable seed.


          Small changes to the patch ...

          • TestCloudPivotFacet
            • added explicit sort to String[] fieldNames so buildRandomPivot would reproduce with consistent seed
          • SimpleFacets tweaks i made before encountering the test bug:
            • more javadocs on some subtly diff methods
            • change the new getTermCounts(String,Integer,DocSet) to private since it's only used as a helper for the other public methods
          Hoss Man added a comment -

          Ok, with the last patch, here are a couple of seeds that seem to reliably reproduce some sort of infinite loop for me...

          ant test  -Dtestcase=TestCloudPivotFacet -Dtests.method=testDistribSearch -Dtests.seed=BE59C186858EBC0E -Dtests.slow=true -Dtests.locale=es_US -Dtests.timezone=Canada/Eastern -Dtests.file.encoding=UTF-8 
          
          ...
          
             [junit4]   2> 75419 T68 C104 P58648 oasc.SolrCore.execute [collection1] webapp=/vv_ path=/select params={NOW=1403913254268&version=2&facet.pivot.mincount=-1&facet=true&fpt14287=false,&distrib=false&facet.pivot={!fpt%3D14287}pivot_b,pivot_y_s&facet.limit=17&fq=id:[*+TO+232]&shard.url=https://127.0.0.1:58648/vv_/collection1/|https://127.0.0.1:58190/vv_/collection1/&rows=0&q=*:*&wt=javabin&isShard=true} hits=112 status=0 QTime=2 
             [junit4]   2> 75425 T67 C104 P58648 oasc.SolrCore.execute [collection1] webapp=/vv_ path=/select params={NOW=1403913254268&version=2&facet.pivot.mincount=-1&facet=true&distrib=false&facet.pivot={!fpt%3D14289}pivot_b,pivot_y_s&fpt14289=false,&facet.limit=17&fq=id:[*+TO+232]&shard.url=https://127.0.0.1:58648/vv_/collection1/|https://127.0.0.1:58190/vv_/collection1/&rows=0&q=*:*&wt=javabin&isShard=true} hits=112 status=0 QTime=1 
             [junit4]   2> 75430 T69 C104 P58648 oasc.SolrCore.execute [collection1] webapp=/vv_ path=/select params={NOW=1403913254268&version=2&facet.pivot.mincount=-1&facet=true&distrib=false&facet.pivot={!fpt%3D14291}pivot_b,pivot_y_s&facet.limit=17&fpt14291=false,&fq=id:[*+TO+232]&shard.url=https://127.0.0.1:58648/vv_/collection1/|https://127.0.0.1:58190/vv_/collection1/&rows=0&q=*:*&wt=javabin&isShard=true} hits=112 status=0 QTime=2 
             [junit4]   2> 75435 T70 C104 P58648 oasc.SolrCore.execute [collection1] webapp=/vv_ path=/select params={NOW=1403913254268&version=2&facet.pivot.mincount=-1&facet=true&fpt14293=false,&distrib=false&facet.pivot={!fpt%3D14293}pivot_b,pivot_y_s&facet.limit=17&fq=id:[*+TO+232]&shard.url=https://127.0.0.1:58648/vv_/collection1/|https://127.0.0.1:58190/vv_/collection1/&rows=0&q=*:*&wt=javabin&isShard=true} hits=112 status=0 QTime=2 
             [junit4]   2> 75440 T71 C104 P58648 oasc.SolrCore.execute [collection1] webapp=/vv_ path=/select params={NOW=1403913254268&version=2&facet.pivot.mincount=-1&facet=true&distrib=false&facet.pivot={!fpt%3D14295}pivot_b,pivot_y_s&facet.limit=17&fq=id:[*+TO+232]&shard.url=https://127.0.0.1:58648/vv_/collection1/|https://127.0.0.1:58190/vv_/collection1/&rows=0&q=*:*&wt=javabin&fpt14295=false,&isShard=true} hits=112 status=0 QTime=1 
             [junit4]   2> 75446 T68 C104 P58648 oasc.SolrCore.execute [collection1] webapp=/vv_ path=/select params={NOW=1403913254268&version=2&facet.pivot.mincount=-1&facet=true&distrib=false&facet.pivot={!fpt%3D14297}pivot_b,pivot_y_s&fpt14297=false,&facet.limit=17&fq=id:[*+TO+232]&shard.url=https://127.0.0.1:58648/vv_/collection1/|https://127.0.0.1:58190/vv_/collection1/&rows=0&q=*:*&wt=javabin&isShard=true} hits=112 status=0 QTime=1 
          
          ant test  -Dtestcase=TestCloudPivotFacet -Dtests.method=testDistribSearch -Dtests.seed=FFB687151132403E -Dtests.slow=true -Dtests.locale=es_US -Dtests.timezone=Canada/Eastern -Dtests.file.encoding=UTF-8
          
          ...
          
             [junit4]   2> 196659 T68 C356 P40661 oasc.SolrCore.execute [collection1] webapp=/vtut path=/select params={facet=true&fpt25929=-1.37306931E9&fpt25929=-1.1585728E9&fpt25929=-3.86510688E8&fpt25929=-3.42199296E8&fpt25929=-2.79124352E8&fpt25929=-2.6666448E8&fpt25929=-1.54946432E8&fpt25929=0.125&fpt25929=2956621.2&fpt25929=5.4770541E8&fpt25929=1.16071846E9&q=*:*&wt=javabin&rows=0&facet.limit=12&isShard=true&facet.pivot={!fpt%3D25929}pivot_x_s1,pivot_f&shard.url=http://127.0.0.1:40661/vtut/collection1/|http://127.0.0.1:35181/vtut/collection1/&facet.pivot.mincount=-1&version=2&distrib=false&NOW=1403914294688} hits=269 status=0 QTime=5 
             [junit4]   2> 196668 T64 C356 P40661 oasc.SolrCore.execute [collection1] webapp=/vtut path=/select params={facet=true&q=*:*&wt=javabin&rows=0&facet.limit=12&isShard=true&fpt25931=-1.37306931E9&fpt25931=-1.1585728E9&fpt25931=-3.86510688E8&fpt25931=-3.42199296E8&fpt25931=-2.79124352E8&fpt25931=-2.6666448E8&fpt25931=-1.54946432E8&fpt25931=0.125&fpt25931=2956621.2&fpt25931=5.4770541E8&fpt25931=1.16071846E9&facet.pivot={!fpt%3D25931}pivot_x_s1,pivot_f&shard.url=http://127.0.0.1:40661/vtut/collection1/|http://127.0.0.1:35181/vtut/collection1/&facet.pivot.mincount=-1&version=2&distrib=false&NOW=1403914294688} hits=269 status=0 QTime=5 
             [junit4]   2> 196678 T66 C356 P40661 oasc.SolrCore.execute [collection1] webapp=/vtut path=/select params={facet=true&q=*:*&wt=javabin&rows=0&facet.limit=12&isShard=true&facet.pivot={!fpt%3D25933}pivot_x_s1,pivot_f&shard.url=http://127.0.0.1:40661/vtut/collection1/|http://127.0.0.1:35181/vtut/collection1/&facet.pivot.mincount=-1&version=2&distrib=false&fpt25933=-1.37306931E9&fpt25933=-1.1585728E9&fpt25933=-3.86510688E8&fpt25933=-3.42199296E8&fpt25933=-2.79124352E8&fpt25933=-2.6666448E8&fpt25933=-1.54946432E8&fpt25933=0.125&fpt25933=2956621.2&fpt25933=5.4770541E8&fpt25933=1.16071846E9&NOW=1403914294688} hits=269 status=0 QTime=5 
             [junit4]   2> 196687 T69 C356 P40661 oasc.SolrCore.execute [collection1] webapp=/vtut path=/select params={facet=true&fpt25935=-1.37306931E9&fpt25935=-1.1585728E9&fpt25935=-3.86510688E8&fpt25935=-3.42199296E8&fpt25935=-2.79124352E8&fpt25935=-2.6666448E8&fpt25935=-1.54946432E8&fpt25935=0.125&fpt25935=2956621.2&fpt25935=5.4770541E8&fpt25935=1.16071846E9&q=*:*&wt=javabin&rows=0&facet.limit=12&isShard=true&facet.pivot={!fpt%3D25935}pivot_x_s1,pivot_f&shard.url=http://127.0.0.1:40661/vtut/collection1/|http://127.0.0.1:35181/vtut/collection1/&facet.pivot.mincount=-1&version=2&distrib=false&NOW=1403914294688} hits=269 status=0 QTime=5 
          

          (NOTE: Since the whole problem is that these seeds seem to go into infinite loops, and i didn't feel like waiting for the test framework to time them out after an hour, i pulled the seeds out of the junit "Master seed: XXXXX" log output after killing the tests manually. The other tests.* sys props are just constants i picked at random when trying to reproduce to ensure that the "ant test ..." lines i posted here would be fully reproducible)

          By the looks of things, the problem seems to be popping up when a refinement constraint in a multi-level pivot involves the empty string (and/or missing values?)

          Looking back at the log snippet i posted in my previous comment (facet.pivot=pivot_y_s,pivot_l1) and comparing that with the refinement requests in test runs that pass, i realize that none of those refinements on the pivot_l1 long values had a string prefix – so perhaps the code was getting confused about what it was supposed to return, and that was then causing the coordinator to re-request?

          just speculating here ... Andrew Muldowney & Brett Lucey – does that sound plausible to you?

          Hoss Man added a comment -

          By the looks of things, the problem seems to be popping up when a refinement constraint in a multi-level pivot involves the empty string (and/or missing values?)

          Hmmm... both cases definitely seem to be problematic:

          • refining on values that are the empty string ""
          • refining against the null pseudo-value when using facet.missing

          (Note: TestCloudPivotFacet currently doesn't even try facet.missing – need to remedy that in a future patch)


          The attached patch update modifies DistributedFacetPivotLargeTest to add a new "special_s" field to a handful of docs – some of which get the value of SPECIAL (final String SPECIAL = "";) – and it goes into a loop here...

              // refine on empty string
              rsp = query( "q", "*:*",
                           "rows", "0",
                           "facet","true",
                           "facet.limit","1",
                           FacetParams.FACET_OVERREQUEST_RATIO, "0", // force refine
                           FacetParams.FACET_OVERREQUEST_COUNT, "0", // force refine
                           "facet.pivot","special_s,company_t");
          
             [junit4]   2> 32409 T43 C21 oasc.SolrCore.execute [collection1] webapp=/po_cuf path=/select params={shard.url=[ff01::083]:33332/po_cuf|[ff01::213]:33332/po_cuf|http://127.0.0.1:37920/po_cuf&NOW=1403920234230&rows=0&isShard=true&distrib=false&wt=javabin&fpt2938=&facet.pivot.mincount=-1&facet.overrequest.count=0&q=*:*&version=2&facet.pivot={!fpt%3D2938}special_s,company_t&facet.overrequest.ratio=0&facet=true&facet.limit=1} hits=357 status=0 QTime=0 
             [junit4]   2> 32413 T42 C21 oasc.SolrCore.execute [collection1] webapp=/po_cuf path=/select params={shard.url=[ff01::083]:33332/po_cuf|[ff01::213]:33332/po_cuf|http://127.0.0.1:37920/po_cuf&NOW=1403920234230&rows=0&isShard=true&distrib=false&wt=javabin&fpt2939=&facet.pivot.mincount=-1&facet.overrequest.count=0&q=*:*&version=2&facet.pivot={!fpt%3D2939}special_s,company_t&facet.overrequest.ratio=0&facet=true&facet.limit=1} hits=357 status=0 QTime=0 
          

          (Note the ...&fpt2938=&... and ...&fpt2939=&...)

          Even if you redefine SPECIAL to be some other constant (ie: SPECIAL = "SPECIAL";) the code still goes into a loop in the next call, where facet.missing is used and refinement is needed on the "missing" value...

              // refine on empty string & facet.missing
              rsp = query( "q", "*:*",
                           "fq", "-place_s:0placeholder",
                           "rows", "0",
                           "facet","true",
                           "facet.limit","1",
                           "facet.missing","true",
                           FacetParams.FACET_OVERREQUEST_RATIO, "0", // force refine
                           FacetParams.FACET_OVERREQUEST_COUNT, "0", // force refine
                           "facet.pivot","special_s,company_t");
          
             [junit4]   2> 26798 T53 C19 oasc.SolrCore.execute [collection1] webapp=/do_ path=/select params={facet.overrequest.ratio=0&wt=javabin&facet.missing=true&facet.limit=1&facet.pivot.mincount=-1&facet.pivot={!fpt%3D2151}special_s,company_t&fpt2151=null,microsoft&distrib=false&version=2&shard.url=[ff01::083]:33332/do_|[ff01::213]:33332/do_|https://127.0.0.1:36955/do_|[ff01::114]:33332/do_&facet=true&q=*:*&rows=0&fq=-place_s:0placeholder&NOW=1403920466501&isShard=true&facet.overrequest.count=0} hits=202 status=0 QTime=0 
             [junit4]   2> 26802 T54 C19 oasc.SolrCore.execute [collection1] webapp=/do_ path=/select params={facet.overrequest.ratio=0&wt=javabin&facet.missing=true&facet.limit=1&facet.pivot.mincount=-1&facet.pivot={!fpt%3D2153}special_s,company_t&distrib=false&version=2&shard.url=[ff01::083]:33332/do_|[ff01::213]:33332/do_|https://127.0.0.1:36955/do_|[ff01::114]:33332/do_&facet=true&q=*:*&rows=0&fpt2153=null,microsoft&fq=-place_s:0placeholder&NOW=1403920466501&isShard=true&facet.overrequest.count=0} hits=202 status=0 QTime=1 
          

          (Note the ...&fpt2151=null,microsoft&... and ...&fpt2153=null,microsoft&...)


          It looks like we need to rethink how the values are encoded into a path for the purpose of refinement so we can account for and differentiate between missing values, the empty string (0 chars), and the literal string "null" (4 chars)
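          For illustration, here is a minimal sketch of the kind of encoding that could disambiguate those three cases. The class name and marker characters are hypothetical – this is not the actual patch code, just a rough picture of the requirement that null, "" and the literal "null" must all round-trip differently:

              import java.util.List;

              public class RefinementPathSketch {
                // Hypothetical encoding: a leading marker distinguishes a missing (null)
                // value from a real value, so "" and the literal string "null" can
                // never be confused with the facet.missing pseudo-value.
                public static String encode(List<String> valuePath) {
                  StringBuilder out = new StringBuilder();
                  for (int i = 0; i < valuePath.size(); i++) {
                    if (i > 0) out.append(',');
                    String v = valuePath.get(i);
                    if (v == null) {
                      out.append('^');                      // facet.missing pseudo-value
                    } else {
                      out.append('~');                      // real value, possibly empty
                      out.append(v.replace("\\", "\\\\")    // escape the escape char
                                  .replace(",", "\\,"));    // escape the separator
                    }
                  }
                  return out.toString();
                }
              }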

          Hoss Man added a comment -

          It looks like we need to rethink how the values are encoded into a path for the purpose of refinement so we can account for and differentiate between missing values, the empty string (0 chars), and the literal string "null" (4 chars)

          I've been working on this for the last few days - cleaning up how we deal with the "refinement" strings so that facet.missing and/or empty strings ("") in fields won't be problematic.

          It's been slow going as i tried to be systematic about refactoring & documenting methods as i went along and started understanding more and more of the code.

          The bulk of the changes i made can be summarized as:

          1. make the "valuePath" tracking more structured via List<String> instead of building up single comma seperated refinement string right off the bat
          2. refactor the encoding/decoding of the refinement strings into a utility method thta can handle null and empty string.
          3. refactor the refinement count & subset computation so that it can actually handle facet.missing correctly (before attempts at refining facet.missing were just looking for the term "null" (ie: 4 characters)

          Full details on how this patch differs from the last one are listed below – but as things stand right now there is still a nasty bug somewhere in the facet.missing processing that i can't wrap my head around...

          In short: when facet.missing is enabled in the SPECIAL test i mentioned in my last comment, it's somehow causing the refined counts of the non-missing SPECIAL value to be wrong (even if the SPECIAL value is a regular string, and not "").

          I can't really wrap my head around how that's happening – it's going to involve some more manual testing & some more unit tests to get to the bottom of it, but in the meantime I wanted to get this patch posted.

          If folks could review it & sanity check that i'm not doing something stupid with the refinement that would be appreciated.


          Detailed changes in this patch iteration...

          • PivotFacetHelper
            • add new encodeRefinementValuePath & decodeRefinementValuePath methods
              • special encoding to handle empty strings (should be valid when pivoting) and null values (needed for facet.missing refinement)
            • add tests in TestPivotHelperCode
          • PivotFacetValue & PivotFacetField
            • in general, make these a bit more structured
            • eliminate "fieldPath" since it's unused
            • replace PivotFacetValue.field (String) with a ref to the actual parentPivot (PivotFacetField)
            • add PivotFacetField.parentValue (PivotFacetValue) to ref the value this pivot field is nested under (if any)
            • replace valuePath with getValuePath() (List<String>) to track the full structure
          • FacetComponent
            • prune some big chunks of commented out code (alt approaches no longer needed it looks like?)
            • use new PivotFacetValue.getValuePath() + PivotFacetHelper.encodeRefinementValuePath instead of PivotFacetValue.valuePath
          • SimpleFacets
            • make getListedTermCounts(String,String) private again & add javadocs clarifying that it smart-splits the list of terms
            • convert getListedTermCounts(String,String,DocSet) -> getListedTermCounts(String,DocSet,List<String>)
              • ie: pull the split logic out of this method, since it's confusing, and some callers don't need it.
              • add javadocs
              • updated SimpleFacets callers to do the split themselves
          • PivotFacetProcessor
            • refactor subset logic (that dealt with missing values via a negated range query) into "getSubset" helper method
              • add complementary "getSubsetSize" method as well
            • update previous callers of getListedTermCounts(String,String,DocSet) to use getSubsetSize instead in order to correctly handle the refinements of null (ie: facet.missing)
            • refactor & cleanup processSingle:
              • have caller do the field splitting & validation (eliminates redundancy when refining many values)
              • stop treating empty string as special case, switch conditionals that were looking at first value to look at list size directly
          • misc new javadocs on various methods throughout the above-mentioned files

          Misc notes for the future:

          • even if/when we get the refinement logic fixed, we really need some safety check to ensure we've completely eliminated this possibility of an infinite loop on refinement:
            • coordinator should assert that if it asks a shard for a refinement, that refinement is returned
            • shard should assert that if it's asked to refine, the #vals makes sense for the #fields in the pivot
          • we need to include more testing of facet.missing:
            • randomized testing in TestCloudPivotFacet
            • more usage of it in the Small & Large tests.
          • in general, we need more testing that we know triggers refinement
            • ie: the "Small" test already does a bunch with facet.missing, but I guess that never caught ny of these bugs, because refinement was never needed?
            • randomly set small overrequest values in TestCloudPivotFacet ?
          • for completeness, we should do some testing of literal string value "null" (4 chars) in pivot fields
          • we aren't doing enough testing of multiple facet.pivot params in a single request - need to make sure that when refinement happens, those aren't colliding
            • in particular i'm wondering about facet.pivot={!key=aaa}foo,bar&facet.pivot={!key=bbb}foo,bar type stuff
          Brett Lucey added a comment -

          Hoss,

          Sorry we've been a little quiet lately. I was out of town for two weeks and Andrew has been on vacation as well. We plan on digging back into this next week. The endless loop is definitely a concern and we will focus on that first if your changes haven't already fixed that. What we are wondering is if you feel we could get a preliminary version of this committed if we can resolve that loop? I have a few ideas we can do to prevent infinite looping from ever happening even if we don't have the information we expected. We are really hoping to see this be a part of the next 4.x release in some form, and it would allow us to start getting more feedback from a broader base.

          Thanks,
          -Brett

          Andrew Muldowney added a comment -

          Hey Hoss, I should have time this week and next to investigate the infinite loop and try to implement some of your other requests.

          Hoss Man added a comment -

          ... – but as things stand right now there is still a nasty bug somewhere in the facet.missing processing that i can't wrap my head around...

          I spent today doing some manual testing with some small amounts of data, and looking at the shard requests triggered by each request. I then started reading through more of the refinement code (first time i've looked at a fair bit of this) and i think i've figured out what's going on (but i don't have a fix for it yet)...

          Basically: the PivotFacetField class, which holds a List<PivotFacetValue>, doesn't do anything special as far as keeping track of the PivotFacetValue that represents the facet.missing value (ie: the PivotFacetValue where PivotFacetValue.value==null). This means that in methods like PivotFacetField.sort() and PivotFacetField.queuePivotRefinementRequests(...) the PivotFacetValue for facet.missing is mixed in with the other values and included in considerations about what the cutoff "countThreshold" is for refinement, even though it's not affected by facet.limit and should always be returned.

          This means that in the test i added that has facet.limit=1&facet.missing=true the null value from facet.missing is the only value considered in the "top 1" of the constraints, and has a count much higher than the count for the SPECIAL value – which means SPECIAL doesn't even qualify for the processPossibleCandidateElement logic so it never gets refined at all.

          I think the best course of action is to clean up PivotFacetField a bit, so that in addition to the List<PivotFacetValue> of values that are subject to the facet.limit, a specific "missingValue" variable should be added to track the corresponding PivotFacetValue – this should make the value sorting & refinement logic in queuePivotRefinementRequests() accurate as is, at the cost of slightly more complex (but accurately modeled) logic in createFromListOfNamedLists() and convertToListOfNamedLists().
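          Roughly, the proposed shape looks something like the sketch below (class and member names here are simplified stand-ins, not the actual PivotFacetField/PivotFacetValue code): the facet.missing bucket is tracked in its own field, so it never competes with real values when computing the refinement count threshold.

              import java.util.ArrayList;
              import java.util.List;

              class ValueSketch {
                final String value;   // null == the facet.missing bucket
                long count;
                ValueSketch(String value, long count) { this.value = value; this.count = count; }
              }

              class PivotFieldSketch {
                private final List<ValueSketch> values = new ArrayList<>(); // subject to facet.limit
                private ValueSketch missingValue;                           // never limited

                void add(ValueSketch v) {
                  if (v.value == null) {
                    missingValue = v;        // kept out of the limited collection
                  } else {
                    values.add(v);
                  }
                }

                // refinement cutoff computed only over the values that facet.limit applies to
                long refinementCountThreshold(int limit) {
                  values.sort((a, b) -> Long.compare(b.count, a.count));
                  return values.size() < limit ? 0 : values.get(limit - 1).count;
                }
              }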

          What do folks think?


          The endless loop is definitely a concern and we will focus on that first if your changes haven't already fixed that.

          The root cause of the infinite loop seems to be that the formatting/parsing of the refinement params wasn't in sync (ie: empty strings weren't being included at all, while facet.missing values were being encoded as "null" which would then be parsed as 4-character string literals) ... so that cause should be fixed in my latest patch.

          What still concerns me though is that there is evidently no general sanity check in the code to prevent the distributed logic in the coordinator from retrying to refine values over and over again even if the shard never responds with a number for it (ie: if some future bug gets introduced in the refine code that runs on the shards, or if some shard has been misconfigured to have a hard-coded invariant of facet=false, etc...). That's the sort of edge case that may be really hard to test for, but even if we can't explicitly test it, we should at least have some sanity check in the distrib coordination code that says "we already asked shardX for refinementY and still don't have it, throw 5xx error!" instead of "still need refinementY from shardX, ..., still need refinementY from shardX, ..." which is what seems to be happening right now.
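          As a very rough picture of the kind of guard meant here (a purely hypothetical class, not part of the patch, just illustrating the "ask once, then fail loudly" idea):

              import java.util.HashSet;
              import java.util.Set;

              class RefinementLoopGuard {
                // remembers every (shard, refinement key) pair the coordinator has already asked for
                private final Set<String> alreadyRequested = new HashSet<>();

                void checkBeforeRequesting(String shard, String refinementKey) {
                  if (!alreadyRequested.add(shard + "|" + refinementKey)) {
                    // the shard was already asked for this refinement and never answered:
                    // abort with an error instead of re-queueing the same request forever
                    throw new IllegalStateException("already asked " + shard + " for refinement "
                        + refinementKey + " and still have no answer");
                  }
                }
              }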

          What we are wondering is if you feel we could get a preliminary version of this committed if we can resolve that loop?

          I'm not comfortable committing features unless i know they work – particularly something like this, where it's adding distributed support to an existing core feature. I don't want existing pivot users to see "oh, distributed pivot support has been added" and upgrade to SolrCloud and then start getting silently incorrect results.

          Rest assured however: I'm dedicated to continuing to working through this issue, and helping to fix whatever bugs we find, until it's ready to be committed. I won't leave you hanging.


          Hey Hoss, I should have time this week and next to investigate the infinite loop and try to implement some of your other requests.

          That's great – like i mentioned above, i think a sanity check on the infinite loop is important, but i suspect it should be fairly trivial (i'm just not 100% certain where it makes sense to put it yet)

          I think the biggest concern right now however is addressing the bugs with how facet.missing impacts refinement and kicks values out of contention due to the modeling in PivotFacetField

          If you have time, and can help out with making those changes, I can go ahead and focus on the additional tests i was describing – which is probably the best way to divide & conquer the problem since you guys already know the code internals better than me. Then you can help review my tests, and i can help review the PivotFacetField changes.

          sound good?

          Steve Molloy added a comment -

          Quick note on PivotFacetHelper's retrieve method. I understand the desire for good performance and more than agree with it. But with some entries being optional (statistics and qcount from SOLR-3583 and SOLR-4212 for instance), this causes the lookup to start after the proper position, thus not finding entries that are there. I don't have a better solution than starting from 0 currently, but I'm sure there's something that can be done to keep at least some of the speed improvement while still being able to support optional entries. Maybe force all optional entries to the end of the list, look up the required ones (field, value, count) by index, and start at the first optional spot for the rest?
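          One possible shape of that hybrid lookup is sketched below. It is written against a plain list of name/value pairs to stay self-contained (the real code would operate on Solr's NamedList), and the index constant and method name are hypothetical:

              import java.util.List;
              import java.util.Map.Entry;

              class PivotEntryLookupSketch {
                static final int FIRST_OPTIONAL_INDEX = 3; // field=0, value=1, count=2 are required

                static Object retrieve(List<Entry<String, Object>> pivotEntry, String name, int requiredIndex) {
                  if (requiredIndex >= 0) {
                    return pivotEntry.get(requiredIndex).getValue();   // O(1) lookup for required entries
                  }
                  // optional entries (stats, query counts, ...) live after the required ones:
                  // scan from the first optional slot instead of from 0
                  for (int i = FIRST_OPTIONAL_INDEX; i < pivotEntry.size(); i++) {
                    if (name.equals(pivotEntry.get(i).getKey())) {
                      return pivotEntry.get(i).getValue();
                    }
                  }
                  return null;                                         // optional entry not present
                }
              }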

          Andrew Muldowney added a comment -

          That sounds good Hoss

          Hoss Man added a comment -

          Quick note on PivotFacetHelper's retrieve method ...

          I haven't really been aware of those other issues until now (although SOLR-3583 may explain some of the unused code i pruned from PivotListEntry a few patches ago) but i agree with your assessment: if/when enhancements to distributed pivots start dealing with adding optional data to each level of the pivot, the approach currently used will have to change.

          (Personally: I'm not emotionally ready to put any serious thought into that level of implementation detail in future pivot improvements - i want to focus on getting the basics of distrib pivots solid & released first)


          Updated patch with most of the tests i had in mind that i mentioned before (although i'd still like to add some more facet.missing tests)...

          • TestCloudPivotFacet
            • randomize overrequest amounts
            • randomize facet.mincount usage & assert it's never exceeded
            • randomize facet.missing usage & assert that null values are only ever last in list of values
              • make the odds of docs missing a field more randomized (across test runs)
            • add in the possibility of trying to pivot on a field that is in 0 docs
            • Dial back some constants to reduce OOM risk when running -Dtests.nightly=true
            • example refine count failure from the facet.missing problem (unless there's another bug that looks really similar) with these changes:
              • ant test -Dtestcase=TestCloudPivotFacet -Dtests.method=testDistribSearch -Dtests.seed=98C12D5256897A09 -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=sr -Dtests.timezone=America/Louisville -Dtests.file.encoding=UTF-8
          • DistributedFacetPivotLongTailTest
            • some data tweaks & an additional assertion to ensure refinement is happening
          • DistributedFacetPivotSmallTest
            • s/honda/null/g - help test that the 4 character string "null" isn't triggering any special behavior, or getting confused with a missing value in docs.
          • DistributedFacetPivotLargeTest
            • comment & assert noting that a shard is left empty (helps with edge case testing of result merging & refinement)
            • added "assertPivot" helper method & did a bit of refactoring
            • added test of 2 diff pivots in the same request (swap field order)
            • added test of same bi-level pivot with & w/o a tagged fq exclusion in the same request
            • added test variants of facet.limit & facet.index used as localparam
              • currently commented out because it doesn't work – see SOLR-6193

          The problem noted above with using facet.* params as local params in facet.pivot is something i discovered earlier this week while writing up these tests. I initially set the problem aside to keep working on tests, with the intention of looking into a fix once i had better coverage of the problem – but then when i came back to revisit it yesterday and looked at the existing facet.field shard request logic for guidance, i discovered that it didn't seem to work the way i expected either, and realized John Gibson recently filed SOLR-6193 because facet.field has the exact same problem.

          i don't think we should let this block adding distributed facet.pivot – let's tackle it holistically for all faceting in SOLR-6193.


          Andrew/Brett: have you guys had a chance to look into the refinement bug when facet.missing is used?

          (BTW: my updated patch only affects test files, so hopefully there's no collision with anything you guys have been working on – but if there is, feel free to just post whatever patch you come up with and I'll handle the merge)

          Andrew Muldowney added a comment - edited

          I've been making generally good headway on the .missing problem. We've got a new PivotFacetFieldValueCollection that should deal with the null values properly. Right now the Small and LongTail tests pass, but the Large test fails on the new facet.limit=1 and facet.missing=true case with SPECIAL. The control response doesn't include the null, and the distributed response doesn't get the count of bbc right: it only gets 150, and I'm sure the 298 it gets for microsoft is wrong too. There is something in the shard-side code that is not happy with our "" and null values. I'm working on that right now.

          My assumption is that the facet.missing request makes it out to all the shards, so we never need to refine on it since all shards responded with the full information – but I guess that isn't always the case, since other fields under that null value might have limits that would need to be refined on?

          Andrew Muldowney added a comment -

          I've uploaded a new file with my facet.missing changes. It gets the Small and LongTail tests passing.

          DistributedFacetPivotLargeTest.java
          rsp = query( "q", "*:*",
                           "fq", "-place_s:0placeholder",
                           "rows", "0",
                           "facet","true",
                           "facet.limit","1",
                           "facet.missing","true",
                           //FacetParams.FACET_OVERREQUEST_RATIO, "0", // force refine
                           //FacetParams.FACET_OVERREQUEST_COUNT, "0", // force refine
                           "facet.pivot","special_s,company_t");

          This test gets wacky when the OVERREQUEST options are uncommented. With the OVERREQUEST options uncommented we do not get the proper bbc value, and so the distributed version diverges from the non-distrib. Your second comment on this issue is exactly on point.

          Another variance in that test is that on the distrib side we get

          {field=special_s,value=,count=3,pivot=[
              {field=company_t,value=microsoft,count=2}, 
              {field=company_t,value=null,count=0}]}
          

          whereas for the non-distrib we just get

          {field=special_s,value=,count=3,pivot=[
              {field=company_t,value=microsoft,count=2}]}
          

          Should facet.missing respect the mincount (in this case it's 1)?

          Hoss Man added a comment -

          Hey Andrew, I probably won't have a chance to review this issue/patches again until Monday - but some quick replies...

          With the OVERREQUEST options uncommented we do not get the proper bbc value and so the distributed version diverges from the non-distrib. Your second comment on this issue is exactly on point.

          Just to clarify: you are saying that bbc isn't included in the "top" set in the distrib call because overrequest is so low, which is inconsistent with the control where bbc is in the top – but all of the values returned by the distrib call do in fact have accurate refined counts ... correct?

          The point of that check is to ensure that refinement definitely works properly on facet.missing – that's why I added it, because it wasn't working before and the test didn't catch it because of the default overrequest – so we can't eliminate those OVERREQUEST params.

          What we can do is explicitly call queryServer(...) instead of query(...) to hit a random distributed server but bypass the comparison with the control server – in that case though we want a lot of tight assertions to ensure that we aren't missing anything.

          (of course: we can also include another check of the same facet.missing request with the overrequest disabled if you want – no one ever complained about too many assertions in a test)

          Should facet.missing respect the mincount (in this case it's 1)?

          I think so? .. if that's what the non-distrib code is doing, that's what the distrib code should do as well.

          Andrew Muldowney added a comment -

          Just to clarify: you are saying that bbc isn't included in the "top" set in the distrib call because overrequest is so low, which is inconsistent with the control where bbc is in the top – but all of the values returned by the distrib call do in fact have accurate refined counts ... correct?

          It was not refining properly, which I attributed to the lack of overrequest but that was incorrect. The test is actually the only one that tests the following criteria:

          • A value that should be in the "top" elements is not, because the overrequesting didn't pick it up or too many shards had values too small. (In this case "bbc" only has a value of 150 from the initial round, when its actual value is 445, larger than microsoft's initial value of 398)
          • There is a shard that has responded with an empty response, aka it has no documents (shard #3 is always empty in the Large test)

          When those two things combine, we had an error in our refinement code where we would add Integer.MAX_VALUE to the possible count, overflowing the int and causing it to go negative, so we would never ask for refinement. So we would get microsoft:398 over bbc. Fixed.
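
          For illustration, here is a minimal, self-contained sketch of that overflow (the variable names and numbers are made up for this example, not taken from the patch): adding Integer.MAX_VALUE as a "this silent shard could still contribute anything" placeholder wraps the int negative, so the candidate looks like it can never reach the top and is never refined.

          public class OverflowSketch {
            public static void main(String[] args) {
              int knownCountSoFar = 150;                          // e.g. bbc's count from the shards that did respond
              int maxPossibleFromSilentShard = Integer.MAX_VALUE; // placeholder for the empty shard

              // int arithmetic wraps around: 150 + Integer.MAX_VALUE is negative
              int possibleTotal = knownCountSoFar + maxPossibleFromSilentShard;
              System.out.println(possibleTotal);        // prints -2147483499

              // a negative "possible total" can never beat a competitor's count,
              // so the value is (wrongly) never queued for refinement
              System.out.println(possibleTotal > 398);  // prints false
            }
          }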

          I have fixed the null-value issue so the missing value no longer counts towards facet.limit.
          I have fixed the null-value issue so the missing value is no longer kept around when its count is less than facet.mincount.
          I have fixed the issue where an empty response from a shard would cause values on the cusp of making it into the top values to never get refined.

          Are you still seeing the infinite recursion problem? The seeds you provided earlier pass locally for me.

          Hoss Man added a comment -

          When those two things combine ... Fixed

          Awesome .. I love it when tests uncover bugs you never even suspected.

          Are you still seeing the infinite recursion problem? The seeds you provided earlier pass locally for me.

          Changing the encoding mechanism stopped the reliably reproducing infinite loop – my concern though is that – even if it's hard to test for – we should make sure the overall algorithm for refinement isn't susceptible to infinite looping in the event of aberrant shard behavior.

          At the moment, from what I can tell, the general behavior of the refinement logic is along the lines of...

          while ( ! values_needing_refinement.isEmpty() ) {
            foreach (value : values_needing_refinement) {
              foreach (shard_not_responded : value) {
                shard_requestor.enqueue_refinement_request(shard_not_responded, value)
              }
            }
            shard_requestor.send_refinement_requests_to_shards_that_need_them()
          }
          

          Now imagine a situation where, for whatever reason, some shard will never respond back as expected from a refinement request – e.g. maybe we just started a rolling config upgrade and the new solrconfig.xml permanently disables faceting via a facet=false invariant? That's going to be an infinite loop.

          We need a safety valve in the refinement logic.

          Instead of simply sending refinement requests to a shard whenever there is a value whose shard count we don't know, potentially asking for the same refinement over and over, we should keep track of which shards we've already asked for a refinement; if we've already asked a shard once and still don't have a response, then we should just give up and return an error.
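
          As a rough sketch of that idea (the class and method names below are hypothetical, not the ones in the patch), the coordinator could remember which (shard, value) refinements it has already requested and fail fast instead of re-enqueuing:

          import java.util.*;
          import org.apache.solr.common.SolrException;

          // hypothetical helper, not the patch's actual class
          class RefinementSafetyValve {
            private final Map<String, Set<String>> askedPerShard = new HashMap<>();

            /** Queue a refinement request, but refuse to ask the same shard twice for the same value path. */
            void enqueueRefinement(String shard, String valuePath) {
              Set<String> alreadyAsked = askedPerShard.computeIfAbsent(shard, s -> new HashSet<>());
              if (!alreadyAsked.add(valuePath)) {
                // we already asked this shard and still have no count: fail instead of looping forever
                throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
                    "shard " + shard + " never returned a refined count for " + valuePath);
              }
              // ...otherwise actually add the request to the outgoing shard request...
            }
          }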

          does that make sense?

          Andrew Muldowney added a comment -

          We need a safety valve in the refinement logic.

          I'm with you. Brett and I have talked through some options, and I think we have the requisite accounting already in place to be able to check whether a shard did not respond with all the refinement values we asked for. We should be able to check the refinements and see that they have had data contributed from the shard in question; if not, we'll throw an error.

          If a shard never responds, does the searcher handle that by eventually timing out? The pivot facet code is predicated on waiting for all shards to respond before moving forward with the next level of refinement so if a shard never responds at all then it'll just wait forever. We're assuming that other processes are watching for searches that take much too long and kill them.

          Hoss Man added a comment -

          If a shard never responds, does the searcher handle that by eventually timing out?

          Off the top of my head I'm not sure – I'm pretty sure that will generate an error at a much lower level than the individual search components, so it's not something the pivot code needs to worry about.

          I think the main thing is that when the pivot code is looking at a ShardResponse, it should be able to say "this response doesn't contain a count for the refinementX we asked for in the corresponding ShardRequest, fail!" (as opposed to now, where I believe it re-queues it) ... we can leave worrying about whether or not a ShardResponse was ever returned at all to the SearchHandler.

          ... I think we have the requisite accounting already in place to be able to check whether a shard did not respond with all the refinement values we asked for. We should be able to check the refinements and see that they have had data contributed from the shard in question; if not, we'll throw an error.

          that should be all we need.

          Brett Lucey added a comment -

          I think the main thing is that when the pivot code is looking at a ShardResponse, it should be able to say "this response doesn't contain a count for the refinementX we asked for in the corresponding ShardRequest, fail!" (as opposed to now, where I believe it re-queues it) ... we can leave worrying about whether or not a ShardResponse was ever returned at all to the SearchHandler.

          We mark a shard as having contributed a value already, and we have a list of refinements we requested from the shard. All we'll need to do is iterate over that list after merging the shard contribution and make sure that the shard bit is set on each of those values. If it isn't, then we know we got a response from that shard but that it didn't tell us what we asked it for, and therefore some sort of error has occurred. We're working on implementing this now.

          Andrew Muldowney added a comment -

          Hey Hoss. How would we test this?

          I've verified it works by commenting out the mergeResponse lines and seeing it error out, since we expected shard contributions that never arrived. But how do I write a test where a shard responds in a canned way that is bad?

          Hoss Man added a comment -

          Hey Hoss. How would we test this?

          I don't know. I don't think we can. Like I mentioned before...

          ...That's the sort of edge case that may be really hard to test for, but even if we can't explicitly test it, we should at least have some sanity check in the distrib coordination code...

          It's an extreme enough edge case that I don't think we need to jump through a crazy number of hoops to have a test case for it; I just didn't want to leave such a dangerous trap in the code.

          Andrew Muldowney added a comment -

          So I may have jumped the gun on the fix. Previously refinement requests would return counts of zero for things they were asked about but had no values for; this is no longer true. (It shrinks the work required to merge in refinements and limits how much data we send across the wire.) This means that we cannot ask whether a refinement has been fulfilled, because if a shard doesn't know about a valuePath it will not include it in its response.
          If the shard is responding properly it should still return a facet_pivot with the top-level requested pivots, just with everything below that merely an empty list.
          So really the only thing we can check is that the facet_pivot isn't null on a response, since that would indicate that something really awful happened.

          Hoss Man added a comment -

          Previously refinement requests would return counts of zero for things they were asked about but had no values for; this is no longer true. (It shrinks the work required to merge in refinements and limits how much data we send across the wire.) This means that we cannot ask whether a refinement has been fulfilled, because if a shard doesn't know about a valuePath it will not include it in its response.

          So if I'm understanding correctly:

          Old code:

          • expected every shard to reply back with a number (at least 0) for every refinement
          • if it didn't have a number from a shard, it asked again - infinite loop risk

          New code:

          • expects every shard to reply back with a number for any refinement it has a non-0 count for
          • implicitly assumes a refinement has been fulfilled if it knows it has already asked.

          ..so it sounds like you already eliminated the underlying risk .. correct?

          So really the only thing we can check is that the facet_pivot isn't null on a response, since that would indicate that something really awful happened.

          yeah ... sounds good: we know this shard request asked for pivot refinements, so if the response doesn't at least contain a facet_pivot section then throw a server error.
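
          Something along these lines, as a minimal sketch of that sanity check (the helper name and the shape of the response lookup are assumptions for illustration, not the patch's actual code in FacetComponent):

          import org.apache.solr.common.SolrException;
          import org.apache.solr.common.util.NamedList;

          // hypothetical helper, not the patch's actual method
          class PivotRefinementSanityCheck {
            static void assertHasPivotSection(NamedList<?> shardResponse, String shardName) {
              NamedList<?> facetCounts = (NamedList<?>) shardResponse.get("facet_counts");
              Object facetPivot = (facetCounts == null) ? null : facetCounts.get("facet_pivot");
              if (facetPivot == null) {
                // the shard answered, but ignored the pivot refinement we asked it for
                throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
                    "No facet_pivot in refinement response from shard: " + shardName);
              }
            }
          }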

          Brett Lucey added a comment -

          ..so it sounds like you already eliminated the underlying risk .. correct?

          Yes. Since we know we'll only need to send refinements once, we added some logic to ensure we don't attempt to re-refine something we have already refined. The bonus to this change is that it should offer a minor performance bump since we won't bother to re-check all of the refined values for further refinement. (We've already asked all the shards about all of the candidates, so there won't be a need to repeat that. We already know we have everything we need after we've refined once.)

          Andrew Muldowney added a comment -

          Uploaded latest patch with the refinement optimizations and error check

          Hoss Man added a comment -

          hey guys, stoked to see all these tests passing!

          I've been slowly working my way through Andrew's latest patch, reviewing all the code and making some tweaks/improvements as I go. Here's a checkpointed update...

          Patch updates in attachment:

          • fix FacetComponent to mirror refactoring done in SOLR-6216
          • fixed up the String.format calls in various classes so they specify Locale.ROOT
            • removed some useless "toString()" calls in these format calls as well, particularly since it looked like they could cause NPEs
          • PivotFacetField
            • javadocs:
              • createFromListOfNamedLists
              • convertToListOfNamedLists
            • eliminate call to PivotFacetFieldValueCollection.contains(...) (see below)
          • PivotFacetValue...
            • javadocs:
              • class
              • createFromNamedList
              • shardHasContributed
              • convertToNamedList
          • PivotFacetFieldValueCollection...
            • javadocs:
              • class
              • refinableCollection
              • refinableSubList
              • refinableSize
              • size
              • get
              • add
            • remove unused methods
              • isEmpty()
              • getValue(Comparable)
              • contains(Comparable)
                • (this was used, but only in a case where it was immediately followed by a call to get(Comparable), so I just optimized it away and replaced it with a null check.)
            • rename: "isSorted" -> "dirty"
            • rename: "nullValue" -> "missingValue"
              • it was really confusing because "nullValue" could be null, or it could be a PivotFacetValue whose value was null
            • fix add(PivotFacetValue) to set "dirty" directly
            • lock down some stuff...
              • methods for accessing some vars so they don't need to be public
              • make some things specified in constructor final
              • make refinableCollection and refinableSubList return immutable lists

          Some things I'm either confused by and/or debating in my head ... comments/opinions from others would be appreciated:

          • refinement and facet offset
            • I haven't looked into this closely, but I noticed the refinement code only seems to refine things starting at the "facetFieldOffset" of the current collection
            • don't we need to refine all the values, starting from the beginning of the list?
            • if the offset is "1" and the first value X has a count of "100" and the second value Y has an initial count of "50" but a post-refinement count of "150", pushing itself prior to the offset and putting X into the window, then doesn't X miss out on refinement?
          • refinableCollection()
            • I think we probably want to rename refinableCollection() (and refinableSize()) to something more like "getExplicitValuesList()" (compared to the getMissingValue() method I just added) to make it more clear what you are really getting from this method ... I recognize that this name comes from the fact that we don't ever really need to refine the count for the missing value, but that seems like an implementation detail that doesn't affect a lot of the places this method is called (and particularly since the childPivots of the missing value do still need refinement, so even when it is relevant, it's still misleading from a recursion standpoint.)
          • trim
            • from what I can understand of the trim methods - these are typically destructive operations that:
              • should only be called after all refinement is completed
              • prune things that are no longer needed based on the limit/offset params, making the objects unusable for any future modifications/refinement, so they're only good for building the final response
              • should be called just prior to asking for the final NamedList response structure
            • if my understanding is correct, then it seems like it might be safer & more straightforward to instead just refactor this functionality directly into the corresponding methods for converting to a NamedList, and clearly document those methods as destructive?
              • or at the very least add a "trimmed" boolean and sprinkle around some asserts in the various methods related to whether the object has/has not already been trimmed
          Hoss Man added a comment -

          Making good progress (only ~1600 lines of diff left to review!)

          updates in this patch...

          • PivotFacetFieldValueCollection
            • some javadocs
            • refactor away method: nonNullValueIterator()
              • only called in one place
          • PivotFacetField
            • some javadocs
            • made createFromListOfNamedLists smart enough to return null on null input
              • simplified PivotFacetValue.createFromNamedList
            • made contributeFromShard smart enough to be a no-op on null input
              • simplified all callers (PivotFacet & PivotFacetValue)
            • made some vars final where possible via refactoring constructor & createFromListOfNamedLists
            • refactor skipRefinementAtThisLevel out of the method and up to an instance var, since it never changes once the facet params are set in the constructor
            • consolidate skipRefinementAtThisLevel + hasBeenRefined into a single var: needRefinementAtThisLevel
            • simplify BitSet iteration (nextSetBit is always < length) – see the sketch after this list
              • processDefiniteCandidateElement
              • processPossibleCandidateElement
          • PivotFacetValue
            • some javadocs
            • made variables private and added method accessors (w/jdocs) as needed
              • updated other classes as needed to call these new methods instead of the old pub vars
            • made some vars final where possible via refactoring createFromNamedList & constructor
          • PivotFacet
            • some javadocs
            • added getQueuedRefinements(int)
            • made some variables final where possible
            • renamed noRefinementsRequired -> isRefinementsRequired
            • eliminate unused method: areAnyRefinementsQueued
          • FacetComponent
            • switched direct use of PivotFacet.queuedRefinements to use PivotFacet.getQueuedRefinements
              • simplified error checking in several places
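
          (Regarding the BitSet iteration point above: a minimal sketch of the standard idiom, with a made-up BitSet for illustration – nextSetBit(i) returns -1 once there are no more set bits, so no explicit length check is needed.)

          import java.util.BitSet;

          public class BitSetIterationSketch {
            public static void main(String[] args) {
              BitSet shards = new BitSet();
              shards.set(0);
              shards.set(2);
              shards.set(5);

              // iterate over only the set bits; the loop ends when nextSetBit returns -1
              for (int i = shards.nextSetBit(0); i >= 0; i = shards.nextSetBit(i + 1)) {
                System.out.println("shard #" + i + " participated");
              }
            }
          }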

          One new question I want to go back and revisit later...

          • do we really need to track "knownShards" in PivotFacet ?
            • ResponseBuilder already maintains a String[] of all shards; getShardNum is derived from it
            • can't we just loop from 0 to shards.length? does it ever matter if a shard hasn't participated?
            • ie: is it really important that we skip any "unset bits" in knownShards when looping? (all the current usages seem safe even if a shard has no data for the current pivot)
          Hoss Man added a comment -

          I've been focusing on more tests using facet.offset...

          I haven't looked into this closely, but I noticed the refinement code only seems to refine things starting at the "facetFieldOffset" of the current collection – don't we need to refine all the values, starting from the beginning of the list?

          There was in fact a bug with refinement when using facet.offset – but I was looking in the wrong place. The code I was referring to before was involved in deciding which values to drill down into when recursively refining the sub-pivots. That logic was already (mostly) correct, because by that point we've already refined the current level completely, so we can skip past the offset when doing the recursion (the only glitch was a boundary check causing an IOOBE, see details below). Earlier on in the code, however, there was a mistake where only the limit (not the limit+offset) was being used to decide the threshold value for refinement.
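
          A small worked example of that second mistake (the counts below are made up for illustration, not taken from the test data): with facet.offset and facet.limit the last value that can appear in the response window is at index offset+limit-1, so that is the count a candidate has to be able to beat before refinement can be skipped, not the count at index limit-1.

          public class OffsetThresholdExample {
            public static void main(String[] args) {
              // counts for one pivot level, already merged and sorted descending (made-up numbers)
              int[] counts = { 90, 80, 70, 60, 50, 40 };
              int offset = 2, limit = 3;   // the response window is indexes 2..4

              int buggyThreshold   = counts[limit - 1];           // 70: only considered limit
              int correctThreshold = counts[offset + limit - 1];  // 50: considered limit + offset

              // a candidate whose best possible refined count is 55 could still land inside
              // the window, so it must be refined; comparing against 70 would wrongly skip it
              int candidateBestPossible = 55;
              System.out.println(candidateBestPossible > buggyThreshold);   // false -> wrongly skipped
              System.out.println(candidateBestPossible > correctThreshold); // true  -> correctly refined
            }
          }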


          New improvements in this patch...

          • TestCloudPivotFacet
            • increase the odds of overrequest==0
            • randomly include a facet.offset param to sanity check refinement in that case
          • PivotFacetField
            • fix refineNextLevelOfFacets not to ask for a sublist with a start offset bigger than the size of the collection
              • this was causing an IndexOutOfBoundsException pretty quickly when offset was mixed into the random test
            • fix queuePivotRefinementRequests to respect offset when picking the "indexOfCountThreshold"
              • before it was only looking at limit; with offset in the randomized test this was causing failures even when pivots had only one field in them!

          A few more things to consider in the future...

          • PivotFacetFieldValueCollection.refinableSubList is only used to deal with offset+limit sublisting from PivotFacetField.refineNextLevelOfFacets – but PivotFacetFieldValueCollection already knows the offset & limit, so maybe it should be a smarter special-purpose method with 0 args: getNextLevelValuesToRefine()
          • trim earlier?
            • the way refinement currently works in PivotFacetField, after we've refined our values, we mark that we no longer need refinement, and then on the next call we recursively refine the subpivots of each value – and in both cases we do the offset+limit calculations and hang on to all of the values (both below offset and above limit) as we keep iterating down the pivots – they don't get thrown away until the final trim() call just before building up the final result.
            • I previously suggested folding the trim() logic into the NamedList response logic – but now I'm wondering if the trim() logic should instead be folded into refinement? so once we're sure a level is fully refined, we go ahead and trim that level before drilling down and refining its kids?

          Unfortunately, with this new patch, I did uncover a new random failure I can't easily explain (it doesn't seem related to the offset changes since facet.offset isn't even used in these random params – but it's possible I broke something while fixing that) ...

             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestCloudPivotFacet -Dtests.method=testDistribSearch -Dtests.seed=775F7BCA685BBC22 -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=da_DK -Dtests.timezone=America/Montserrat -Dtests.file.encoding=UTF-8
             [junit4] FAILURE 65.9s | TestCloudPivotFacet.testDistribSearch <<<
             [junit4]    > Throwable #1: java.lang.AssertionError: {main(facet=true&facet.pivot=pivot_tl%2Cpivot_tl%2Cpivot_y_s&facet.pivot=bogus_not_in_any_doc_s%2Cpivot_l1%2Cpivot_td&facet.limit=13&facet.missing=true&facet.sort=count&facet.overrequest.count=2),extra(rows=0&q=*%3A*&fq=id%3A%5B*+TO+383%5D&_test_miss=true&_test_sort=count)} ==> bogus_not_in_any_doc_s,pivot_l1,pivot_td: {params(rows=0),defaults({main({main(rows=0&q=*%3A*&fq=id%3A%5B*+TO+383%5D&_test_miss=true&_test_sort=count),extra(fq=-bogus_not_in_any_doc_s%3A%5B*+TO+*%5D)}),extra(fq=%7B%21term+f%3Dpivot_l1%7D5098)})} expected:<7> but was:<9>
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([775F7BCA685BBC22:F6B9F5D21F04DC1E]:0)
             [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:239)
             [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.doTest(TestCloudPivotFacet.java:187)
             [junit4]    > 	at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:865)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:744)
             [junit4]    > Caused by: java.lang.AssertionError: bogus_not_in_any_doc_s,pivot_l1,pivot_td: {params(rows=0),defaults({main({main(rows=0&q=*%3A*&fq=id%3A%5B*+TO+383%5D&_test_miss=true&_test_sort=count),extra(fq=-bogus_not_in_any_doc_s%3A%5B*+TO+*%5D)}),extra(fq=%7B%21term+f%3Dpivot_l1%7D5098)})} expected:<7> but was:<9>
             [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertNumFound(TestCloudPivotFacet.java:507)
             [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:257)
             [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:268)
             [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:229)
          

          ...I need to dig into this a bit more tomorrow.

          Hoss Man added a comment -

          Quick update...

          ...I need to dig into this a bit more tomorrow.

          a restless night's sleep and semi-fresh eyes make scary bugs shallow: the problem was that PivotFacetField.queuePivotRefinementRequests had a short-circuit optimization when valueCollection.refinableCollection().isEmpty() that was preventing the child pivots of the facet.missing count from being refined if there were no matching values in the field.

          This patch fixes that bug and adds an explicit test for this situation to DistributedFacetPivotLargeTest.

          Hoss Man added a comment -

          I let my laptop hammer away on TestCloudPivotFacet while I was looking at some other stuff, and got a new reproducible failure...

             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestCloudPivotFacet -Dtests.method=testDistribSearch -Dtests.seed=EE02505B2F4046AC -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=fi -Dtests.timezone=Asia/Aqtobe -Dtests.file.encoding=UTF-8
             [junit4] FAILURE 56.9s | TestCloudPivotFacet.testDistribSearch <<<
             [junit4]    > Throwable #1: java.lang.AssertionError: {main(facet=true&facet.pivot=pivot_y_s%2Cpivot_b&facet.pivot=pivot_tdt1&facet.limit=4&facet.offset=5&facet.pivot.mincount=17&facet.missing=false&facet.sort=index),extra(rows=0&q=id%3A%5B*+TO+786%5D&_test_min=17&_test_miss=false&_test_sort=index)} ==> pivot_y_s,pivot_b: {params(rows=0),defaults({main(rows=0&q=id%3A%5B*+TO+786%5D&_test_min=17&_test_miss=false&_test_sort=index),extra(fq=%7B%21term+f%3Dpivot_y_s%7Dg)})} expected:<22> but was:<50>
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([EE02505B2F4046AC:6FE4DE43581F2690]:0)
             [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:239)
             [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.doTest(TestCloudPivotFacet.java:187)
             [junit4]    > 	at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:865)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:744)
             [junit4]    > Caused by: java.lang.AssertionError: pivot_y_s,pivot_b: {params(rows=0),defaults({main(rows=0&q=id%3A%5B*+TO+786%5D&_test_min=17&_test_miss=false&_test_sort=index),extra(fq=%7B%21term+f%3Dpivot_y_s%7Dg)})} expected:<22> but was:<50>
             [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertNumFound(TestCloudPivotFacet.java:507)
             [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:257)
             [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:229)
             [junit4]    > 	... 42 more
          

          At first i thought this was simply an issue in how "needRefinementAtThisLevel" assumed we never need refinement for sort=index – that's too general an assertion; we can only assume no refinement is needed if mincount=0. But fixing that still didn't solve the problem.

          Thinking about the PivotFacetField.queuePivotRefinementRequests logic however made me realize that all of the logic in that method (and its use of "countThreshold") really only works with sort=count ... for sort=index we shouldn't make any assumptions about the cutoff based on the count.
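
          To make that distinction concrete, here's a toy sketch (not code from the patch; the data structures and names are assumptions) of why a count-based cutoff is only safe under sort=count:

          // Hypothetical illustration only -- not code from the patch.
          import java.util.*;

          class RefinementCutoffSketch {
            /**
             * Returns the candidate values that still need refinement from shards that
             * didn't report them.  A count-based cutoff is only meaningful when the values
             * are ordered by count (facet.sort=count); with facet.sort=index the values
             * arrive in lexical order, so a small count early on tells us nothing about
             * the values that follow.
             */
            static List<String> candidatesNeedingRefinement(LinkedHashMap<String,Integer> values,
                                                            boolean sortedByCount,
                                                            int countThreshold) {
              List<String> out = new ArrayList<>();
              for (Map.Entry<String,Integer> e : values.entrySet()) {
                if (sortedByCount && e.getValue() < countThreshold) {
                  break;               // safe: everything after this has an even lower count
                }
                out.add(e.getKey());   // sort=index: every value in the window stays a candidate
              }
              return out;
            }
          }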

          Before digging into a fix, I started working on more sort=index tests to try and better exercise this code, and quickly encountered a new (unrelated?) failure that seems to be related to mincount==0 on sub-pivots...

          I distilled the new mincount failure out into a new isolated test query (that doesn't use sort=index) in DistributedFacetPivotLargeTest:

              rsp = query( "q", "*:*",
                           "rows", "0",
                           "facet","true",
                           "facet.pivot","place_s,company_t",
                           FacetParams.FACET_LIMIT, "50",
                           FacetParams.FACET_PIVOT_MINCOUNT,"0"); 
          

          ...which leads to...

             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=DistributedFacetPivotLargeTest -Dtests.method=testDistribSearch -Dtests.seed=63DFE6A839DD2C9F -Dtests.slow=true -Dtests.locale=es_NI -Dtests.timezone=Asia/Bishkek -Dtests.file.encoding=UTF-8
             [junit4] FAILURE 43.5s | DistributedFacetPivotLargeTest.testDistribSearch <<<
             [junit4]    > Throwable #1: junit.framework.AssertionFailedError: .facet_counts.facet_pivot.place_s,company_t[1].pivot.length:3!=50
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([63DFE6A839DD2C9F:E23968B04E824CA3]:0)
          

          ...i haven't dug into what exactly is going on here; i've been focusing on more tests for the sort=index refinement bug first (since it's easy to reproduce even w/o sub-pivots)


          In addition to the above-mentioned addition to DistributedFacetPivotLargeTest, this new patch also adds some new queries/assertions to DistributedFacetPivotSmallTest that seem to demo the same facet.sort=index problem as the randomized failure (at least ... i think it's the same problem).

          i'm going to work on fixing queuePivotRefinementRequests to account for sort=index tomorrow.


          Andrew, Brett: I don't suppose the mincount=0 bug jumps out at you guys as something with an obvious fix?

          Hoss Man added a comment -

          After working through the fix to the refinement logic in PivotFacetField.queuePivotRefinementRequests, the previously failing seed for TestCloudPivotFacet started to pass, but some sort=index tests still weren't working, which led me to realize a few things:

          • some of my tests were absurd – i've gotten used to using overrequest=0 as a way to force refinement, but with facet.sort=index combined with limit (and offset) and mincount it meant that it was impossible for the sort=index facet logic to ever find the results we're looking for. We have to allow some overrequest when mincount>1 or the initial shard requests won't find the values (that will ultimately have a cumulative mincount high enough) in order to even try refining them.
          • offset wasn't being added to the limit in the per-shard requests, so w/o overrequest enabled you would never get the values you needed even in ideal situations (see the sketch after this list)
          • the shard query logic in FacetComponent was ignoring overrequest when sort=index ... this seems broken to me, but from what i can tell, it comes straight from the existing facet.field logic as well.
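
          For the record, a back-of-the-napkin sketch of the per-shard limit math being described – the method and parameter names here are mine, not the patch's; see FacetComponent in the patch for the real logic:

          // Hypothetical sketch only -- names/defaults are assumptions, not the patch itself.
          class ShardLimitSketch {
            static int shardFacetLimit(int limit, int offset, int overrequestCount, double overrequestRatio) {
              if (limit < 0) {
                return limit;                                // unlimited: ask shards for everything
              }
              // a shard can only refine values it was actually asked about, so the per-shard
              // request has to cover the whole offset+limit window...
              int window = offset + limit;
              // ...plus some overrequest, so values whose *cumulative* count only crosses
              // mincount after merging all shards are seen in the first place
              return (int) Math.ceil(window * overrequestRatio) + overrequestCount;
            }
          }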

          I'll open a bug to track the existing broken overrequest logic in facet.field – even though i hope that once we're done with this issue, it may be fixed via refactoring and shared code with pivots (i'm not 100% certain: the FacetComponent diff is the bulk of what i still need to review more closely on this issue)

          There's still a failure in DistributedFacetPivotLargeTest (mismatch compared to control) when i tried using mincount=0 that i'm not certain if/how we can solve...

          // :nocommit: broken honda?
          rsp = query( params( "q", "*:*",
                               "rows", "0",
                               "facet","true",
                               "facet.sort","index",
                               "f.place_s.facet.limit", "20",
                               "f.place_s.facet.offset", "40",
                               FacetParams.FACET_PIVOT_MINCOUNT,"0",
                               "facet.pivot", "place_s,company_t") );
          

          From what I can tell, the gist of the issue is that when dealing with sub-fields of the pivot, the coordination code doesn't know about some of the "0" values if no shard that has the parent field's value even knows about the existence of the term.

          The simplest example of this discrepancy (compared to single-node pivots) is to consider an index with only 2 docs...

          [{"id":1,"top_s":"foo","sub_s":"bar"}
           {"id":2,"top_s":"xxx","sub_s":"yyy"}]
          

          If those two docs exist in a single node index, and you pivot on top_s,sub_s using mincount=0 you get a response like this...

          $ curl -sS 'http://localhost:8881/solr/select?q=*:*&rows=0&facet=true&facet.pivot.mincount=0&facet.pivot=top_s,sub_s&omitHeader=true&wt=json&indent=true'
          {
            "response":{"numFound":2,"start":0,"docs":[]
            },
            "facet_counts":{
              "facet_queries":{},
              "facet_fields":{},
              "facet_dates":{},
              "facet_ranges":{},
              "facet_intervals":{},
              "facet_pivot":{
                "top_s,sub_s":[{
                    "field":"top_s",
                    "value":"foo",
                    "count":1,
                    "pivot":[{
                        "field":"sub_s",
                        "value":"bar",
                        "count":1},
                      {
                        "field":"sub_s",
                        "value":"yyy",
                        "count":0}]},
                  {
                    "field":"top_s",
                    "value":"xxx",
                    "count":1,
                    "pivot":[{
                        "field":"sub_s",
                        "value":"yyy",
                        "count":1},
                      {
                        "field":"sub_s",
                        "value":"bar",
                        "count":0}]}]}}}
          

          If however you index each of those docs on a separate shard, the response comes back like this...

          $ curl -sS 'http://localhost:8881/solr/select?q=*:*&rows=0&facet=true&facet.pivot.mincount=0&facet.pivot=top_s,sub_s&omitHeader=true&wt=json&indent=true&shards=localhost:8881/solr,localhost:8882/solr'
          {
            "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[]
            },
            "facet_counts":{
              "facet_queries":{},
              "facet_fields":{},
              "facet_dates":{},
              "facet_ranges":{},
              "facet_intervals":{},
              "facet_pivot":{
                "top_s,sub_s":[{
                    "field":"top_s",
                    "value":"foo",
                    "count":1,
                    "pivot":[{
                        "field":"sub_s",
                        "value":"bar",
                        "count":1}]},
                  {
                    "field":"top_s",
                    "value":"xxx",
                    "count":1,
                    "pivot":[{
                        "field":"sub_s",
                        "value":"yyy",
                        "count":1}]}]}}}
          

          The only solution i can think of would be an extra (special to mincount=0) stage of logic, after each PivotFacetField is refined, that would (roughly sketched below):

          • iterate over all the values of the current pivot
          • build up a Set of all the known values for the child-pivots of those values
          • iterate over all the values again, merging in a "0"-count child value for every value in the set

          ...ie: "At least one shard knows about value 'v_x' in field 'sub_field', so add a count of '0' for 'v_x' in every 'sub_field' collection nested under the 'top_field' in our 'top_field,sub_field' pivot"

          I haven't thought this idea through enough to be confident it would work, or that it's worth doing ... i'm certainly not convinced that mincount=0 makes enough sense in a facet.pivot use case to think getting this test working should hold up getting this committed – probably something that should just be committed as-is, with an open Jira noting it as a known bug.

          Summary Changes in this patch
          • PivotFacet
            • add a new REFINE_PARAM constant for "fpt"
          • PivotFacetProcessor
            • javadocs
            • use REFINE_PARAM constant
          • PivotFacetField
            • processDefiniteCandidateElement
              • javadocs
              • numberOfValuesContributedByShardWasLimitedByFacetFieldLimit can only be trusted when sort=count
            • processPossibleCandidateElement
              • method only useful when sort=count
              • added assert & javadocs making this clear
            • queuePivotRefinementRequests
              • call processDefiniteCandidateElement on all elements when using sort=index
          • FacetComponent
            • applyToShardRequests - removed this method
              • a bunch of it was dead code (if limit > 0, no need to check limit>=0)
              • most of what wasn't dead code was also being done by the callers (ie: redundant overrequest logic)
              • this was also where the original mincount=0 bug lived (mincount was being forced to 1 when called from pivot code)
            • modifyRequestForIndividualPivotFacets & modifyRequestForFieldFacets
              • made sure they were directly doing the stuff they used to depend on applyToShardRequests for
              • fixed up limit+offset & overrequest logic
            • use REFINE_PARAM constant
          • DistributedFacetPivotLargeTest
            • fixed tests to be less overzealous about overrequest=0
            • added more mincount=0 testing (currently fails)
          Erick Erickson added a comment -

          Chris Hostetter

          I confess I'm barely skimming this (it's big, as you're more aware of than I am!). But there were two recent JIRAs, SOLR-6300 and SOLR-6314 ("facet mincount fails if distrib=true" and "multi-threaded facet count returns different results if shards > 1"), that sure seem like they could be related. Does that seem plausible? I realize this is pivot faceting, but...

          So I'm thinking that if I can get repeatable test case failures for these two JIRAs, I should apply this patch and see if it fixes them.

          Thoughts?

          Hoss Man added a comment -

          Erick:

          • SOLR-6300: appears to be specific to date/range faceting - almost certainly not related to the problem i found since there's no overrequesting logic with range faceting.
          • SOLR-6314 seems unrelated given how it ties into the threading code, which is "above" the layer of changes i'm talking about ... but anything is possible.
          Erick Erickson added a comment -

          Rats! And here I was hoping you'd do the work for me ....

          Good to know though, it'll keep me from putting this off. Thanks!

          Hoss Man added a comment -

          I've finished reviewing all the code and didn't find any new concerns. (woot!)

          I was hoping that more refactoring could be done to share common logic between the facet.field distributed code and the facet.pivot distributed code (akin to what "applyToShardRequests()" seemed to be aiming for in earlier patches), but between the use of the "DistribFieldFacet" class and the annoying discrepancy between "facet.mincount" and "facet.pivot.mincount" that seemed like more trouble than it's worth.

          In addition, my little-laptop-that-could has been churning away on several hundred iterations of TestCloudPivotFacet using tests.nightly=true with this patch for the past few days, w/o any signs of bugs in the refinement code.

          At this point, there are only a handful of 'nocommit' comments left in the patch, which fall into 2 basic categories:

          • methods/variables I still want to rename
          • reminders to create new jiras to track known issues / future improvements

          I plan to deal with those over the next 24 hours, but none of those changes should have any impact on the functionality / performance of the patch as it currently stands.

          Brett Lucey & Andrew Muldowney: I'd really appreciate it if you guys could take a gander at the latest version(s) of the patch and give my any thoughts you have.

          In particular: i know you've been using an older patch in production for a while now, could you take this latest version for a spin using some of your real data & queries and set my mind at ease that i haven't introduced any horrible performance problems with any of the refactoring / code cleanup / bug fixes i've made?

          Changes in this patch
          • TestCloudPivotFacet
            • a bit more logging
            • dial back overrequest w/ comment (we're focused on refinement here)
            • fix the num iters = 5 (no need to be higher on nightly runs, already increase the index size & num values per field)
          • DistributedFacetPivotLargeTest
            • new commented out test of "limit=0 + mincount=0 + missing=true"
            • i had a concern about this edge case w/ refinement, but it turns out this isn't even supported in the existing pivot code.
          • FacetComponent
            • minor formatting & comment cleanup
            • use PIVOT_KEY consistently throughout file
            • rename pivotPrefix -> PIVOT_REFINE_PREFIX; and move to top of file
            • move pivotRefinementCounter to top of file and add javadocs
            • tweaked handleResponses:
              • check PURPOSE_REFINE_FACETS and PURPOSE_REFINE_PIVOT_FACETS in separate if blocks (instead of an "else if")
              • doesn't change much at the moment, but smelled like a time bomb if/when we ever do pivot refinement in the same requests as facet.field refinement.
            • refactor away sanityCheckRefinements method
              • all it was doing was a single null check, so I inlined that
            • use emptyList() in createPivotFacetOutput
            • tweak variable names in createPivotFacetOutput
          • PivotFacetProcessor
            • clean up nocommits related to using FieldType methods where appropriate
            • javadoc linting
          • PivotFacetField
            • trim() javadocs & comment about future optimization
            • javadoc linting
          • PivotFacetValueCollection
            • trim() javadocs
            • javadoc linting
          • PivotFacet
            • javadoc linting
          • PivotFacetValue
            • javadoc linting
          Andrew Muldowney added a comment -

          I'm on this, we'll test this against our current version and see how it shakes out.

          Hoss Man added a comment -

          Fingers crossed, this is the final patch.

          No functional changes, just resolving the previously mentioned nocommits by renaming variables/methods or replacing comments about Jiras for future improvements with the actual jira numbers.

          ant precommit passes.

          Hoss Man added a comment -

          No substantive changes in this patch update; it just needed to be updated to trunk since there has been a bunch of churn due to SOLR-4385.

          Andrew Muldowney added a comment -

          Initial reports are that the newest version is a fair bit slower.

          Caveats: We're still on 4.2 in production, so I've backported this to our 4.2 for testing. After going down that rabbit hole for a few days I've got the .wars so I can test better tomorrow, but a small sample of 400 production queries on 166,343,278 documents had the following results:

          Old Patch
          Average Query Time: 20.56ms

          New Refactor
          Average Query Time: 63.47ms

          I'm using SolrMeter to run these at 200 qpm against a set of five slaves. Tomorrow I'll give each version a much larger burn-in (the query file is 651 MB of queries). I'm not sure these are statistically accurate, but I wanted to share what I'm seeing at the moment.

          Hoss Man added a comment -

          Damn... that is unfortunate.

          Which older patch are you comparing with?
          Do you have any idea where the slowdown may have been introduced?
          Can you post some details about the structure of the requests? (any chance the speed diff is just due to legitimate bugs in the older patch that have been fixed and now result in additional refinement?)

          Andrew Muldowney added a comment -

          My previous results are crap. The logs were so full of trash their results are useless. After filtering out all refinement queries and other log lines that aren't genuine queries the results have changed significantly.

          Old:
          average 125.64ms @ 10273 queries
          New:
          average 131.29ms @ 10279 queries

          Hoss Man added a comment -

          Whew ... ok, that diff looks much better.

          I'm still curious though about what exactly your comparisons look like ... how "old" is the version of the patch you are comparing with?

          Depending on how old it is, some of the bugs we've fixed over the last few months could totally explain the perf change (ie: it may have been fast, but the numbers may be wrong and/or prone to infinite looping)

          Specific examples i'm thinking of where the gains in correctness would have definitely impacted performance...

          • the int overflow bug fixed ~ 16/Jul/14 13:58 prevented a bunch of refinement
          • if you use facet.offset: not enough refinement happening until ~ 28/Jul/14 18:54
          • if you use facet.missing + facet.mincount: sub-pivots of missing may not have been refined correctly until ~ 29/Jul/14 10:44
          • if you ever use facet.sort=index: refinement wasn't happening until ~ 04/Aug/14 09:18

          However: if you're comparing my latest patch against the last patch Andrew Muldowney uploaded (~ 18/Jul/14 08:11) and if you don't use facet.offset, or facet.missing, or facet.mincount, or facet.sort=index in any of those queries ... then i'm surprised that you would see much perf difference.


          My current thinking is that we should move forward with getting this committed to trunk, let it soak for a few days and get hammered by jenkins, and then move from there to backport to 4x. We can always revisit performance improvements later, now that we (in my opinion anyway) have decent confidence in the correctness of behavior. (and it's not like the performance is abysmal)

          Does anyone have concerns with moving forward and revisiting questions about performance improvements in other issues?

          Erick Erickson added a comment -

          bq: My current thinking is that we should move forward with getting this committed to trunk....

          Correct behavior always trumps performance IMO so I agree. Especially for a 5% difference in perf.....

          Andrew:

          Many thanks for reporting this info!

          Andrew Muldowney added a comment - - edited

          The "Old" but I was using was from 6/18. So its missing a whole bunch of the latest fixes. I agree the new stuff is certainly more accurate and the performance is basically indistinguishable.

          Any word on a release date for 4.10?

          Hoss Man added a comment -

          Patch updated to trunk to deal with some minor compilation failures introduced by a (largely) unrelated commit a few hours ago (BytesRefBuilder)

          I'm currently running precommit - but once that's done i'll push to trunk.

          ASF subversion and git services added a comment -

          Commit 1617789 from hossman@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1617789 ]

          SOLR-2894: Distributed query support for facet.pivot

          Mark Miller added a comment -

          I just hit a fail. I've attached the log.
