Solr - SOLR-2894

Implement distributed pivot faceting

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.9, 5.0
    • Component/s: None
    • Labels: None

      Description

      Following up on SOLR-792, pivot faceting currently only supports undistributed mode. Distributed pivot faceting needs to be implemented.

      1. SOLR-2894.patch
        94 kB
        Dan Cooper
      2. SOLR-2894.patch
        94 kB
        Erik Hatcher
      3. SOLR-2894-reworked.patch
        47 kB
        Dzmitry Zhemchuhou
      4. SOLR-2894.patch
        49 kB
        Chris Russell
      5. SOLR-2894.patch
        51 kB
        Chris Russell
      6. SOLR-2894.patch
        60 kB
        Chris Russell
      7. SOLR-2894.patch
        60 kB
        Chris Russell
      8. SOLR-2894.patch
        141 kB
        Andrew Muldowney
      9. SOLR-2894.patch
        101 kB
        Andrew Muldowney
      10. SOLR-2894.patch
        98 kB
        Andrew Muldowney
      11. SOLR-2894.patch
        106 kB
        Andrew Muldowney
      12. SOLR-2894.patch
        109 kB
        Andrew Muldowney
      13. SOLR-2894.patch
        113 kB
        Andrew Muldowney
      14. SOLR-2894.patch
        116 kB
        Andrew Muldowney
      15. SOLR-2894.patch
        135 kB
        Andrew Muldowney
      16. SOLR-2894.patch
        127 kB
        Brett Lucey
      17. SOLR-2894.patch
        94 kB
        Brett Lucey
      18. SOLR-2894.patch
        101 kB
        Brett Lucey
      19. dateToObject.patch
        2 kB
        Elran Dvir
      20. SOLR-2894.patch
        102 kB
        Brett Lucey
      21. SOLR-2894.patch
        102 kB
        Brett Lucey

        Issue Links

          Activity

          Erik Hatcher created issue -
          Ben Roubicek added a comment -

          Based on SOLR-792, it looked like there was some traction in getting distributed pivoting into the trunk codebase beyond the functional prototype. This feature has a lot of value within my company, where we perform 50 separate queries when one would suffice if we had distributed pivot support.

          Antoine Le Floc'h made changes -
          Link This issue is related to SOLR-792 [ SOLR-792 ]
          Antoine Le Floc'h added a comment -

          Do you think this will be available for Solr 4.0? I would think that this is very similar to distributing regular facets?

          Dan Cooper made changes -
          Attachment distribPatch-05-03-12.txt [ 12517097 ]
          Dan Cooper added a comment -

          Added a patch to provide distributed pivot faceting. We've been running this code for a while now and it seems to work OK. I've also created a unit test that exercises distributed pivot faceting on a small set of data.

          The patch was created against Solr trunk revision 1297102.

          It should perform in much the same way as single-shard pivot faceting. It only sorts by count if you specify that option; otherwise it returns results in the order they were generated (which may be useful if performance is important but ordering is not). Most users will want to specify facet.sort=count. This patch also supports limiting results using facet.limit.

          To do the merge I'm converting the NamedList objects that get returned by each shard into a giant map (which should be more efficient for merging the results) and then converting back into a NamedList when the merge is complete. This merge should support N-depth pivots, but I've only properly tested a depth of 2.

          I've added some new parameters to support the features we require from pivot faceting and thought they may as well go in the patch in case others find them useful; an example request using both parameters follows the list below.

          • facet.pivot.limit.method
            • Set to 'combined' if you want only the top N results returned across all pivots, where N is set by facet.limit. For example, if you pivoted by country,manufacturer and limited to 5, the top 5 countries would be returned, but only the top 5 manufacturers by combined total would be returned as well, i.e. each country would return the same 5 manufacturers (or fewer if there are no results).
          • facet.pivot.limit.ignore
            • Excludes the specified field from the limiting operation. For example, if you pivoted by country,manufacturer, limited to 5, and set facet.pivot.limit.ignore=country, then all available countries would be returned (not limited) but only 5 manufacturers for each country.
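          For illustration, a minimal sketch of a request that combines these parameters, built with SolrJ's ModifiableSolrParams. The field names and the way the request is sent are examples only, and facet.pivot.limit.method / facet.pivot.limit.ignore exist only with this patch applied:

            import org.apache.solr.common.params.FacetParams;
            import org.apache.solr.common.params.ModifiableSolrParams;

            public class PivotLimitParamsExample {
              public static void main(String[] args) {
                ModifiableSolrParams p = new ModifiableSolrParams();
                p.set("q", "*:*");
                p.set(FacetParams.FACET, true);
                p.set(FacetParams.FACET_PIVOT, "country,manufacturer"); // pivot country -> manufacturer
                p.set(FacetParams.FACET_LIMIT, 5);                      // N = 5
                p.set(FacetParams.FACET_SORT, "count");                 // most users will want count ordering
                p.set("facet.pivot.limit.method", "combined");          // patch-only: top 5 manufacturers by combined total
                p.set("facet.pivot.limit.ignore", "country");           // patch-only: never limit the country level
                // Append p.toString() to a /select request, or pass the params to a SolrJ client.
                System.out.println(p);
              }
            }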

          Can someone test the patch and give some feedback?

          Dan Cooper made changes -
          Attachment distribPatch-05-03-12.txt [ 12517097 ]
          Dan Cooper made changes -
          Attachment SOLR-2894.patch [ 12517098 ]
          Dan Cooper made changes -
          Attachment SOLR-2894.patch [ 12517098 ]
          Dan Cooper made changes -
          Attachment SOLR-2894.patch [ 12517099 ]
          Mark Miller made changes -
          Fix Version/s 4.0 [ 12314992 ]
          Affects Version/s 4.0 [ 12314992 ]
          Chris Russell added a comment - - edited

          Hi Dan.
          I have been working with your SOLR-2894 patch and I am having some issues with unit testing.

          First of all the patch doesn't seem to apply cleanly:
          crussell@WAT-CRUSSELL /cygdrive/d/matrixdev/solr_1297102/CBSolr/SolrLucene
          $ patch -p0 -i SOLR-2894.patch
          patching file solr/core/src/java/org/apache/solr/handler/component/EntryCountComparator.java
          patching file solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java
          Hunk #10 FAILED at 797.
          1 out of 16 hunks FAILED – saving rejects to file solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java.rej
          patching file solr/core/src/java/org/apache/solr/handler/component/PivotFacetHelper.java
          Hunk #2 FAILED at 106.
          1 out of 2 hunks FAILED – saving rejects to file solr/core/src/java/org/apache/solr/handler/component/PivotFacetHelper.java.rej
          patching file solr/core/src/java/org/apache/solr/handler/component/PivotNamedListCountComparator.java
          patching file solr/core/src/java/org/apache/solr/util/NamedListHelper.java
          patching file solr/core/src/java/org/apache/solr/util/PivotListEntry.java
          patching file solr/core/src/test/org/apache/solr/handler/component/DistributedFacetPivotTest.java
          patching file solr/solrj/src/java/org/apache/solr/common/params/FacetParams.java
          $ patch --version
          patch 2.5.8
          Copyright (C) 1988 Larry Wall
          Copyright (C) 2002 Free Software Foundation, Inc.

          A lot of the contents of your original patch seemed to be formatting changes like where line breaks should go and how spacing should be handled.
          I was able to examine the patch and manually incorporate the changes from the failed chunks.
          One question: on this line (1119) of your patch, why did you choose not to initialize the map as the ones above it are? Couldn't that cause an NPE?
          + public SimpleOrderedMap<List<NamedList<Object>>> pivotFacets;
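          For reference, the initialization being suggested would look roughly like the sketch below; the surrounding holder class is only an approximation, not the patch's actual code:

            import java.util.List;
            import org.apache.solr.common.util.NamedList;
            import org.apache.solr.common.util.SimpleOrderedMap;

            class PivotHolderSketch {
              // Initializing the map up front, like the neighbouring fields, avoids a
              // NullPointerException if a merge touches it before any pivot data is assigned.
              public SimpleOrderedMap<List<NamedList<Object>>> pivotFacets =
                  new SimpleOrderedMap<List<NamedList<Object>>>();
            }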

          While looking at DistributedFacetPivotTest.java I noticed an error on line 42: your "q" should probably be "*:*" instead of "*".

          I've attached the patch file I came up with. I added the initialization and test correction I mentioned.

          When I ran the solr/lucene unit tests after patching, there were some unit test failures like this one:
          [junit] Testsuite: org.apache.solr.TestDistributedGrouping
          [junit] Testcase: testDistribSearch(org.apache.solr.TestDistributedGrouping): FAILED
          [junit] .facet_counts.size()==5,4skipped=0,0
          [junit] junit.framework.AssertionFailedError: .facet_counts.size()==5,4skipped=0,0
          [junit] at org.apache.solr.BaseDistributedSearchTestCase.compareResponses(BaseDistributedSearchTestCase.java:656)
          [junit] at org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:383)
          [junit] at org.apache.solr.TestDistributedGrouping.doTest(TestDistributedGrouping.java:51)
          [junit] at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:671)
          [junit] at org.apache.lucene.util.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:20)
          [junit] at org.apache.lucene.util.LuceneTestCase$SubclassSetupTeardownRule$1.evaluate(LuceneTestCase.java:736)
          [junit] at org.apache.lucene.util.LuceneTestCase$InternalSetupTeardownRule$1.evaluate(LuceneTestCase.java:632)
          [junit] at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:22)
          [junit] at org.apache.lucene.util.LuceneTestCase$TestResultInterceptorRule$1.evaluate(LuceneTestCase.java:531)
          [junit] at org.apache.lucene.util.LuceneTestCase$RememberThreadRule$1.evaluate(LuceneTestCase.java:593)
          [junit] at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
          [junit] at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
          [junit] at org.apache.lucene.util.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:20)
          [junit] at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:22)
          [junit]
          [junit]
          [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 2.685 sec
          [junit]
          [junit] ------------- Standard Error -----------------
          [junit] 2520 T1 oas.BaseDistributedSearchTestCase.compareResponses SEVERE Mismatched responses:
          [junit] {responseHeader={status=0,QTime=19},grouped={a_si={matches=0,groups=[]}},facet_counts={facet_queries={},facet_fields={a_t={}},facet_dates={},facet_ranges={},facet_pivot={}}}
          [junit] {responseHeader={status=0,QTime=18},grouped={a_si={matches=0,groups=[]}},facet_counts={facet_queries={},facet_fields={a_t={}},facet_dates={},facet_ranges={}}}
          [junit] NOTE: reproduce with: ant test -Dtestcase=TestDistributedGrouping -Dtestmethod=testDistribSearch -Dtests.seed=-c7cfa73dbca93c9:-4751af558bf6f59:3e523b50870b3b1b -Dargs="-Dfile.encoding=Cp1252"

          It looks like the facet_pivot is being included in the results of one query, and not the other. I'm trying to figure out why this is occurring.
          Any insight would be appreciated.

          Chris Russell added a comment -

          Some modifications to SOLR-2894.patch that I made while trying to get it to apply against rev 1297102.

          Chris Russell made changes -
          Attachment distributed_pivot.patch [ 12523869 ]
          Chris Russell added a comment -

          I figured out the unit test failure. It's because facet_pivot is different from the other facet_* sections in that it only comes back when you request pivots, whereas the others always come back. In Dan's patch, facet_pivot came back even when it was empty or not requested, which did not match the behavior of pivots in a non-distributed setting. I am working on an update to my patch.

          Chris Russell added a comment -

          facet_pivot will not show up in a distributed search if it has no contents; reversed the sorting behavior to comply with the Solr standard for facet.sort.

          Chris Russell made changes -
          Attachment distributed_pivot.patch [ 12524113 ]
          Hoss Man added a comment -

          Erik: Can you triage this for 4.0? Commit if you think it's ready; otherwise remove the fix version.

          Hoss Man made changes -
          Assignee Erik Hatcher [ ehatcher ]
          Trey Grainger added a comment -

          For what it's worth, we're actively using the April 25th version of this patch in production at CareerBuilder (with an older version of trunk) with no issues.

          Erik Hatcher added a comment -

          Trey - thanks for the positive feedback. I'll apply the patch, run the tests, review the code, and so on. Might be a couple of weeks, unless I can get to this today.

          Erik Hatcher added a comment -

          Patch updated to 4x branch.

          Simon, just for you, I removed NamedListHelper as well (folded its one method into PivotFacetHelper).

          Tests pass.

          Erik Hatcher made changes -
          Attachment SOLR-2894.patch [ 12532127 ]
          Erik Hatcher added a comment -

          Trey - would you be in a position to test out the latest patch? I built my latest one by starting with the March 5, 2012 SOLR-2894.patch file.

          Chris Russell added a comment -

          Erik, what revision of solr did you apply the patch to? Did you not encounter the issues I encountered?

          Chris Russell added a comment -

          Erik, I can't get your patch to apply cleanly to Solr revision 1350445:

          $ patch -p0 -i SOLR-2894.patch
          patching file solr/core/src/test/org/apache/solr/handler/component/DistributedFacetPivotTest.java
          patching file solr/core/src/java/org/apache/solr/handler/component/EntryCountComparator.java
          patching file solr/core/src/java/org/apache/solr/handler/component/PivotNamedListCountComparator.java
          patching file solr/core/src/java/org/apache/solr/handler/component/PivotFacetHelper.java
          Hunk #2 FAILED at 103.
          1 out of 2 hunks FAILED – saving rejects to file solr/core/src/java/org/apache/solr/handler/component/PivotFacetHelper.java.rej
          patching file solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java
          Hunk #11 FAILED at 799.
          1 out of 17 hunks FAILED – saving rejects to file solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java.rej
          patching file solr/core/src/java/org/apache/solr/util/PivotListEntry.java
          patching file solr/solrj/src/java/org/apache/solr/common/params/FacetParams.java
          patching file solr/test-framework/src/java/org/apache/solr/BaseDistributedSearchTestCase.java

          Erik Hatcher added a comment -

          Chris - I generated the patch from an r1350348 checkout (which is not branch_4x as I mentioned above). It might be a couple of weeks before I can get back to this and sort it out though, unfortunately. Sorry.

          Chris Russell added a comment -

          I have posted an enhancement to this patch as SOLR-3583. It is based on the Apr 25th version.

          Trey Grainger added a comment -

          Hi Erik,

          Sorry, I missed your original message asking me if I could test out the latest patch - I'd be happy to help. I just tried both your patch and the April 25th patch against the Solr 4.0 Alpha revision and neither applied immediately. I'll see if I can find some time on Sunday to try to get a revision sorted out which will work with the current version.

          I think there are some changes in the April 24th patch which may need to be re-applied if your changes were based upon the earlier patch. I'll know more once I've had a chance to dig in later this weekend.

          Thanks,

          -Trey

          Hoss Man added a comment -

          Bulk fixing the version info for 4.0-ALPHA and 4.0; all affected issues have "hoss20120711-bulk-40-change" in a comment.

          Hoss Man made changes -
          Fix Version/s 4.0 [ 12322455 ]
          Fix Version/s 4.0-ALPHA [ 12314992 ]
          Erik Hatcher added a comment -

          No way I'm getting to this any time soon, sorry. Trey - feel free to prod this one forward with more testing/feedback.

          Erik Hatcher made changes -
          Assignee Erik Hatcher [ ehatcher ]
          Dzmitry Zhemchuhou added a comment -

          I have reapplied the SOLR-2894 patch from Jun 14th to trunk while removing most of the code formatting changes that were in it. On top of that, I changed the FacetComponent.refineFacets() method to add the facet_pivot key/value only when there are values in the pivotFacets map, which fixes the distributed search unit tests.
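          For illustration, a rough sketch of the kind of guard described above, assuming pivotFacets is the merged SimpleOrderedMap used by the patch; the actual refineFacets() code differs in detail:

            import java.util.List;
            import org.apache.solr.common.util.NamedList;
            import org.apache.solr.common.util.SimpleOrderedMap;

            class RefineFacetsSketch {
              // Only attach the facet_pivot section when the merged map has entries, so an
              // empty or unrequested pivot does not change the response shape in distributed mode.
              static void addPivots(NamedList<Object> facetCounts,
                                    SimpleOrderedMap<List<NamedList<Object>>> pivotFacets) {
                if (pivotFacets != null && pivotFacets.size() > 0) {
                  facetCounts.add("facet_pivot", pivotFacets);
                }
              }
            }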

          Dzmitry Zhemchuhou made changes -
          Attachment SOLR-2894-reworked.patch [ 12538443 ]
          Robert Muir added a comment -

          rmuir20120906-bulk-40-change

          Robert Muir made changes -
          Fix Version/s 4.0 [ 12322551 ]
          Fix Version/s 4.0-BETA [ 12322455 ]
          Robert Muir added a comment -

          moving all 4.0 issues not touched in a month to 4.1

          Robert Muir made changes -
          Fix Version/s 4.1 [ 12321141 ]
          Fix Version/s 4.0 [ 12322551 ]
          Chris Russell added a comment - - edited

          Regarding facet.pivot.limit.method and facet.limit, it looks like these are not checked on a per-field basis?
          So, if a user sets different limits for different fields and wants 'combined' limiting, that is not possible?
          For example a user might set:

          f.field1.facet.limit=10
          f.field1.facet.pivot.limit.method=combined
          f.field2.facet.limit=20

          And the combined method will not be used...
          If the user sets facet.pivot.limit.method=combined it looks like the same limit will be used for all fields? Whatever the global facet.limit is set to?
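          For reference, per-field overrides in Solr are normally resolved through SolrParams.getField*; a sketch of what per-field handling of these parameters could look like (the helper below is hypothetical, not code from the patch):

            import org.apache.solr.common.params.FacetParams;
            import org.apache.solr.common.params.SolrParams;

            class PerFieldPivotLimitSketch {
              // Resolve f.<field>.facet.limit and f.<field>.facet.pivot.limit.method,
              // falling back to the global values when no per-field override is set.
              static String describe(SolrParams params, String field) {
                int limit = params.getFieldInt(field, FacetParams.FACET_LIMIT, 100);
                String method = params.getFieldParam(field, "facet.pivot.limit.method");
                return field + ": limit=" + limit + ", limit.method=" + method;
              }
            }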

          Chris Russell added a comment -

          In my experience using this patch, it seems that it does not over-request when enforcing a limit?
          This is problematic because, for example, in a situation where you have many slaves and you are pivoting on a fairly evenly distributed field and setting your facet limit to X, the Xth distinct value for that field by document count on each slave is likely to be different. The result is that some facet values close to your limit boundary will not get reported for aggregation, which will make your ultimate results somewhat inaccurate.

          It was my impression that other facet-based features of Solr over-request when there is a limit, to combat this situation? For example, if you specify a limit of 10, the distributed query might use a limit of 100 or 1000, and then during aggregation it would be trimmed to the top 10.

          I am working on similar functionality for this patch.
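          For context, a sketch of the kind of over-request heuristic being referred to. The 1.5x + 10 figure mirrors what Solr's distributed field faceting does for count-sorted facets, but treat the exact numbers as an assumption rather than patch code:

            class OverRequestSketch {
              // Ask each shard for more values than the user requested, then trim after merging,
              // so values near the limit boundary on individual shards are not silently dropped.
              static int shardLimit(int requestedLimit, int offset) {
                if (requestedLimit < 0) {
                  return -1;                                          // unlimited: nothing to over-request
                }
                return (int) ((offset + requestedLimit) * 1.5) + 10;  // assumed heuristic
              }
            }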

          Chris Russell added a comment -

          In regard to my comment above, I have determined that this happens because, if you specify a limit for a field that you are not requesting facet counts for, Solr will not automatically over-request on that field.
          i.e.
          facet.pivot=somefield
          f.somefield.facet.limit=10

          This will make your pivots look odd, because the limit of 10 will not be over-requested unless you add this line:
          facet.field=somefield

          Since Solr does not do distributed pivoting yet, this has not been an issue before now.
          I am working on an update to the patch that will correct this issue.
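          For illustration, the workaround described above amounts to requesting a field facet alongside the pivot until the patch handles this itself; a sketch using the same field name as above:

            import org.apache.solr.common.params.FacetParams;
            import org.apache.solr.common.params.ModifiableSolrParams;

            class PivotOverRequestWorkaround {
              static ModifiableSolrParams build() {
                ModifiableSolrParams p = new ModifiableSolrParams();
                p.set("q", "*:*");
                p.set(FacetParams.FACET, true);
                p.set(FacetParams.FACET_PIVOT, "somefield");   // pivot on a single field
                p.set("f.somefield.facet.limit", 10);          // per-field limit
                p.set(FacetParams.FACET_FIELD, "somefield");   // workaround: triggers over-requesting on somefield
                return p;
              }
            }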

          Chris Russell made changes -
          Attachment SOLR-2894.patch [ 12553119 ]
          Chris Russell added a comment - - edited

          Updated to apply to trunk 1404975. (Based on Dzmitry's update)

          Added ability to limit on individual fields. (f.fieldname.facet.limit)
          Added pivot fields to fields being over-requested during distributed queries.
          Made it so you can pivot on a single field.

          Chris Russell made changes -
          Attachment distributed_pivot.patch [ 12524113 ]
          Chris Russell made changes -
          Attachment distributed_pivot.patch [ 12523869 ]
          Chris Russell made changes -
          Attachment SOLR-2894.patch [ 12553194 ]
          Chris Russell made changes -
          Attachment SOLR-2894.patch [ 12553119 ]
          Shahar Davidson added a comment -

          Just thought I'd add some feedback on this valuable patch.

          I ran some tests with the latest patch (Nov. 12) and the limit-per-field feature seems to be working all right. (Nice job, Chris!)

          I did, however, encounter 2 other issues (which I guess are related):
          (1) There's no default sorting method. i.e. if no facet.sort is specified then results are not sorted. (this is a deviation from the current non-distributed pivot faceting behavior)
          (2) Sorting per-field does not work. (i.e. f.<field>.facet.sort=<method> does not work)

          Chris Russell added a comment -

          Thanks Shahar.

          In regard to #1, there is a default (count), and there are tests that cover that in DistributedFacetPivotTest.java. Are you sure the patch applied correctly to your version of Solr?

          In regard to #2, that is correct. Does per-field sorting work with non-distributed pivots? I guess it was never implemented by the original author.

          Shahar Davidson added a comment -

          Hi Chris,

          #1 Yes, I believe the patch applied correctly. Once more, the default sorting method is not defined, and I didn't see where DistributedFacetPivotTest tests the default sort. (I did, however, see where facet.sort is tested after facet.sort=count was explicitly specified.)
          #2 Yes, per-field sorting works with non-distributed pivots. We were working in a non-distributed configuration up to some point and, until then, per-field sorting worked properly (that was on Solr 4.0).

          We also encountered another issue when sorting by count:
          There might be a case where two of the returned values have the same count. In such a case, the PivotNamedListCountComparator attempts to sort by value. The problem is that when facet.missing=true is specified, one of the two values with the same count may be null (missing), and in such cases the get("value").toString() call will fail with a NullPointerException.
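          For illustration, a sketch of a null-safe tie-break of the kind being described, following the convention of sorting missing (null) values last; this is illustrative, not the patch's comparator:

            import java.util.Comparator;
            import org.apache.solr.common.util.NamedList;

            class NullSafePivotComparatorSketch implements Comparator<NamedList<Object>> {
              // Sort by descending count; on ties compare values, treating a missing (null)
              // value as greater so it sorts to the bottom instead of throwing an NPE.
              public int compare(NamedList<Object> a, NamedList<Object> b) {
                int byCount = ((Integer) b.get("count")).compareTo((Integer) a.get("count"));
                if (byCount != 0) return byCount;
                Object va = a.get("value");
                Object vb = b.get("value");
                if (va == null && vb == null) return 0;
                if (va == null) return 1;   // nulls last
                if (vb == null) return -1;
                return va.toString().compareTo(vb.toString());
              }
            }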

          Chris Russell added a comment -

          Implemented default pivot facet sort.
          Implemented per-field pivot facet sorting.
          Fixed NPE with sorting when facet.missing is on.

          Chris Russell made changes -
          Attachment SOLR-2894.patch [ 12561977 ]
          Chris Russell added a comment -

          Shahar, I have corrected the issues that you mentioned, I believe.
          Default sorting is now count if a facet limit exists, otherwise index.
          I fixed the facet.missing handling, and I went with the sorting convention that the rest of Solr seems to follow, which is that null values always go to the bottom of the list.
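          For reference, a sketch of that default-sort rule (count when a limit is in effect, otherwise index), expressed with the FacetParams constants; illustrative only, not the patch's exact code:

            import org.apache.solr.common.params.FacetParams;
            import org.apache.solr.common.params.SolrParams;

            class DefaultPivotSortSketch {
              // facet.sort defaults to "count" when a facet.limit is in effect, otherwise to
              // "index", matching single-node facet behaviour; an explicit facet.sort wins.
              static String resolveSort(SolrParams params, String field) {
                String explicit = params.getFieldParam(field, FacetParams.FACET_SORT);
                if (explicit != null) return explicit;
                int limit = params.getFieldInt(field, FacetParams.FACET_LIMIT, 100);
                return limit > 0 ? FacetParams.FACET_SORT_COUNT : FacetParams.FACET_SORT_INDEX;
              }
            }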

          Try the new patch.

          Shahar Davidson added a comment -

          Hi Chris,

          I appreciate your efforts on this.
          I tried the new patch and ran into numerous NPEs, so I didn't get to verify the sorting.

          Here's what I'm getting:

          SEVERE: null:java.lang.NullPointerException
                  at java.util.TreeMap.compare(TreeMap.java:1188)
                  at java.util.TreeMap.put(TreeMap.java:531)
                  at org.apache.solr.handler.component.PivotFacetHelper.convertPivotsToMaps(PivotFacetHelper.java:317)
                  at org.apache.solr.handler.component.FacetComponent.countFacets(FacetComponent.java:542)
                  at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:336)
                  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:309)
                  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
                  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
                  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
                  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
                  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337) 
                  ....
          

          And also:

          SEVERE: null:java.lang.NullPointerException
                  at java.util.TreeMap.getEntry(TreeMap.java:342)
                  at java.util.TreeMap.get(TreeMap.java:273)
                  at org.apache.solr.handler.component.FacetComponent.mergePivotFacet(FacetComponent.java:692)
                  at org.apache.solr.handler.component.FacetComponent.countFacets(FacetComponent.java:552)
                  at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:336)
                  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:309)
                  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
                  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
                  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
                  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
                  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
                  ....
          

          Since I couldn't get to test the sorting fix, I peeked at the PivotNamedListCountComparator fix and I believe it will sort the "null" values on top (due to the return values of handleSortWhenOneValueIsNull) and regardless of the count (because the null values are checked before the counts are compared) - correct me if I'm wrong.

          Appreciate your help with this,

          Shahar.

          Chris Russell added a comment -

          That is odd, it worked when I tested it on my box. I will take another look.

          Mark Miller made changes -
          Fix Version/s 4.2 [ 12323893 ]
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.1 [ 12321141 ]
          Chris Russell added a comment - - edited

          Fixed NPE when using facet.missing.
          Added test for over-requesting / refinement.
          Fixed issue that broke over-requesting when local params were present in facet.field.

          In correcting the issue with over-requesting I noticed that no refinement is being done for distributed pivot facets.
          This is an issue because it means that your pivot facet counts may not be correct.
          I am working on an enhancement to add refinement for pivot facets.

          Chris Russell made changes -
          Attachment SOLR-2894.patch [ 12563081 ]
          Chris Russell added a comment -

          Shahar, I think I've corrected the issues you reported. Take a look.
          And yes, the sorting will always put the nulls on one side, although I believe it is the bottom of the list based on my testing. This matches the current single-core behavior of Solr, as far as I can tell.

          Shahar Davidson added a comment -

          Hi Chris,

          Thanks for the updated patch.

          I started testing this latest patch and encountered a few problems when NULL values are present:
          (1) If, for example, FIELD_A contains null values and FIELD_B does not, then "facet.pivot=FIELD_A,FIELD_B" will return more than one entry of NULLs for FIELD_A. To be exact, it returns a NULL entry per shard. (In non-distributed search, there is a single NULL entry for FIELD_A.)
          (2) If facet.pivot=FIELD_B,FIELD_A, then one may see that for a given value of field FIELD_B there is more than one entry of FIELD_A with a NULL value. (In non-distributed search, there is only one entry with a NULL value under a given FIELD_B.)

          It seems as if there's a problem when merging null values from cross-shard pivots.
          Does the relevant test-case include some sort of check on data with null values?

          As far as sorting is concerned, results seem to be sorted properly (per field as well).

          Shahar.

          Chris Russell added a comment -

          Corrected null aggregation issues when docs contain null values for the fields being pivoted on. Added logic to remove local params from the pivot query-string parameters when determining over-request.
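          For illustration, the essence of the null aggregation fix is that the merge keyed per pivot value must let the "missing" bucket from every shard collapse into a single entry; a simplified sketch on plain maps (the patch itself works on NamedList structures):

            import java.util.HashMap;
            import java.util.Map;

            class NullAwarePivotMergeSketch {
              // Merge per-shard value->count maps for one pivot level. HashMap permits a null
              // key, so the missing bucket from each shard accumulates into one merged entry.
              static Map<Object, Integer> merge(Iterable<Map<Object, Integer>> shardCounts) {
                Map<Object, Integer> merged = new HashMap<Object, Integer>();
                for (Map<Object, Integer> shard : shardCounts) {
                  for (Map.Entry<Object, Integer> e : shard.entrySet()) {
                    Integer prev = merged.get(e.getKey());
                    merged.put(e.getKey(), prev == null ? e.getValue() : prev + e.getValue());
                  }
                }
                return merged;
              }
            }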

          Chris Russell made changes -
          Attachment SOLR-2894.patch [ 12564765 ]
          Ken Ip added a comment -

          Hi Chris,

          Thanks for the patch. Any chance this can be applied to the 4_0 or 4_1 branch? We have no problem applying it to trunk, but it can't be applied to 4_0 or 4_1. Appreciated.

          ➜ lucene_solr_4_1 patch -p0 -i SOLR-2894.patch --dry-run
          patching file solr/core/src/test/org/apache/solr/handler/component/DistributedFacetPivotTest.java
          patching file solr/core/src/test/org/apache/solr/SingleDocShardFeeder.java
          patching file solr/core/src/test/org/apache/solr/TestRefinementAndOverrequestingForFieldFacetCounts.java
          patching file solr/core/src/java/org/apache/solr/handler/component/EntryCountComparator.java
          patching file solr/core/src/java/org/apache/solr/handler/component/PivotNamedListCountComparator.java
          patching file solr/core/src/java/org/apache/solr/handler/component/PivotFacetHelper.java
          Hunk #1 FAILED at 16.
          Hunk #2 FAILED at 35.
          Hunk #5 succeeded at 287 with fuzz 2.
          2 out of 5 hunks FAILED – saving rejects to file solr/core/src/java/org/apache/solr/handler/component/PivotFacetHelper.java.rej
          patching file solr/core/src/java/org/apache/solr/handler/component/NullGoesLastComparator.java
          patching file solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java
          Hunk #1 FAILED at 17.
          Hunk #2 FAILED at 43.
          2 out of 11 hunks FAILED – saving rejects to file solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java.rej
          patching file solr/core/src/java/org/apache/solr/util/PivotListEntry.java
          patching file solr/solrj/src/java/org/apache/solr/common/params/FacetParams.java

          Shahar Davidson added a comment -

          Hi Chris,

          Once again, your efforts are much appreciated!!

          I tested the latest patch (dated Jan. 14th) and the null aggregation issues were indeed resolved - thanks!

          Any chance this could be integrated into the upcoming release? (4.2?)

          Thanks,

          Shahar.

          Chris Russell added a comment -

          Ken and Shahar, thank you both for your interest in getting this patch usable with the current and upcoming releases.
          At this point in time, since refinement is not present in distributed pivoting, I would be hesitant to integrate this with release versions of solr. Indeed, where I work we have put features related to this functionality on hold until refinement can be implemented.
          I started working on implementing refinement for distributed pivots a couple of weeks ago and have handed it off to a teammate who will be seeing it through to completion.
          Currently we expect this will take another two to four weeks.
          At that time we will update what is here and look in to backporting / getting this committed.
          Thanks again.

          Andrew Muldowney added a comment - - edited

          As Chris said above, I've been working on distributed pivot faceting. The model had to be reworked to support this change. The new workflow is: each shard's result goes into a shardRequest map, those maps are combined and converted into a list, and that combined list is what gets looked at. The combined results are compared to each shard's results, and refinement requests are made. Once all the refinement requests are made, another round of refinement is queued up, this time going one level deeper than the previous refinement requests. Because of the nature of the system, distributedProcess(rb) needed to be called from refinePivotFacets to actually take the enqueued refinement requests and turn them into fully fledged refinement queries. This caused an issue where other refinement types would get recalled because their refinement identifier was never cleared; field facets are slightly modified to have a boolean for "needsRefinements" and one for "wasRefined" to help distinguish.
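          For illustration, a very rough sketch of one refinement round as described above: merge the per-shard counts for a pivot level, then work out which merged values each shard has not reported yet so refinement requests can be queued for them. All names and types here are hypothetical; the patch works on NamedList structures rather than plain maps:

            import java.util.ArrayList;
            import java.util.HashMap;
            import java.util.HashSet;
            import java.util.List;
            import java.util.Map;
            import java.util.Set;

            class PivotRefinementSketch {
              // For each shard, list the merged candidate values that the shard has not yet
              // reported a count for; those are the values to ask it to refine.
              static Map<String, List<String>> missingValuesPerShard(
                  Map<String, Map<String, Integer>> countsByShard) {
                Set<String> allValues = new HashSet<String>();
                for (Map<String, Integer> shardCounts : countsByShard.values()) {
                  allValues.addAll(shardCounts.keySet());
                }
                Map<String, List<String>> toRefine = new HashMap<String, List<String>>();
                for (Map.Entry<String, Map<String, Integer>> e : countsByShard.entrySet()) {
                  List<String> missing = new ArrayList<String>();
                  for (String value : allValues) {
                    if (!e.getValue().containsKey(value)) {
                      missing.add(value);   // this shard never reported a count for value
                    }
                  }
                  toRefine.put(e.getKey(), missing);
                }
                return toRefine;
              }
            }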

          Andrew Muldowney made changes -
          Attachment SOLR-2894.patch [ 12569602 ]
          Andrew Muldowney made changes -
          Attachment SOLR-2894.patch [ 12569602 ]
          Andrew Muldowney added a comment - - edited

          An update to my previous patch. After more consideration, I've pulled the pivot facet logic out of DistributedProcess and call those specific parts when doing the iterative refinement process to avoid any side effects of other people putting code into DistributedProcess. I've also added better support for null values and added tests to ensure that you can send refinement requests for null values and not blow up the stack.

          Andrew Muldowney made changes -
          Attachment SOLR-2894.patch [ 12570315 ]
          Shawn Heisey added a comment -

          Andrew, someone on IRC is trying to apply your patch, but we can't find a revision that it will apply to successfully, and there's no revision information in the patch. Also, whatever diff utility you used has put backslashes into the paths and that has to be manually fixed before it'll find the files on Linux. Can you update your source tree to the current version and use 'svn diff' (or 'git diff' if you've checked out with git) to create the patch and re-upload?

          Andrew Muldowney added a comment -

          New patch file for trunk; it's a git patch from trunk Solr pulled today. Let me know if there are any issues applying.

          Andrew Muldowney made changes -
          Attachment SOLR-2894.patch [ 12572409 ]
          Mark Miller added a comment -

          Thanks Andrew - this isn't my area of expertise, but hopefully someone will get to this eventually. Lots of votes and watchers.

          Monica Skidmore added a comment -

          Thanks, Andrew! We're upgrading our version of Solr, and this will make our users very happy. I'm hoping it will be committed now...

          William Harris added a comment - - edited

          Hey, Andrew. I'm that someone from IRC.
          I managed to apply your patch (even though a single hunk failed), but I get a NullPointerException when I attempt a query with 3 pivots.

          SEVERE: null:java.lang.NullPointerException
                  at org.apache.solr.handler.component.PivotFacetHelper.getCountFromPath(PivotFacetHelper.java:373)
                  at org.apache.solr.handler.component.FacetComponent.processTopElement(FacetComponent.java:779)
                  ...
          
          Andrew Muldowney added a comment -

          This patch applies cleanly to trunk for me; apologies, as the last one turned out to be flawed upon review.

          Andrew Muldowney made changes -
          Attachment SOLR-2894.patch [ 12572555 ]
          Chris Russell added a comment -

          I have created a unit test that causes an NPE on a 3-pivot request; I am taking a look now.

          Andrew Muldowney added a comment -

          Fixes the NPEs in the 3-pivot case, solves a hidden issue with null values when .missing was false, and deals with an issue where facet.mincount differs from facet.pivot.mincount (which is true for the default). The mincount issue only showed itself during index sorting.

          Andrew Muldowney made changes -
          Attachment SOLR-2894.patch [ 12573195 ]
          Robert Muir made changes -
          Fix Version/s 4.3 [ 12324128 ]
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.2 [ 12323893 ]
          William Harris added a comment -

          I get the following error with the latest patch.

          org.apache.solr.search.SyntaxError: Expected identifier at pos 28 str='{!terms=xxxx xxxx 2}field1,field2,field3'
          

          As far as I can tell it seems to occur whenever it encounters values in field1 that end with an integer, or contain non-alphanumeric characters.

          Andrew Muldowney added a comment -

          The issue lies in how the refinement requests were formatted and how they were parsed on the shard side. I've made changes that should alleviate this issue and I'll push out a patch soon.

          Shahar Davidson added a comment -

          Hi Andrew (and Chris), I just wanted to report a problem that we found in the patch from Jan 14th.

          In short, the problem seems to be related to facet.limit, and the symptom is that a distributed pivot returns fewer terms than expected.

          Here's a simple scenario:
          If I run a (non-distributed) pivot such as:

          http://myHost:8999/solr/core-A/select?q=*:*&wt=xml&facet=true&facet.pivot=field_A,field_B&rows=0&facet.limit=-1&facet.sort=count

          then I would get N terms for field_A. (where, in my case, N is in the thousands)

          BUT, if I run a distributed pivot such as:

          http://myHost:8999/solr/core-B/select?shards=myHost:8999/solr/core-A&q=*:*&wt=xml&facet=true&facet.pivot=field_A,field_B&rows=0&facet.limit=-1&facet.sort=count

          then I would get at most 160 terms for field_A.
          (Why exactly 160?? I have no idea)

          On the other hand, if I use f.<field_name>.facet.limit=-1 then things work as expected. For example:

          http://myHost:8999/solr/core-B/select?shards=myHost:8999/solr/core-A&q=*:*&wt=xml&facet=true&facet.pivot=field_A,field_B&rows=0&f.field_A.facet.limit=-1&f.field_B.facet.limit=-1&facet.sort=count

          This will return exactly N terms for field_A as expected.

          I'll appreciate your help with this.

          Shahar.
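
          One possible explanation for the 160 cap, offered purely as an assumption rather than a confirmed diagnosis: distributed field faceting over-requests from each shard when sorting by count, and if the pivot code falls back to the default facet.limit of 100 for its shard requests instead of honoring the top-level facet.limit=-1, the usual heuristic would land exactly on this number.

          // Assumed over-request arithmetic (as used for distributed field facets);
          // whether the pivot patch takes this exact path is not confirmed.
          class OverrequestSketch {
            public static void main(String[] args) {
              int facetLimit = 100;                             // default facet.limit
              int shardLimit = (int) (facetLimit * 1.5) + 10;   // over-request heuristic
              System.out.println(shardLimit);                   // prints 160
            }
          }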

          Andrew Muldowney added a comment -

          The Jan 14th patch is totally missing the refinement step, so it's hard for me to say what is causing your issue. Please download the newest patch and let me know if the problem continues.

          Vishal Deshmukh added a comment -

          Hi, we are looking for this feature desperately. Can anyone give an ETA on this? Thanks in advance.

          Andrew Muldowney added a comment -

          This was built for 4_1_0 but git thinks it'll apply to trunk no problem.

          This solves a myriad of issues surrounding the formatting of the refinement requests, should support all field types and deals with jagged pivot facet result sets due to nulls or empty data on pivoted fields.

          Andrew Muldowney made changes -
          Attachment 0001-Pivot-Faceting-and-refinement.patch [ 12574798 ]
          Andrew Muldowney made changes -
          Attachment 0001-Pivot-Faceting-and-refinement.patch [ 12574798 ]
          Andrew Muldowney made changes -
          Attachment SOLR-2894.patch [ 12574862 ]
          Stein J. Gran added a comment -

          Andrew, which version does the latest patch apply to? I've tried applying it to trunk, branch_4x and 4.2.1 without any luck so far. I'm planning on testing this patch in a SolrCloud environment with lots of pivot facet queries.

          For trunk I get this:
          patching file `solr/core/src/java/org/apache/solr/request/SimpleFacets.java'
          Hunk #1 succeeded at 323 with fuzz 2 (offset 51 lines).
          Hunk #2 FAILED at 374.
          1 out of 2 hunks FAILED – saving rejects to solr/core/src/java/org/apache/solr/
          request/SimpleFacets.java.rej

          The rej file seems similar for trunk and the 4.2.1 tag

          Sviatoslav Lisenkin added a comment -

          Hello, everyone.
          I applied the latest patch two weeks ago (rev. 1465879), ran into merge issues in the SimpleFacets class near the 'incomingMinCount' variable, and fixed them manually (just renaming). Simple pivot faceting via the web UI on a sample Solr installation with two nodes worked fine. I would really appreciate it if someone had a chance to test it under load, etc.
          I hope this patch (and feature) will be included in the upcoming release.

          Andrew Muldowney added a comment -

          It was built for 4.1. There are a few patch-application failures with 4_2, but nothing major from what I've seen.

          Stein J. Gran added a comment -

          Sviatoslav and Andrew: thank you. Only small changes to the SimpleFacets.java file were necessary to get the patch in for the 4.2.1 tag.

          I have now been testing the patch in a small SolrCloud environment with two shards (-DnumShards=2), and I have found the following:
          1. Distributed pivot facets work great on string fields
          2. No values are returned if one of the facet.pivot fields is a date field

          For scenario 2:
          a) There are no error messages in the Solr log file

          b) The URL I use is http://localhost:8983/solr/coxitocollection/select?facet=true&facet.sort=true&q=*:*&facet.limit=1000&facet.pivot=dateday_datetime,firstreplytime

          c) If I add "&distrib=false" to the URL, I get values back

          d) The fields used are defined like this in schema.xml:
          <field name="dateday_datetime" type="date" indexed="true" stored="true" multiValued="false" />
          <field name="firstreplytime" type="int" stored="true" multiValued="false" />

          e) I tried using the tdate field instead of date, but this had no effect

          f) The date and tdate fields are defined like this in schema.xml:
          <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
          <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>

          g) If I run with -DnumShards=1 this scenario works great, both with and without "distrib=false"

          h) This was tested with 4.2.1 with the patch from March 21st with the following change: The non-existing variable "mincount" was replaced with "incomingMinCount" in SimpleFacets.java

          Uwe Schindler made changes -
          Fix Version/s 4.4 [ 12324324 ]
          Fix Version/s 4.3 [ 12324128 ]
          Elran Dvir added a comment -

          Hi,

          I want to report a problem that we found in the patch of March 21st.
          It seems that the problem Shahar reported is now solved, but there is another similar problem.
          In short, the problem seems to be related to the per-field facet.limit definition, and the symptom is that a distributed pivot returns fewer terms than expected.
          Here's a simple scenario:

          if I run a distributed pivot such as:
          http://myHost:8999/solr/core-B/select?shards=myHost:8999/solr/core-A&q=*:*&wt=xml&facet=true&facet.pivot=field_A&rows=0&facet.limit=-1&facet.sort=index

          it will return exactly the expected number of terms for field_A.

          On the other hand, if I use f.<field_name>.facet.limit=-1:
          http://myHost:8999/solr/core-B/select?shards=myHost:8999/solr/core-A&q=*:*&wt=xml&facet=true&facet.pivot=field_A&rows=0&f.field_A.facet.limit=-1&facet.sort=index

          then it will return at most 100 terms for field_A.

          I'll appreciate your help with this.

          Thanks.

          Chris Russell added a comment -

          I will take a look.

          Otis Gospodnetic added a comment -

          Tons of votes and watchers on this issue!
          Chris Russell, Andrew Muldowney, and Trey Grainger - any luck with this by any chance? It would be great to get this in 4.4!

          Andrew Muldowney added a comment -

          I'm working on this patch again, looking into the limit issue and the fact that exclusion tags aren't being respected. They both boil down to improperly formatted refinement requests, so I'm going through and cleaning those up to look more and more like the distributed field facet code. I should also have time to get to the datetime problem, where you cannot refine on datetimes because the datetime format returned by the shards is not queryable when refining.

          Trey Grainger added a comment -

          @Otis Gospodnetic, we have this patch live in production for several use cases (as a pre-requisite for SOLR-3583, which we've also worked on @CareerBuilder), but the currently known issues which would prevent this from being committed include:
          1) Tags and Excludes are not being respected beyond the first level
          2) The facet.limit=-1 issue (not returning all values)
          3) The lack of support for datetimes

          We need #1 and Andrew is working on a project currently to fix this. He's also looking to fix #3 and find a reasonably scalable solution to #2. I'm not sure when the Solr 4.4 vote is going to be, but it'll probably be a few more weeks until this patch is all wrapped up.

          Meanwhile, if anyone else finds any issues with the patch, please let us know so they can be looked into. Thanks!

          Otis Gospodnetic made changes -
          Fix Version/s 4.5 [ 12324743 ]
          Fix Version/s 4.4 [ 12324324 ]
          Andrew Muldowney added a comment - - edited

          Built on 4_4
          This version fixes the following:

          1) Indecisive faceting not being respected on refinement queries
          2) Key not being respected
          3) Facet.offset not being respected
          4) datetimes breaking when trying to refine

          One point of contention is this:
          The SolrExampleTests.java (for the SolrJ stuff) had a check that required pivot facet boolean results to be a strict Boolean.TRUE as opposed to the string "true".

          This came about from the change that was required to fix datetime.

          I can't find anywhere else where we require a boolean field's value to equal Boolean.TRUE, so I think this test was just an artifact of how the original pivot faceting code was written.

          As it stands now the SolrExampleTests.doPivotFacetTest:1151 has been changed to "true" instead of Boolean.TRUE
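
          To make the difference concrete, the check behaves roughly as in the sketch below. This is illustrative only; the variable name is hypothetical and the real test inspects a pivot facet entry from a SolrJ response.

          class BooleanPivotValueSketch {
            public static void main(String[] args) {
              Object pivotValue = "true";  // hypothetical: the pivot value now arrives as a String
              System.out.println(Boolean.TRUE.equals(pivotValue)); // false - the old strict check
              System.out.println("true".equals(pivotValue));       // true  - the relaxed comparison
            }
          }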

          Andrew Muldowney made changes -
          Attachment SOLR-2894.patch [ 12593594 ]
          Elran Dvir added a comment -

          Andrew, Thank you very much for the fix!

          Does this version fix the issue of f.field.facet.limit not being respected?

          Thanks.

          Andrew Muldowney added a comment - - edited

          Yes, it should

          Elran Dvir added a comment -

          Hi Andrew,

          I have tried applying the latest patch to 4.2.1 and there were a few problems.
          Which version does it apply to?

          Thanks.

          Otis Gospodnetic added a comment -

          Elran Dvir - didn't try applying it yet, but 99.9% sure it is/was the trunk.

          Andrew Muldowney added a comment -

          I built it for 4_4.

          But I didn't have trouble patching it to 4_2_1. Did you pull from git or svn?

          Elran Dvir added a comment -

          I downloaded the source code from Solr's website and opened it with my IDE, IntelliJ.
          When I tried applying the patch, IntelliJ reported there were problems with some files.

          Thanks.

          Andrew Muldowney added a comment -

          Fixed an issue where commas in string fields would cause infinite refinement loops.

          Andrew Muldowney made changes -
          Attachment SOLR-2894.patch [ 12594227 ]
          Stein J. Gran added a comment -

          I have now re-tested the scenarios I used on April 10th (see my comment above from that date), and all of the issues I found then are now resolved. I applied the July 25th patch to the lucene_solr_4_4 branch (GitHub) and performed the tests on this version.

          Well done, Andrew. Thumbs up from me.

          Andrew Muldowney added a comment -

          I've found a small error which causes heavily sharded (30+) data to spiral out of control on refinement requests.

          I've fixed the error on a previous version of Solr and I'll be forward-porting it to my 4_4 build by tomorrow.

          If you are having issues with complex string fields, this should help.

          Otis Gospodnetic added a comment -

          Andrew Muldowney - I didn't have time to link issues, but Joel Bernstein is working on at least one issue that is, if I recall correctly, an alternative implementation of this....

          Andrew Muldowney added a comment -

          Fixed the run-away-but-eventually-coalescing refinement query issue.

          At this point all known issues have been resolved.

          Andrew Muldowney made changes -
          Attachment SOLR-2894.patch [ 12596728 ]
          William Harris added a comment - - edited

          Hey, Andrew. Really appreciate the effort here!
          I am seeing this error with the latest patch.

          "error": {
              "msg": "java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format",
              "trace": "org.apache.solr.common.SolrException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format\n\tat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:302)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:1850)\n\tat org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)\n\tat org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)\n\tat org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)\n\tat org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)\n\tat org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)\n\tat org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)\n\tat org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)\n\tat org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)\n\tat org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)\n\tat org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)\n\tat java.lang.Thread.run(Thread.java:724)\nCaused by: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format\n\tat org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:110)\n\tat org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)\n\tat org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:407)\n\tat org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)\n\tat org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:155)\n\tat org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:118)\n\tat java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:166)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)\n\tat java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:166)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\n\t... 1 more\n",
              "code": 500
            }
          

          Trying it with 2 pivots across 8 shards with a total of ~3 million docs.
          Not sure what's causing it, but let me know if I can do anything to help!

          Andrew Muldowney added a comment -

          What are your exact field types? What is your query?
          From what I'm seeing online that error is because a shard failed for some reason. Knowing why a shard failed will yield a much better error message.

          William Harris added a comment - - edited

          It's a pretty simple query: q=*:*&facet=on&facet.pivot=fieldA,fieldB. Both fields are regular single-valued solr.TextField fields with solr.LowerCaseFilterFactory filters. All shards work well individually.
          I'm looking at the logs but unfortunately I'm not seeing any other error messages there.

          It works as long as I use less than 6 shards. With 6 or more it fails with that error, regardless of which shards I use.

          Andrew Muldowney added a comment -

          It shouldn't be a shard-count issue; we're running this patch on a 50-shard cluster over several servers with solid results.
          Try the shards.tolerant=true parameter on your distributed search? It supposedly includes error information if available.

          William Harris added a comment - - edited

          shards.tolerant=true did indeed yield a more descriptive error:

          ERROR - 2013-08-21 12:54:17.392; org.apache.solr.common.SolrException; null:java.lang.NullPointerException
                  at org.apache.solr.handler.component.FacetComponent.refinePivotFacets(FacetComponent.java:882)
                  at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:411)
                  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
                  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
                  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1850)
                  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
                  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
                  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
                  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
                  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
                  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
                  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
                  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
                  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
                  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
                  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
                  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
                  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
                  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
                  at java.lang.Thread.run(Thread.java:724
          

          I also reindexed everything, replacing the values of all string fields with their corresponding hashes, to see if the error could be caused by some odd strings, but the same error occurs.
          I am also seeing this error after I switched to MD5 hashes for document IDs:

          ERROR - 2013-08-22 14:28:25.248; org.apache.solr.common.SolrException; null:java.lang.NullPointerException
                  at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:903)
                  at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:649)
                  at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:628)
                  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
                  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
                  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1850)
                  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
                  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
                  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
                  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
                  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
                  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
                  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
                  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
                  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
                  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
                  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
                  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
                  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
                  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
                  at java.lang.Thread.run(Thread.java:724
          
          Andrew Muldowney added a comment -

          The first error occurs on a line where the refinement response is being mined for its information. The line asks for the value and it gets an NPE. Does your data contain nulls? I have code in place to deal with that situation, but it's possible I'm missing an edge case. Do you have any suggestions for a test case that would create this error?

          The second error never gets to anything I've changed so I think MD5ing your docIDs is causing all sorts of other issues unrelated to this patch.

          William Harris added a comment - - edited

          I thought the issue might be related to me not assigning those fields values in every document, but I tried reindexing giving them all values and the error still occurs.
          I tampered with the source a bit, and managed to trace the error to srsp.getSolrResponse().getResponse(), meaning getSolrResponse() is returning null. Hope that helps.
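
          A defensive guard of the following shape would avoid an NPE at that spot. This is only a sketch built around the classes named in the stack trace (ShardRequest, ShardResponse, NamedList); the method name is hypothetical and the actual fix in the patch may look different.

          // Hypothetical null guard around the refinement-response handling.
          private void mergeRefinementResponses(ShardRequest sreq) {
            for (ShardResponse srsp : sreq.responses) {
              if (srsp.getSolrResponse() == null || srsp.getSolrResponse().getResponse() == null) {
                continue; // shard gave no usable response (e.g. it failed); skip instead of dereferencing null
              }
              NamedList<?> shardResponse = srsp.getSolrResponse().getResponse();
              // ... mine the pivot refinement counts from shardResponse ...
            }
          }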

          Elran Dvir added a comment -

          Hi Andrew,

          In class PivotFacetProcessor, in the function doPivots, I have noticed a change in the code.
          The current implementation is:

          SimpleOrderedMap<Object> pivot = new SimpleOrderedMap<Object>();
          pivot.add( "field", field );
          if (null == fieldValue) {
            pivot.add( "value", null );
          } else {
            termval = new BytesRef();
            ftype.readableToIndexed(fieldValue, termval);
            pivot.add( "value", fieldValue );
          }
          pivot.add( "count", kv.getValue() );

          It means we are getting the values as strings in pivots.

          In the past the implementation was "pivot.add( "value", ftype.toObject(sfield, termval))" - meaning we were getting the values as objects.
          I am using SolrJ and it's very important to me to get the values as objects.

          Is there a reason not returning the values as objects?

          Thank you very much.

          Andrew Muldowney added a comment -

          I wrote about that on 22/Jul/13 22:15.

          By calling toObject it was taking an internal datetime reference and giving me the pretty format, which was not convertible when trying to do refinement on that value. I'm not sure where the toObject call should go now, as the datetimes in facets never seem to come out in that pretty format.

          Adrien Grand made changes -
          Fix Version/s 4.6 [ 12325000 ]
          Fix Version/s 4.5 [ 12324743 ]
          Elran Dvir added a comment -

          Hi Andrew,

          Sorry for the long delay.
          I am still seeing the issue I reported on 20/May/13 12:27 (f.field_A.facet.limit=-1 returns at most 100 terms for field_A).

          Also, can you please direct me to the line of code where datetimes break when trying to refine (caused by "pivot.add( "value", ftype.toObject(sfield, termval))")?
          I need pivot to return values as objects.

          Thank you very much.

          Andrew Muldowney added a comment -

          By returning an object you get a pretty date format that does not work when doing:
          getListedTermCounts(field,firstFieldsValues); -PivotFacetProcessor::104

          When it attempts to convert the pretty date it has into the datatype of the field it will fail to do so.

          Elran Dvir added a comment -

          I examined the code.
          It seems that the refinement process occurs before doPivots (where the call "ftype.toObject(sfield, termval)" was).
          So it seems toObject shouldn't affect the refinement process.
          What am I missing?

          Thanks.

          Andrew Muldowney added a comment -

          doPivots is called once on every shard to get each shard's response. It has support for refinement built in, but this first run-through has no refinement information yet.

          When the master box sends out its refinement requests, doPivots is run again on the shards receiving those requests. At this point it uses the refinement steps doPivots has, and it blows up on dates: instead of getting something like "1995-12-31T23:59:59Z" it gets "Tuesday December 12th, 1995 at 23:59". The latter does not convert to the former, so the entire thing blows up.

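          For illustration only, here is a minimal, self-contained sketch (not code from the patch) of the failure mode described above, assuming the shard expects the strict ISO-8601 form: the ISO string round-trips through the parser, while the "pretty" Date.toString() form does not convert back.

              import java.text.ParseException;
              import java.text.SimpleDateFormat;
              import java.util.Date;
              import java.util.Locale;
              import java.util.TimeZone;

              public class DateRefinementSketch {
                public static void main(String[] args) throws ParseException {
                  SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
                  iso.setTimeZone(TimeZone.getTimeZone("UTC"));

                  Date d = iso.parse("1995-12-31T23:59:59Z"); // the form the shards can convert
                  String pretty = d.toString();               // e.g. "Sun Dec 31 23:59:59 UTC 1995"

                  try {
                    iso.parse(pretty);                        // the "pretty" form does not parse back
                  } catch (ParseException expected) {
                    System.out.println("refinement value not convertible: " + pretty);
                  }
                }
              }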
          Elran Dvir added a comment -

          I didn't manage to make a distributed pivot on a date field blow up with toObject.
          Can you please attach an example query that blows Solr up and I'll adjust it to my environment?

          Thanks.

          Uwe Schindler made changes -
          Fix Version/s 4.7 [ 12325573 ]
          Fix Version/s 4.6 [ 12325000 ]
          Trey Grainger added a comment -

          FYI, the last distributed pivot facet patch functionally works, but there are some sub-optimal data structures being used and some unnecessary duplicate processing of values. As a result, we found that for certain worst-case scenarios (i.e. data is not randomly distributed across Solr cores and requires significant refinement) pivot facets with multiple levels could take over a minute to aggregate and process results. This was using a dataset of several hundred million documents and dozens of pivot facets across 120 Solr cores distributed over 20 servers, so it is a more extreme use-case than most will encounter.

          Nevertheless, we've refactored the code and data structures and brought the processing time from over a minute down to less than a second using the above configuration. We plan to post the patch within the next week.

          Yonik Seeley added a comment -

          Sweet, nice work Trey!

          Trey Grainger added a comment -

          Thanks, Yonik. I worked on the architecture and design, but it's really been a team effort by several of us at CB. Chris worked with the initial patch, Andrew hardened it, and Brett (who will post the next version) focused on the soon-to-be-posted performance optimizations. We're deploying the new version to production right now to sanity check it before posting the patch, but I think the upcoming version will finally be ready for review for committing.

          Brett Lucey added a comment -

          This is the updated version of our implementation of Pivot Facets, as mentioned by Trey. We have significantly improved performance for cases which involve a large number of shards through changing the underlying data structure and the way that data from the shards is merged together.

          Brett Lucey made changes -
          Attachment SOLR-2894.patch [ 12629612 ]
          Elran Dvir added a comment -

          Brett (and your team), thank you very much for your hard work. We've been having a lot of performance issues with the previous version of the patch (which we love), and initial testing shows they are now resolved.

          We also noticed a few issues:

          1) f.field.facet.limit=-1 is not being respected (as reported by me on 20/May/13 10:27)

          2) Pivot queries are returning String instead of Object (as reported by me on 25/Aug/13 07:38), except for boolean fields.
          I know the reason is that there was a problem with datetime fields. I changed it back to toObject and I can't reproduce any issues running the unit tests locally via maven.
          I'd be glad to help fix this problem if anyone can create a simple test case that fails.

          3) The following query throws an exception:

          q=*:*&rows=0&f.fieldA.facet.sort=index&f.fieldA.facet.limit=-1&f.fieldA.facet.missing=true&f.fieldA.facet.mincount=1&f.fieldB.facet.sort=index&f.fieldB.facet.limit=-1&f.fieldB.facet.missing=true&f.fieldB.facet.mincount=1&facet=true&facet.pivot=fieldA,fieldB&shards=127.0.0.1:8983/solr/shardA,127.0.0.1:8983/solr/shardB

          java.lang.IllegalArgumentException: fromIndex(0) > toIndex(-1) at java.util.ArrayList.subListRangeCheck(ArrayList.java:975) at java.util.ArrayList.subList(ArrayList.java:965) at org.apache.solr.handler.component.PivotFacetField.refineNextLevelOfFacets(PivotFacetField.java:276) at org.apache.solr.handler.component.PivotFacetField.queuePivotRefinementRequests(PivotFacetField.java:231) at org.apache.solr.handler.component.PivotFacet.queuePivotRefinementRequests(PivotFacet.java:86) at org.apache.solr.handler.component.FacetComponent.countFacets(FacetComponent.java:565) at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:413) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1474) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:370) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:949) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1011) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:804)

          Thank you again for this great patch!

          Chris Russell added a comment -

          We are at DevNexus today and tomorrow, but Brett, Andrew and I will look at it on Weds. Thanks for the feedback.

          Brett Lucey added a comment -

          Elran – Thanks for the feedback. I'm glad to hear the performance issues have been fixed. The exception is being thrown because the facet limit is -1, which causes an attempt to create a negatively sized sublist. If you need a workaround in the short term, just avoid using -1 for the facet limit. I will try to fix that this week.

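          A hypothetical sketch of the kind of guard such a fix typically needs (the helper and names here are illustrative, not the actual patch): treat a negative limit as "no limit" before taking the sublist, so List.subList never receives a negative toIndex.

              import java.util.Arrays;
              import java.util.List;

              public class FacetLimitSketch {
                // Illustrative helper: clamp offset and limit before calling subList.
                static <T> List<T> trim(List<T> values, int offset, int limit) {
                  int from = Math.min(Math.max(offset, 0), values.size());
                  int to = (limit < 0) ? values.size()                          // facet.limit=-1 means unlimited
                                       : Math.min(from + limit, values.size());
                  return values.subList(from, to);
                }

                public static void main(String[] args) {
                  List<String> terms = Arrays.asList("a", "b", "c");
                  System.out.println(trim(terms, 0, -1)); // [a, b, c] instead of fromIndex(0) > toIndex(-1)
                }
              }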
          Brett Lucey added a comment -

          This is an update to the previous patch I uploaded which excludes whitespace changes and eliminates dead code. This does not yet include a fix for the -1 facet limit.

          Brett Lucey made changes -
          Attachment SOLR-2894.patch [ 12630958 ]
          Elran Dvir added a comment -

          By the way, when I profiled our very slow distributed pivot, I noticed most of the time is wasted in val.get in trimExcessValuesBasedUponFacetLimitAndOffset in PivotFacetHelper.java.
          The following change (in the first two lines of the function) has shown a significant improvement.

          From:

              List<NamedList<Object>> newVal = new LinkedList<NamedList<Object>>();
              if (val == null) return val;

          to:

              if (val == null) return val;
              List<NamedList<Object>> newVal = new ArrayList<NamedList<Object>>(val.size());

          Thanks.

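          A small, self-contained illustration of the effect described above (not the Solr code itself): indexed get(i) on a LinkedList is O(n) per call, so an index-driven copy loop degrades to O(n^2), while a pre-sized ArrayList keeps the copy linear and avoids re-allocation.

              import java.util.ArrayList;
              import java.util.LinkedList;
              import java.util.List;

              public class IndexedCopySketch {
                public static void main(String[] args) {
                  List<Integer> val = new LinkedList<Integer>();
                  for (int i = 0; i < 50000; i++) val.add(i);

                  long t0 = System.nanoTime();
                  List<Integer> slow = new LinkedList<Integer>();
                  for (int i = 0; i < val.size(); i++) slow.add(val.get(i)); // O(n^2): get(i) walks the list

                  long t1 = System.nanoTime();
                  List<Integer> fast = new ArrayList<Integer>(val.size());   // pre-sized destination
                  fast.addAll(val);                                          // single O(n) pass via iterator

                  long t2 = System.nanoTime();
                  System.out.println("indexed copy: " + (t1 - t0) / 1000000
                      + " ms, bulk copy: " + (t2 - t1) / 1000000 + " ms");
                }
              }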
          Brett Lucey added a comment -

          Hi Elran,

          Regarding the string/object issue: We have not been able to revert back to toObject for all data types because doing so results in the following exception being thrown by the testDistribSearch case of DistributedFacetPivotTest.
          org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Invalid Date String:'Sat Sep 01 08:30:00 BOT 2012

          This is part of the test case which demonstrates the issue:
          //datetime
          this.query( "q", "*:*",
              "rows", "0",
              "facet","true",
              "facet.pivot","hiredate_dt,place_s,company_t",
              "f.hiredate_dt.facet.limit","2",
              "f.hiredate_dt.facet.offset","1",
              FacetParams.FACET_LIMIT, "4"); //test default sort (count)

          I am producing that error by running:
          ant -Dtests.class="org.apache.solr.handler.component.DistributedFacetPivotTest" clean test

          Brett Lucey added a comment -

          We have been trying to finish up the distributed pivot facet patch to submit it for committing, but we've run into one issue raised in the JIRA issue for the patch.

          Distributed field facets appear to convert all values to strings, and refine on those strings when needed. The solrj test cases around the distributed field facets also expect all values to be represented as strings. Please correct me if I'm wrong.

          The most recent revision of the patch for SOLR-2894 converts all values to strings as well, except in the case of booleans. If we don't have this exception, the solrj test case for distributed pivot facets will fail because there is an assert verifying that the value returned is Boolean.TRUE rather than "true". If we do not convert object values to strings, we are unable to refine on datetimes because the format will be incorrect and the datetime class is unable to convert it back.

          Our feeling is that this assert is incorrect and that it should be looking for the string true. Changing this test case would allow us to bring the behavior of distributed pivot facets into consistency with the behavior of distributed field facets. We didn't want to just go ahead and make that change because it would change the behavior of solrj.

          It appears our options are:
          1. Convert all values to strings. Modify the solrj test case to expect this. We feel this makes things consistent and is the ideal option.
          2. Convert all values except booleans to strings.
          3. Leave all values except datetimes as is.

          We are just looking for some input from the community prior to adding this change to our patch.

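          For context on why this choice is visible to clients, here is a rough SolrJ consumer sketch (the URL, collection, and field names are placeholders): PivotField.getValue() is declared as Object, so whichever option is chosen shows up to callers as either a String or a typed value such as Boolean or Date.

              import java.util.List;
              import java.util.Map;

              import org.apache.solr.client.solrj.SolrQuery;
              import org.apache.solr.client.solrj.impl.HttpSolrServer;
              import org.apache.solr.client.solrj.response.PivotField;
              import org.apache.solr.client.solrj.response.QueryResponse;
              import org.apache.solr.common.util.NamedList;

              public class PivotValueTypeSketch {
                public static void main(String[] args) throws Exception {
                  HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
                  SolrQuery q = new SolrQuery("*:*");
                  q.setRows(0);
                  q.setFacet(true);
                  q.set("facet.pivot", "cat,popularity");

                  QueryResponse rsp = solr.query(q);
                  NamedList<List<PivotField>> pivots = rsp.getFacetPivot();
                  for (Map.Entry<String, List<PivotField>> entry : pivots) {
                    for (PivotField pf : entry.getValue()) {
                      Object value = pf.getValue(); // String vs. Boolean/Date is exactly the question above
                      System.out.println(pf.getField() + "=" + value + " ("
                          + (value == null ? "null" : value.getClass().getSimpleName())
                          + ") count=" + pf.getCount());
                    }
                  }
                  solr.shutdown();
                }
              }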
          Elran Dvir added a comment -

          I think we should preserve object values in distributed pivot, as in regular pivot.
          I want to help fix the toObject problem with datetime fields.
          I tried to apply the most recent patch to the latest Solr trunk revision. There were some problems applying it.
          Can you please create a new patch against the latest Solr trunk revision, or indicate which revision the patch was created against?

          Thanks!

          Brett Lucey added a comment -

          This is an updated patch which should apply cleanly against trunk. I've used this against revision 885cdea13918fee0c49d5ac0c5fa1fd286d5b466.

          This should include a fix for the unlimited facet that Elran brought up. It does not address the toObject issue being discussed.

          Does anyone have additional input or thoughts as to which route to go with the toObject/string issue?

          Brett Lucey made changes -
          Attachment SOLR-2894.patch [ 12632849 ]
          Elran Dvir added a comment -

          Thanks for the patch.
          I will take a look at the toObject problem with datetime fields.
          Does the patch fix issues 1 and 3 I reported on February 24?

          Thanks.

          Brett Lucey added a comment -

          Elran - A facet limit of -1 in distributed pivot facets is not a use case we use in our environment, but we did go ahead and make the fixes in order to support the community. I've tested the changes locally on a box with success and added unit tests around it, but we have not yet deployed those changes to a production cluster. The exception you were seeing was directly related to the facet limit being negative, and that has been fixed in the patch I uploaded yesterday.

          Elran Dvir added a comment -

          I think I solved the toObject problem with datetime fields.
          Please see the patch attached.
          All tests pass now.
          Let me know what you think.
          Thanks.

          Elran Dvir made changes -
          Attachment dateToObject.patch [ 12633586 ]
          Elran Dvir added a comment -

          I have checked the latest patch.
          Problem 3 (field with negative limit threw exception) is now solved. Thanks!
          But I still see problem 1 (f.field.facet.limit=-1 is not being respected).

          Thank you very much.

          Brett Lucey added a comment -

          Elran - Can you give me an example test case or query for which the -1 facet limit fails? I'll be glad to take a look and fix it if I can reproduce an issue with it.

          Elran Dvir added a comment -

          Hi,

          I don't know exactly where I should put the test in DistributedFacetPivotTest, but this is the test:
          1) Index more than 100 docs (you can index docs with only an id).
          2) Run the following query:

              this.query( "q", "*:*",
                  "rows", "0",
                  "facet","true",
                  "facet.pivot","id",
                  "f.id.facet.limit","-1");

          You expect to get as many ids as you indexed, but you will get only 100.

          Thanks.

          Chris Russell added a comment -

          Elran, interesting, does that happen if you use facet.limit=-1 instead of the f.fieldname.facet.limit syntax? I am wondering if some code is checking the global limit but not the per-field limit.

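          As a guess at what that kind of bug usually looks like (illustrative only, not the actual patch code), the per-field form is only honored when the lookup goes through the field-aware accessor on SolrParams:

              import org.apache.solr.common.params.FacetParams;
              import org.apache.solr.common.params.ModifiableSolrParams;
              import org.apache.solr.common.params.SolrParams;

              public class PerFieldLimitSketch {
                public static void main(String[] args) {
                  ModifiableSolrParams params = new ModifiableSolrParams();
                  params.set("f.id.facet.limit", "-1"); // per-field override only, no global facet.limit

                  SolrParams p = params;
                  int globalOnly = p.getInt(FacetParams.FACET_LIMIT, 100);          // misses the override: 100
                  int perField = p.getFieldInt("id", FacetParams.FACET_LIMIT, 100); // sees f.id.facet.limit: -1

                  System.out.println("global-only=" + globalOnly + ", per-field=" + perField);
                }
              }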
          Elran Dvir added a comment -

          No.
          It doesn't happen when I use facet.limit=-1 instead of the f.fieldname.facet.limit syntax.

          Thanks.

          Brett Lucey added a comment -

          I've uploaded the newest version of the patch. This includes a fix for the -1 facet limit when specified on a specific field and incorporates Elran's toObject fix.

          Brett Lucey made changes -
          Attachment SOLR-2894.patch [ 12633988 ]
          AJ Lemke added a comment -

          Thanks for the hard work on this.

          After applying the patch (to trunk) I noticed that the results for the pivot facet have changed from the example given here:
          http://wiki.apache.org/solr/SimpleFacetParameters#Pivot_.28ie_Decision_Tree.29_Faceting-1
          My results: http://pastebin.com/zs9rWMC5
          Is this expected behavior?

          Also I am getting "java.lang.String cannot be cast to org.apache.lucene.util.BytesRef" errors when sorting boolean fields.
          http://localhost:8983/solr/select?wt=json&q=*:*&fl=*, score&sort=inStock desc
          My results: http://pastebin.com/x9QnGfZA

          My environment: http://pastebin.com/AyMAwXD3

          Brett Lucey added a comment -

          AJ - It appears that the pivot output is different in trunk because the example data has changed. I manually checked it against the trunk example data and the counts you are seeing look to be correct. The wiki is just based on older example files.

          I tried your second query on a 4.2 box and a trunk box and neither of them had an exception thrown. Do you get that error with the same version of Solr without using this patch? If so, that might be something worth raising on the solr users list.

          Brett Lucey made changes -
          Comment [ Hi AJ - Hmn. I'll take a look and try to fix that/write some unit tests that map to those cases. ]
          David Smiley made changes -
          Fix Version/s 4.8 [ 12326254 ]
          Fix Version/s 4.7 [ 12325573 ]
          AJ Lemke added a comment - - edited

          Brett,

          When looking at the output of the pivot, the data type has changed from a struct to an array.
          Is this the expected behavior, or should the data type for each pivot facet remain a struct, meaning I am experiencing a bug?

          Examples:
          4.7 pre patch:
          {
              field: "cat",
              value: "electronics",
              count: 12,
              pivot: [
                  {field: "popularity", value: 7, count: 4},
                  {field: "popularity", value: 6, count: 3},
                  {field: "popularity", value: 1, count: 2},
                  {field: "popularity", value: 0, count: 1},
                  {field: "popularity", value: 5, count: 1},
                  {field: "popularity", value: 10, count: 1}
              ]
          },

          Post Patch:
          [
              "field", "cat",
              "value", "electronics",
              "count", 12,
              "pivot", [
                  ["field","popularity","value",7,"count",4],
                  ["field","popularity","value",6,"count",3],
                  ["field","popularity","value",1,"count",2],
                  ["field","popularity","value",0,"count",1],
                  ["field","popularity","value",5,"count",1],
                  ["field","popularity","value",10,"count",1]
              ]
          ],

          Edit: formatting.

          Brett Lucey added a comment -

          Hi AJ,

          I just pulled the latest code from the repo, and when I patched the tip of the 4.7 branch with the SOLR-2894 patch, I am not able to reproduce this issue. (Revision 5981529c65d4ae671895948f43d8770daa58746b) This is the output I get using a pivot facet of cat,popularity. I used this url while running the example: http://localhost:8983/solr/select?wt=json&q=*:*&rows=0&facet=true&facet.pivot=cat,popularity

          [
              {
                  "field": "cat",
                  "value": "electronics",
                  "count": 12,
                  "pivot": [
                      { "field": "popularity", "value": 7, "count": 4 },
                      { "field": "popularity", "value": 6, "count": 3 },
                      { "field": "popularity", "value": 1, "count": 2 },
                      { "field": "popularity", "value": 0, "count": 1 },
                      { "field": "popularity", "value": 5, "count": 1 },
                      { "field": "popularity", "value": 10, "count": 1 }
                  ]
              },
              ...
          ]

          Could you try with this same revision, applying only the SOLR-2894 patch and let me know if you see something different?

          -Brett

          AJ Lemke added a comment -

          Hi Brett,

          I grabbed revision "5981529c65d4ae671895948f43d8770daa58746b" from the git repository and ran my tests.
          I am still seeing the pivots as arrays rather than JSON objects.

          If I change numShards to 1 I see the pivots as an array of objects.
          (http://localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=1&replicationFactor=2&maxShardsPerNode=4)

              field: "cat",
              value: "electronics",
              count: 12,
              pivot: [
                  {field: "popularity",value: 7,count: 4},
                  {field: "popularity",value: 6,count: 3},
                  {field: "popularity",value: 1,count: 2},
                  {field: "popularity",value: 0,count: 1},
                  {field: "popularity",value: 5,count: 1},
                  {field: "popularity",value: 10,count: 1}
              ]
          

          If I change numShards to > 1 I see the pivots as an array of arrays.
          (http://localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=2&maxShardsPerNode=4)

              "field",
              "cat",
              "value",
              "electronics",
              "count",
              12,
              "pivot",
              [
                  ["field","popularity","value",7,"count",4],
                  ["field","popularity","value",6,"count",3],
                  ["field","popularity","value",1,"count",2],
                  ["field","popularity","value",0,"count",1],
                  ["field","popularity","value",5,"count",1],
                  ["field","popularity","value",10,"count",1]
              ]
          

          I am using Windows 7 as my test environment and starting with this batch file:

          Batch File
          echo Deleting Temp Data
          rmdir /s /q D:\dev\lucene-solr\solr\node1
          rmdir /s /q D:\dev\lucene-solr\solr\node2
          
          echo Copying the example folder
          xcopy /s /e /y /i D:\dev\lucene-solr\solr\example D:\dev\lucene-solr\solr\node1
          xcopy /s /e /y /i D:\dev\lucene-solr\solr\example D:\dev\lucene-solr\solr\node2
          
          echo removing the collection1 folder
          rmdir /s /q D:\dev\lucene-solr\solr\node1\solr\collection1
          rmdir /s /q D:\dev\lucene-solr\solr\node2\solr\collection1
          
          echo Starting Solr Processes
          cd D:\dev\lucene-solr\solr\node1
          start cmd /c java -DzkRun -jar start.jar
          sleep 10
          
          cd D:\dev\lucene-solr\solr\node2
          start cmd /c java -Djetty.port=7574 -DzkRun -DzkHost=localhost:9983 -jar start.jar
          
          echo All Solr Processes started
          
          sleep 10
          
          cd D:\dev\lucene-solr\solr\example\scripts\cloud-scripts
          cmd /c zkcli.bat -zkhost localhost:9983 -cmd upconfig -confdir D:\dev\lucene-solr\solr\example\solr\collection1\conf -confname collection1
          cmd /c zkcli.bat -zkhost localhost:9983 -cmd linkconfig -collection collection1 -confname collection1
          echo Zookeeper updated to contain collection1
          
          cmd /c curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=2&maxShardsPerNode=4"
          cmd /c curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection1"
          
          sleep 10
          
          cd D:\dev\lucene-solr\solr\example\exampledocs
          cmd /c java -Dtype=text/xml -Doptimize=yes -Durl=http://localhost:8983/solr/collection1/update -jar post.jar *.xml
          cmd /c java -Dtype=text/csv -Doptimize=yes -Durl=http://localhost:8983/solr/collection1/update -jar post.jar *.csv
          cmd /c java -Dtype=application/json -Doptimize=yes -Durl=http://localhost:8983/solr/collection1/update -jar post.jar *.json
          echo Collection1 has been bootstrapped and optimized
          

          Could you show me how you are starting yours?

          AJ

          Brett Lucey added a comment - - edited

          Hi AJ,

          Thanks for all that info. It was super helpful and I've narrowed down the issue. If you'd like to fix it on your setup, you only need to make two changes to PivotFacetValue.java. At the top, add an import for org.apache.solr.common.util.SimpleOrderedMap, and in the convertToNamedList() function in that file, change this line:

              NamedList<Object> newList = new NamedList<Object>();

          to:

              NamedList<Object> newList = new SimpleOrderedMap<Object>();

          I will be posting a new SOLR-2894 patch within the next day or two, but that patch will be for trunk and will likely not apply cleanly to the 4.7 branch. If you need this working in 4.7, make the above changes. If you are using trunk, then this fix will be incorporated into the upcoming patch.

          -Brett

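          As a short aside on why that one-line change fixes the output shape (a sketch, not the patch itself): as far as we can tell, Solr's response writers render a SimpleOrderedMap as a JSON object, while a plain NamedList is written as a flat ["name", value, ...] array under the default json.nl setting, which matches the array-of-arrays output AJ saw with numShards > 1.

              import org.apache.solr.common.util.NamedList;
              import org.apache.solr.common.util.SimpleOrderedMap;

              public class PivotEntrySketch {
                public static void main(String[] args) {
                  NamedList<Object> entry = new SimpleOrderedMap<Object>(); // serialized as {"field":...,"value":...,"count":...}
                  entry.add("field", "popularity");
                  entry.add("value", 7);
                  entry.add("count", 4);
                  System.out.println(entry); // same name/value pairs either way; only the wire format differs
                }
              }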
          Brett Lucey added a comment -

          I've uploaded a patch to include changes needed to patch against trunk. (Revision caccba783be7c9f4d7b25c992ed4c49e5a2bddf7). Additionally, this fixes the JSON output formatting issue discovered by AJ.

          Brett Lucey made changes -
          Attachment SOLR-2894.patch [ 12639070 ]
          Uwe Schindler added a comment -

          Move issue to Solr 4.9.

          Uwe Schindler made changes -
          Fix Version/s 4.9 [ 12326731 ]
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.8 [ 12326254 ]
          Trey Grainger added a comment -

          After nearly 2 years of on-and-off development, I think this patch is finally ready to be committed. Brett's most recent patch includes significant performance improvements as well as fixes to all of the reported issues and edge cases mentioned by the others currently using this patch. We have just finished a large spike of work to get this ready for commit, so I'd love to get it pushed in soon unless there are any objections.

          Erik Hatcher, do you have any time to review this for suitability to be committed (since you are the reporter)? If there is anything additional that needs to be changed, I'll happily sign us up (either myself or someone on my team at CareerBuilder) to do it if it will help.


            People

            • Assignee: Unassigned
            • Reporter: Erik Hatcher
            • Votes: 48
            • Watchers: 64
