Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 5.3, 6.0
    • Component/s: search
    • Labels: None

      Description

      Facet pivots provide hierarchical support for computing data used to populate a treemap or similar visualization. Treemaps usually offer users extra information by applying an overlay color on top of the existing squares, whose sizes are based on the hierarchical counts. This second count is based on user choices and usually represents, with a gradient, the proportion of each square that matches those choices.

      The proposal is to use local parameters on facet.pivot to specify which facet queries to apply to the pivot, by matching a tag set on the facet.query parameters. The parameter format would look like:

      facet.pivot={!query=r1}category,manufacturer
      facet.query={!tag=r1}somequery
      facet.query={!tag=r1}somedate:[NOW-1YEAR TO NOW]
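
      For illustration only, the same request expressed as a SolrJ sketch. The pivot fields, tag name, and
      query strings are the placeholders from the example above, and the exact local-param name was still
      under discussion in the comments below; only standard SolrQuery parameter-setting calls are assumed.

          import org.apache.solr.client.solrj.SolrQuery;

          // Sketch: tag two facet.query clauses and reference that tag from facet.pivot.
          // "category", "manufacturer", "somequery", "somedate" and the tag "r1" are placeholders.
          SolrQuery q = new SolrQuery("*:*");
          q.setRows(0);
          q.setFacet(true);
          q.add("facet.pivot", "{!query=r1}category,manufacturer");
          q.add("facet.query", "{!tag=r1}somequery");
          q.add("facet.query", "{!tag=r1}somedate:[NOW-1YEAR TO NOW]");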

      Attachments

      1. patch-4212.txt
        9 kB
        Steve Molloy
      2. SOLR-4212.patch
        97 kB
        Shalin Shekhar Mangar
      3. SOLR-4212.patch
        97 kB
        Steve Molloy
      4. SOLR-4212.patch
        98 kB
        Steve Molloy
      5. SOLR-4212.patch
        17 kB
        Steve Molloy
      6. SOLR-4212.patch
        16 kB
        Steve Molloy
      7. SOLR-4212.patch
        9 kB
        Steve Molloy
      8. SOLR-4212-multiple-q.patch
        58 kB
        Steve Molloy
      9. SOLR-4212-multiple-q.patch
        22 kB
        Steve Molloy
      10. SOLR-6353-4212.patch
        228 kB
        Shalin Shekhar Mangar
      11. SOLR-6353-4212.patch
        227 kB
        Shalin Shekhar Mangar
      12. SOLR-6353-4212.patch
        227 kB
        Shalin Shekhar Mangar
      13. SOLR-6353-4212.patch
        227 kB
        Shalin Shekhar Mangar
      14. SOLR-6353-4212.patch
        215 kB
        Hoss Man
      15. SOLR-6353-4212.patch
        211 kB
        Shalin Shekhar Mangar
      16. SOLR-6353-6686-4212.patch
        224 kB
        Shalin Shekhar Mangar
      17. SOLR-6353-6686-4212.patch
        213 kB
        Shalin Shekhar Mangar
      18. SOLR-6353-6686-4212.patch
        193 kB
        Shalin Shekhar Mangar
      19. SOLR-6353-6686-4212.patch
        190 kB
        Shalin Shekhar Mangar
      20. SOLR-6353-6686-4212.patch
        186 kB
        Shalin Shekhar Mangar

        Issue Links

          Activity

          Steve Molloy added a comment -

          Initial patch proposal.

          Otis Gospodnetic added a comment -

          Thanks Steve!
          Had a quick look at the patch - seems very small, very simple (1 new method), has a test, clean formatting and ASL in place. Would be great to have one of the more active Solr developers check this out and commit.

          Uwe Schindler added a comment -

          Move issue to Solr 4.9.

          Steve Molloy added a comment -

          Adjust for latest patch for SOLR-2894 and 4.9 code.

          Steve Molloy added a comment -

          Fix latest patch which wasn't properly merging q-counts from shards

          Steve Molloy added a comment -

           (right attachment this time) Adapted patch for 4.10 code and ensured all tests passed.

          Steve Molloy added a comment -

          Add support for multiple queries per field.

          Steve Molloy added a comment -

          Add test and build map of parsed queries once

          Hoss Man added a comment -

          I haven't had a chance to review this patch, but in response to a dev@lucene thread about it...

           Base idea is to have something like: facet.pivot=field1,field2,field3&f.field2.facet.pivot.q=somequery&f.field3.facet.pivot.q=somedate:[NOW-1YEAR TO NOW]&f.field3.facet.pivot.q=someotherquery ... Which would add results similar to facet queries, at the appropriate level in the pivots.

           From a functionality standpoint, what you are setting out to do here seems like a great idea – but personally I think that syntax looks really cumbersome?

           From a user API standpoint, it seems your goal here would gel really well with the syntax I proposed in SOLR-6348/(SOLR-6351,SOLR-6352,SOLR-6353) if you think about it in terms of "hanging query facets off of pivots" ... ie:

          facet.pivot={!query=r1}category,manufacturer
          facet.query={!tag=r1}somequery
          facet.query={!tag=r1}somedate:[NOW-1YEAR TO NOW]
          

          that seems like it might be cleaner, and fit better with some of the other ongoing work, what do you think?

          Steve Molloy added a comment -

          That's what I'm starting to realize by looking into SOLR-6351... It makes a lot of sense, I'll try to adapt and see if I can get facet ranges (SOLR-6353) covered at the same time, they should be similar with your proposed approach.

          Steve Molloy added a comment -

           Following the approach under SOLR-6348, facet queries and facet ranges are now included in facet pivots if their tag matches the facet.query or facet.range local param of the facet.pivot request parameter. Added distributed unit tests for both as well as SolrJ test additions. This also covers SOLR-6353.

          Steve Molloy added a comment -

          Small adjustments to fit latest changes in SOLR-6351.

          Shalin Shekhar Mangar added a comment -

          Here's a patch updated to trunk.

          Shalin Shekhar Mangar added a comment -

          I decided to combine the patches for SOLR-4212 and SOLR-6686 because the refactoring in SOLR-6686 is useful here as well.

          Changes:

           1. I borrowed the refactoring of SimpleFacets from SOLR-6686 here, which simplifies things a lot
           2. A new RangeFacetProcessor is refactored out of SimpleFacets along the same lines as PivotFacetProcessor
           3. FacetComponent does the work of adding range and date facets to the response
           4. The above breaks a bunch of tests which expected facet_counts to be ordered in a certain way, so the patch sets facet_counts to be unordered
          Steve Molloy added a comment -

          Shalin Shekhar Mangar Thanks for updating the patch. Took a quick look and seems good, will try to find time to actually try it tomorrow.

          Shalin Shekhar Mangar added a comment -
           1. Further refactored and shared code between FacetComponent and PivotFacetField/PivotFacetHelper for range faceting
           2. Removed the dependency on mahout-collections. I found that using it actually made the merging logic more complex, involving reordering the facet results in the output
           3. Introduced a RangeFacetAccumulator which is actually quite simple and encapsulates all logic for tracking and merging range facet results from different shards and enforcing mincount requirements.
           4. I couldn't find a nice way to re-use the merging logic for queries, but since merging them is so simple, I let it be and just have a bit of duplicated code in PivotFacetHelper.

          All tests pass. A review would be appreciated.

          Hoss Man added a comment -

          I didn't get a chance to review as in depth as i would have liked, but here are some comments based on a quick skim, followed by some more detailed review of a few classes...

          • it seems verbose and confusing to use "facet.range" and "facet.query" as the local params for this...
             • Confusing: these local params refer to tag names, whereas people might think they can inline arbitrary query expressions / field names to compute ranges on
            • verbose: why not just "ranges" and "queries" for the local param names, similar to how "stats" is already used?
           • I'm not a big fan of the fact that this patch breaks the determinism of the order that the different types of facets are returned in – It's probably not a blocker, but I suspect there may be some clients (or perhaps even client libraries other than SolrJ) which will be broken by this.
          • In the SolrJ PivotField class, this patch adds "NamedList<Object> getRanges()" and "NamedList<Integer> getQueryCounts()" methods. We should really be consistent with the existing equivalent methods in QueryResponse...
            • "Map<String,Integer>getFacetQuery()"
            • "List<RangeFacet> getFacetRanges()"
          • in PivotFacetValue: "// named list with objects because depending on how big the counts are we may get either a long or an int"
            • FWIW: I had seriously never noticed until today that these facet counts had variable type
             • why can't we just use "Number" instead of "Object" in this new code since that's the level all of the casting code that deals with this list seems to use anyway?
            • doesn't this break the SolrJ code if/when it returns a Long? (see above new method "NamedList<Integer> getQueryCounts()")
          • DateFacetProcessor
            • this entire class should be marked deprecated
            • maybe i'm missing something, but what exactly is the advantage of this subclassing RangeFacetProcessor? ... if they are sharing code it's not obvious to me, and if they aren't (intentionally) sharing code then this subclass relationship seems dangerous if/when future improvements to range faceting are made.
          • FacetComponent
             • why does doDistribRanges() still need to exist? why not just move that casting logic directly into RangeFacetAccumulator.mergeContributionFromShard like the rest of the code that used to be here and call it directly?
             • now that these sketchy "num()" functions are no longer private, can we please get some javadocs on them.
          • PivotFacetProcessor
             • unless I'm misunderstanding the usage, the way addPivotQueriesAndRanges (and removeUnwantedQueriesAndRanges) works means that every facet.query and facet.range param value (with all localparams) is going to be reparsed over and over and over again for every unique value in every pivot field – just to check the "tag" values and see if it's one that should be computed for this pivot.
               • This seems really unnecessary – why not parse each param once into a simple data structure (isn't that what the new "ParsedParams" class is designed for?), and then put them in a map by tag available from the request context – just like we did for the stats with StatsInfo.getStatsFieldsByTag(String)?
               • in particular, won't this slow down existing requests containing both facet.pivot and (facet.range || facet.query) ... even if the latter aren't tagged or hung off of the pivots at all? because they'll still get parsed over and over again, won't they?
             • this logic also seems to completely break instances of facet.query used w/o linking it to a facet.pivot
              • http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&stats=true&facet.query=inStock:true&facet=true&facet.pivot=manu_id_s,inStock
              • java.lang.NullPointerException
                	at org.apache.solr.handler.component.PivotFacetProcessor.removeUnwantedQueriesAndRanges(PivotFacetProcessor.java:399)
                	at org.apache.solr.handler.component.PivotFacetProcessor.addPivotQueriesAndRanges(PivotFacetProcessor.java:371)
                	at org.apache.solr.handler.component.PivotFacetProcessor.doPivots(PivotFacetProcessor.java:273)
                	at org.apache.solr.handler.component.PivotFacetProcessor.processSingle(PivotFacetProcessor.java:194)
                	at org.apache.solr.handler.component.PivotFacetProcessor.process(PivotFacetProcessor.java:135)
                	at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:121)
                	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:222)
                	at 
                
             • also broken: neither of these requests should result in the facet.query hanging off of the pivots, but because of how StringUtils.contains() is used they both do erroneously...
              http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&stats=true&facet.query={!tag=ttt}inStock:true&facet=true&facet.pivot={!facet.query=t}manu_id_s,inStock
              http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&stats=true&facet.query={!tag=t}inStock:true&hang=&facet=true&facet.pivot={!facet.query=$hang}manu_id_s,inStock
              
          • tests...
             • a lot of code in the new test classes seems to have been copied verbatim from other existing tests – in some cases this is fine, because the copied test logic has been modified to include new params+asserts of the new functionality – but there's still a lot of redundant copy/paste cruft w/o any logic changes
               • eg: DistributedFacetPivotQuerySmallTest lines 428-532 seem to be verbatim copied from DistributedFacetPivotSmallTest w/o any additions to test the facet.query logic, or even new negative-assertions that stray facet.queries are hung off by mistake (ie: to catch the bugs I mentioned above)
               • ditto for DistributedFacetPivotRangeSmallTest
             • there don't seem to be any new tests that show hanging both ranges & queries onto a pivot at the same time – let alone combining it with the existing stats logic
             • likewise I don't see any testing of using the same tag with multiple facet.query instances (or multiple facet.range instances) and confirming that both get hung off of the same pivot.
          Mike Murphy added a comment -

          Per SOLR-7296, shouldn't this go into lucene facets?
          Also, does the new facet module in SOLR-7214 already have this?
          This is continuing down the path of adding more different APIs to do the same thing.

          Steve Molloy added a comment -

           Indeed, we need to find a way to reconcile, but as the JSON API will be experimental in 5.1, I don't think we should stop this new functionality from getting in. It's in line with all the other work under this umbrella, some of which is already in 5.0. Whether it ends up in Lucene, the Solr facet component or the facet module, the functionality is something needed, and I don't think we should hold it up, as this is still the official facet implementation after all...

          Yonik Seeley added a comment -

           Experimental has nothing to do with "officialness" - everything we release is official. It's only the degree to which we try to keep backward compatibility, and it can be a good idea for any sufficiently complex API. So "experimental" only means "hey, we still have a chance to easily improve the API before it gets locked down harder".

          Shalin Shekhar Mangar added a comment -

          Thanks for the detailed review, Hoss.

          verbose: why not just "ranges" and "queries" for the local param names, similar to how "stats" is already used?

          That was an oversight. Fixed.

           I'm not a big fan of the fact that this patch breaks the determinism of the order that the different types of facets are returned in – It's probably not a blocker, but I suspect there may be some clients (or perhaps even client libraries other than SolrJ) which will be broken by this

          Fixed.

          In the SolrJ PivotField class, this patch adds "NamedList<Object> getRanges()" and "NamedList<Integer> getQueryCounts()" methods. We should really be consistent with the existing equivalent methods in QueryResponse

          Fixed.

           why can't we just use "Number" instead of "Object" in this new code since that's the level all of the casting code that deals with this list seems to use anyway?

          Done.

          doesn't this break the SolrJ code if/when it returns a Long? (see above new method "NamedList<Integer> getQueryCounts()")

          Yes it does. And the current SolrJ code for range & query facets also needs to be fixed. I'll open another issue to fix the client side of things.

          DateFacetProcessor – this entire class should be marked deprecated

          Done.

          maybe i'm missing something, but what exactly is the advantage of this subclassing RangeFacetProcessor? ... if they are sharing code it's not obvious to me, and if they aren't (intentionally) sharing code then this subclass relationship seems dangerous if/when future improvements to range faceting are made.

          This was an oversight. Fixed.

           FacetComponent – why does doDistribRanges() still need to exist? why not just move that casting logic directly into RangeFacetAccumulator.mergeContributionFromShard like the rest of the code that used to be here and call it directly?

          I inlined the method into FacetComponent.countFacets. I didn't move the casting logic though. The mergeContributionFromShard method could in theory accept an object and cast it to the right type but a method accepting just a java.lang.Object doesn't feel right to me.

           FacetComponent – now that these sketchy "num()" functions are no longer private, can we please get some javadocs on them.

          Done.

          • PivotFacetProcessor
             • unless I'm misunderstanding the usage, the way addPivotQueriesAndRanges (and removeUnwantedQueriesAndRanges) works means that every facet.query and facet.range param value (with all localparams) is going to be reparsed over and over and over again for every unique value in every pivot field – just to check the "tag" values and see if it's one that should be computed for this pivot.
               This seems really unnecessary – why not parse each param once into a simple data structure (isn't that what the new "ParsedParams" class is designed for?), and then put them in a map by tag available from the request context – just like we did for the stats with StatsInfo.getStatsFieldsByTag(String)?
             • in particular, won't this slow down existing requests containing both facet.pivot and (facet.range || facet.query) ... even if the latter aren't tagged or hung off of the pivots at all? because they'll still get parsed over and over again, won't they?

          You're right. This was horrible and I should've noticed it myself. We now cache the ParsedParam by tags and use them instead of removing unwanted ranges/queries and re-parsing the request.

           this logic also seems to completely break instances of facet.query used w/o linking it to a facet.pivot

          This is also fixed.

           also broken: neither of these requests should result in the facet.query hanging off of the pivots, but because of how StringUtils.contains() is used they both do erroneously...

          Also fixed.

          • tests...
             • a lot of code in the new test classes seems to have been copied verbatim from other existing tests – in some cases this is fine, because the copied test logic has been modified to include new params+asserts of the new functionality – but there's still a lot of redundant copy/paste cruft w/o any logic changes
               • eg: DistributedFacetPivotQuerySmallTest lines 428-532 seem to be verbatim copied from DistributedFacetPivotSmallTest w/o any additions to test the facet.query logic, or even new negative-assertions that stray facet.queries are hung off by mistake (ie: to catch the bugs I mentioned above)
               • ditto for DistributedFacetPivotRangeSmallTest
             • there don't seem to be any new tests that show hanging both ranges & queries onto a pivot at the same time – let alone combining it with the existing stats logic
             • likewise I don't see any testing of using the same tag with multiple facet.query instances (or multiple facet.range instances) and confirming that both get hung off of the same pivot.
           1. I refactored and folded the test logic back into DistributedFacetPivotSmallTest.
          2. There was really no need to duplicate the PivotFieldComparator, UnorderedEqualityArrayList and ComparablePivotField classes.
          3. DistributedFacetPivotSmallTest.testNegativeFacetQuery and testNegativeFacetRange test for negative assertions to ensure that extra facet.queries and facet.ranges are not hung by mistake
          4. DistributedFacetPivotSmallTest.testPivotFacetRangeAndQuery tests range and queries hanging off pivots together along with stats
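
           For illustration, assuming the SolrJ accessors end up mirroring QueryResponse as requested above
           (getFacetQuery() returning Map<String,Integer> and getFacetRanges() returning List<RangeFacet> on
           PivotField), client code reading the query counts hung off a pivot might look roughly like the
           sketch below; the accessor names are taken from the discussion, not confirmed here.

               import java.util.List;
               import java.util.Map;
               import org.apache.solr.client.solrj.response.PivotField;
               import org.apache.solr.client.solrj.response.QueryResponse;

               // Sketch only: walk the top pivot level and print any facet.query counts hung off each value.
               static void printPivotQueryCounts(QueryResponse rsp) {
                 for (Map.Entry<String, List<PivotField>> pivot : rsp.getFacetPivot()) {
                   for (PivotField pf : pivot.getValue()) {
                     System.out.println(pf.getField() + "=" + pf.getValue() + " (" + pf.getCount() + ")");
                     Map<String, Integer> queryCounts = pf.getFacetQuery(); // accessor discussed above
                     if (queryCounts != null) {
                       queryCounts.forEach((fq, count) -> System.out.println("  " + fq + ": " + count));
                     }
                   }
                 }
               }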
          Mike Murphy added a comment -

          A new RangeFacetProcessor is refactored out of SimpleFacets

           It's great that you are refactoring code out of the horror that is SimpleFacets, but there is already a class called FacetRangeProcessor in the new facet module. That was very confusing when I was trying to make sense of this in Eclipse.

          Hoss Man added a comment -

          feedback on the latest patch...

           • DateFacetProcessor and RangeFacetProcessor should be moved into handler/component alongside PivotFacetProcessor - SimpleFacets is only in the request subpackage for historical reasons, we shouldn't keep making that mistake
          • Why was SimpleFacets.getFacetCounts() removed completely?
            • in the current patch, the existing callers of that method (FacetComponent and MLT Handler) now have cut/paste duplicated code.
            • why can't that method continue to exist (modified to use the new processors) to eliminate this duplication?
          • PivotFacetProcessor.getTaggedFacets
            • why is this method returning a map of one entry (which is the tag name?)
             • the parsing bugs from the last patch are gone, but the way this method is used still has a lot of the same "parse facet params over and over again" redundancy – although admittedly to a lesser extent – that seems inefficient...
              • this method will be called for every facet.pivot param that has a "range" or "query" local param
              • within this method, every facet.range / facet.query param in the request will be parsed (again) to see if it has the specified tag.
             • I still think that hanging "facets" off of pivots should follow the same model as hanging "stats" off of pivots (as I mentioned before: see StatsInfo.getStatsFieldsByTag)...
               • all of the param parsing is done in StatsComponent.prepare, and data structures (StatsInfo) with all the information/logic needed to accumulate stats are put in the request context
               • the main stats logic iterates over all StatsField instances to compute stats over the entire result set
               • each facet.pivot param, as it's processed, gets the StatsInfo from the request context and does a simple map lookup to find the relevant StatsField instances based on its tag
             • is there a reason not to do the same approach in this issue? ie...
               • all logic related to parsing facet.range and facet.query params can be done once in FacetComponent.prepare and the resulting data structures can be put in the request context where they can be reused.
               • no matter how many times a range facet or query facet is hung off of something else (by tag) it doesn't need to be reparsed; the caller can just fetch the necessary object from the context and ask it for a processor/accumulator/whatever to compute the results relative to a docset.
          • PivotFacetProcessor.removeUnwantedQueriesAndRanges
            • this is dead code correct?
          • FWIW: ant precommit is currently failing because of nocommits .. not sure if there are other precommit checks beyond that that might be problematic
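
           For illustration, a rough sketch of the parse-once, look-up-by-tag pattern described above,
           analogous to StatsInfo.getStatsFieldsByTag. The class and method names here are hypothetical,
           not actual Solr classes; it only shows the shape of building the tag map once (e.g. in prepare())
           and doing cheap lookups per pivot value instead of re-parsing the params.

               import java.util.ArrayList;
               import java.util.Collections;
               import java.util.HashMap;
               import java.util.List;
               import java.util.Map;

               // Hypothetical sketch, not actual Solr code: parsed facet.query/facet.range params are
               // registered once while preparing the request, then looked up by tag per pivot value.
               class TaggedFacetRegistry<T> {
                 private final Map<String, List<T>> byTag = new HashMap<>();

                 // Called once per facet.query/facet.range param while parsing the request.
                 void register(String tag, T parsedParam) {
                   byTag.computeIfAbsent(tag, k -> new ArrayList<>()).add(parsedParam);
                 }

                 // Called per pivot value: a map lookup instead of re-parsing all params.
                 List<T> lookup(String tag) {
                   return byTag.getOrDefault(tag, Collections.emptyList());
                 }
               }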
          Tim Underwood added a comment -

          Any progress on this? I'd love to see SOLR-6686 fixed for 5.2.

          Shalin Shekhar Mangar added a comment -

          Changes:

           1. DateFacetProcessor and RangeFacetProcessor are moved to the handler/component package
          2. SimpleFacets.getFacetCounts() was removed because it would have been weird for SimpleFacets to create and use its own sub-class RangeFacetProcessor/DateFacetProcessor. However, in this patch I have replaced that bit of code with a static method in FacetComponent which should eliminate the repetition in MLTHandler.
          3. Now FacetComponent tries to do a similar thing as StatsComponent i.e. to parse params in prepare method and re-use in PivotFacetProcessor. I tried many designs before the current patch:
            • Create FacetInfo in prepare method and re-use throughout. However this doesn't work because there is no separation of concerns in the constituents of FacetInfo. They serve a dual purpose of parsing and aggregating. So when we try creating it in prepare it would throw NPEs inside shard requests trying to access rb.shards.
            • Create a FacetContext class which parses and keeps the params and use it in PivotFacetProcessor. We can't use ParsedParam because it has DocSet etc which are just not available in the prepare method.
            • Finally I settled for a crude approach of enhancing FacetBase with tags and just parsing the facet queries and ranges and putting them in the request context. Not super happy about it but un-entangling FacetInfo and related classes is too big a task for this issue.
          • PivotFacetProcessor.removeUnwantedQueriesAndRanges was dead code and it has been removed.
          • I removed the RangeFacetAccumulator class and replaced it with a LinkedHashMap of facetStr to RangeFacet. This RangeFacet class extends FacetBase just like PivotFacetValue and is responsible for aggregation of a single facet value. This increases a bit of code i.e. looping over all range facets and aggregating but it is no different from what other facet types were doing.
          • Tests and precommit both pass.

          TBH, this faceting code makes my head hurt and a massive refactoring is required. I have some ideas around it but I chose not to do any more to avoid scope creep. Once this is committed, we can open a few issues to clean up.

           Mike Murphy - I know the class names are the same, but what's a better name? This patch is older than the new code and is along the same lines as PivotFacetProcessor.

          Hoss Man added a comment -

           Ok, I managed to review a little over half of the current patch (trying to start with the small changes first and work my way up) – comments below...


           General feedback: This patch, on the whole, would be a lot easier to review – and the final code a lot easier to make sense of – if there were more class/method javadocs explaining what the purpose of everything was. I realize that could be said for a lot of the existing Solr code, but adding/refactoring classes/methods is the best time to try and improve the situation since that's when you're thinking about it the most.


          "SimpleFacets.getFacetCounts() was removed because it would have been weird for SimpleFacets to create and use its own sub-class RangeFacetProcessor/DateFacetProcessor. However, in this patch I have replaced that bit of code with a static method in FacetComponent which should eliminate the repetition in MLTHandler.

           If you don't think it makes sense to keep this method where it is that's fine, but it is a public method designed for people writing custom request handlers to use to add faceting to their stuff – so at the very least you should leave a deprecated stub in its place that calls your new method.

           It also confuses the hell out of me that in the process of moving this method you made the method signature more complicated to use (more args to understand, more exceptions to deal with) and got rid of the javadocs, so the new more complicated call signature is even harder to understand – and I, frankly, can't make sense of why any of that is necessary (even if it should now live in a different class because of the inheritance)...

          • why does the caller have to pass in an empty NamedList to be mutated? why can't the method just return a NamedList like the old method did?
          • why does the caller have to instantiate the date & range processors themselves?
            • in both of the places this method is called, the processors are constructed just to pass to this method and then thrown away.
            • So why can't this static method construct them internally (using the request, docset, params, and response builder from the SimpleFacets object that's also passed as an arg) and keep the method signature simple?
          • why does this new method catch & wrap SyntaxErrors but not IOExceptions like the old method?

           (If there are reasons for these changes then great – but they aren't clear just from reading the patch, and there are no javadocs to get guidance from)


          I removed the RangeFacetAccumulator class and replaced it with a LinkedHashMap of facetStr to RangeFacet. This RangeFacet class extends FacetBase just like PivotFacetValue and is responsible for aggregation of a single facet value.

           • I don't see a class called "RangeFacet" ... it looks like you're talking about RangeFacetValue?
             • FWIW: PivotFacetValue doesn't extend FacetBase because it models a single (value,count) pair in the context of a PivotFacetField (which may be hierarchical) ... you may be thinking of "PivotFacet"
             • ... in which case, should this class actually be named "RangeFacet"?
             • either way: can we please beef up the javadocs of this class to be crystal clear about what it's modeling – at first glance I thought it was just modeling a facet.range param (with the field, start, end, calculator) before I realized it not only models the field + the individual buckets, but also the values in those buckets (but not the other params)
           • Isn't RangeFacetValue.fieldName redundant here? - it's already handled by FacetBase.facetOn.
           • I don't think there is any reason to initialize shardFieldValues in mergeContributionFromShard before the rangeFacet = rangeFromShard short-circuit, it's just wasted cycles the first time the method gets called, correct?

          Looking at the changes to PivotFacetValue...

          • why a "Map<String, RangeFacetValue>" for rangeCounts instead of a NamedList?
             • it seems like a really bad idea for these to not be returned in a deterministic order – it should be in the same order as the top level facet.range results (which is the order of the facet.range params in the request)
             • no idea if this HashMap usage currently causes problems in the shard merging, but it smells fishy – and either way the user should be able to count on getting the range facets back in the same order they asked for them
           • I didn't dig into the full ramifications of this, but you've got PivotFacetValue constructing RangeFacetValue objects and passing the response key in as the "fieldName" argument to the constructor – those aren't the same thing. Either the method call here is wrong, or the constructor arg should be renamed – either way any usage of this "fieldName" constructor arg and internal usage should be sanity checked to ensure it's used consistently everywhere
             • are there tests of using a different response key from the fieldName when hanging ranges under pivots? (both single node and cloud?)
          • why is PivotFacetValue.createFromNamedList conditionally calling RangeFacetValue.mergeContributionFromShard?
             • Unless I'm missing something, that method is only ever creating entirely new instances of PivotFacetValue, so there shouldn't be any existing data to merge into ... correct?
          • in PivotFacetValue.mergeContributionFromShard, the dozen or so lines of looping over each shard's response to either construct a new RangeFacetValue or call RangeFacetValue.mergeContributionFromShard seems like it could/should be refactored into a static helper method (in the RangeFacetValue class)
            • the exact same loop is used in the only other place mergeContributionFromShard is called (FacetComponent)
             • if PivotFacetValue.createFromNamedList really does need to conditionally merge (see previous comment), then its loop would also be exactly the same, and it could use the same static helper method.

           Since the input to PivotFacetHelper.mergeQueryCounts is already in the correct response format (ie: no special objects need to be constructed to model the simple counts), can't it just return "querycounts" if "receivingMap" is null? (instead of looping & adding)


           With the current patch, something is wrong with how localparams are dealt with when trying to use facet.ranges inside of pivots. These queries are examples of doing range queries on the same field with different gap params, and (independently) doing a pivot...

          http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.pivot=manu_id_s,cat&facet.range={!key=key1%20tag=t1%20facet.range.start=0%20facet.range.end=1000%20facet.range.gap=200}price&facet.range={!key=key2%20tag=t1%20facet.range.start=0%20facet.range.end=500%20facet.range.gap=100}price
          
          http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.pivot=manu_id_s,cat&facet.range={!key=key1%20tag=t1%20facet.range.gap=200}price&f.price.facet.range.start=0&f.price.facet.range.end=1000&facet.range={!key=key2%20tag=t1%20facet.range.gap=100}price
          

           ...but as soon as you try to hang those ranges on the pivot you get errors...

          http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.pivot={!range=t1}manu_id_s,cat&facet.range={!key=key1%20tag=t1%20facet.range.start=0%20facet.range.end=1000%20facet.range.gap=200}price&facet.range={!key=key2%20tag=t1%20facet.range.start=0%20facet.range.end=500%20facet.range.gap=100}price
          ...
          Missing required parameter: f.price.facet.range.start (or default: facet.range.start)
          
          
          http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.pivot={!range=t1}manu_id_s,cat&facet.range={!key=key1%20tag=t1%20facet.range.gap=200}price&f.price.facet.range.start=0&f.price.facet.range.end=1000&facet.range={!key=key2%20tag=t1%20facet.range.gap=100}price
          ...
          Missing required parameter: f.price.facet.range.gap (or default: facet.range.gap)
          

          ...the fact that this error only pops up when combining pivots + ranges suggests that the patch didn't break the existing code controlling how the param parsing is done; it apparently just isn't re-using the exact same parsing code in both situations? Which also means that within a single request, the range facet params are still getting parsed (at least) twice – once (correctly) for the top level facets, and (at least) once (incorrectly) when hanging off of the pivot.

          ...Finally I settled for a crude approach of enhancing FacetBase with tags and just parsing the facet queries and ranges and putting them in the request context. ...

          Hmmm....

          I haven't seen this piece of code yet, but it sounds like it only prevents the "tags" from being parsed over and over again – from what I've seen so far, RangeFacetProcessor.getFacetRangeCounts(ParsedParams,NamedList) which delegates to RangeFacetProcessor.getFacetRangeCounts(SchemaField,RangeEndpointCalculator,ParsedParams)
          is still going to cause a new RangeEndpointCalculator to be constructed, and the exact same start/end/include/etc... Strings to be parsed over and over and over again, for every pivot value, at every level. These things should really just be parsed once in prepare(), put into the request context, and reused at every level of every pivot.

          To put it another way...

          With the patch right now, if you do the following query using the techproducts example data:

          http://localhost:8983/solr/techproducts/select?rows=0&q=*:*&facet=true&facet.range=manufacturedate_dt&facet.range.start=2000-01-01T00:00:00Z&facet.range.end=2010-01-01T00:00:00Z&facet.range.gap=%2B1YEAR
          

          ... then 41 DateRangeEndpointCalculator objects get instantiated even though 1 is plenty; the start value of "2000-01-01T00:00:00Z" and the end value "2010-01-01T00:00:00Z" each get parsed into Date objects 41 times, even though the result is always exactly the same. That seems really inefficient ... doesn't it defeat the purpose of doing that param parsing in prepare()?
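
          A minimal sketch of the "parse once in prepare()" idea, assuming a RangeFacetRequest-style holder and an illustrative context key (this is not the actual patch code):

          @Override
          public void prepare(ResponseBuilder rb) throws IOException {
            // ... existing prepare() logic ...
            List<RangeFacetRequest> rangeRequests = new ArrayList<>();
            String[] rangeParams = rb.req.getParams().getParams(FacetParams.FACET_RANGE);
            if (null != rangeParams) {
              for (String rangeParam : rangeParams) {
                // parse start/end/gap/include and build the endpoint calculator exactly once
                rangeRequests.add(new RangeFacetRequest(rb, rangeParam));
              }
            }
            // stash in the request context so per-pivot processing can reuse the parsed objects
            rb.req.getContext().put("rangeFacetRequests", rangeRequests);
          }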

          Hide
          Shalin Shekhar Mangar added a comment -

          Thanks for the review Hoss. This patch should take care of all the concerns you outlined.

          If you don't think it makes sense to keep this method where it is, that's fine, but it is a public method designed for people writing custom request handlers to use to add faceting to their stuff – so at the very least you should leave a deprecated stub in its place that calls your new method.
          It also confuses the hell out of me that in the process of moving this method you made the method signature more complicated to use (more args to understand, more exceptions to deal with) and got rid of the javadocs so the new more complicated call signature is even harder to understand – and I, frankly, can't make sense of why any of that is necessary (even if it should now live in a diff class because of the inheritance)...
          why does the caller have to pass in an empty NamedList to be mutated? why can't the method just return a NamedList like the old method did?
          why does the caller have to instantiate the date & range processors themselves?
          in both of the places this method is called, the processors are constructed just to pass to this method and then thrown away.
          So why can't this static method construct them internally (using the request, docset, params, and response builder from the SimpleFacets object that's also passed as an arg) and keep the method signature simple?
          why does this new method catch & wrap SyntaxErrors but not IOExceptions like the old method?
          (If there are reasons for these changes then great – but they aren't clear just from reading the patch, and there are no javadocs to get guidance from)

          1. I have restored the old method but marked it as deprecated.
          2. It internally calls the new static helper method to add all kinds of facets.
          3. The signature is also simplified. It now accepts a SimpleFacets instance, constructs all other required processors internally, and returns a NamedList like the old method did.
          4. Not wrapping IOException was a mistake. This is fixed.
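
          As a rough sketch of that shape (a deprecated instance method delegating to the new static helper; names follow this discussion and may not match the patch exactly):

          // in SimpleFacets
          /** @deprecated use {@link FacetComponent#getFacetCounts(SimpleFacets)} instead */
          @Deprecated
          public NamedList<Object> getFacetCounts() {
            return FacetComponent.getFacetCounts(this);
          }

          // in FacetComponent
          public static NamedList<Object> getFacetCounts(SimpleFacets simpleFacets) {
            // builds the full facet_counts section (queries, fields, ranges, ...),
            // constructing the date & range processors internally and wrapping both
            // SyntaxError and IOException in a SolrException
            ...
          }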

          I don't see a class called "RangeFacet" ... it looks like you're talking about RangeFacetValue?
          FWIW: PivotFacetValue doesn't extend FacetBase because it models a single (value,count) pair in the context of a PivotFacetField (which may be hierarchical) ... you may be thinking of "PivotFacet"
          ... in which case, should this class actually be named "RangeFacet" ?
          either way: can we please beef up the javadocs of this class to be crystal clear about what it's modeling – at first glance I thought it was just modeling a facet.range param (with the field, start, end, calculator) before I realized it not only models the field + the individual buckets, but also the values in those buckets (but not the other params)
          Isn't RangeFacetValue.fieldName redundant here? - it's already handled by FacetBase.facetOn.
          I don't think there is any reason to initialize shardFieldValues in mergeContributionFromShard before the rangeFacet = rangeFromShard short-circuit, it's just wasted cycles the first time the method gets called, correct?

          I further refactored the code into three separate classes to divide the responsibilities clearly.

          1. The first, RangeFacetRequest, models a single facet.range request along with all request parameters, the appropriate calculator, and the ranges (gaps). Instances of this class are now created once in FacetComponent.prepare, cached in the context, and re-used for non-distrib facet.range calculation as well as in the pivot facet classes.
          2. The second, RangeFacetRequest.DistribRangeFacet, models the full 'facet_ranges' section to be returned to the user for a distrib request and has helper methods for merging contributions of individual shards.
          3. The third, RangeFacetRequest.FacetRange, models a single range, i.e. an individual gap for which the count has to be computed. A list of these is created by the calculator (inside RangeFacetRequest's constructor); a skeleton of this layout is sketched below.

          This is actually almost the same as the framework in the Analytics component with some minor modifications. Once this issue is committed, I'll open issues to refactor common code in a single place and re-use. It's funny: I started writing a class to pre-compute the gaps and ended up with almost the same code as RangeFacetRequest in Analytics, only a little uglier, so I threw that code away, started from the Analytics component's design, and tweaked it to fit our use-case.
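
          A skeleton of that three-class layout (fields and bodies abridged and paraphrased from this description, not copied from the patch):

          public class RangeFacetRequest {
            private SchemaField schemaField;
            private String start, end, gap;          // raw request params
            private List<FacetRange> facetRanges;    // gaps computed once, up front

            public RangeFacetRequest(ResponseBuilder rb, String facetRangeParam) {
              // resolve localparams/key/tag, pick the calculator, compute facetRanges ...
            }

            /** a single range, i.e. one gap (or a before/after/between bucket) to count */
            public static class FacetRange {
              public String name;
              public String lower, upper;
              // ...
            }

            /** the merged 'facet_ranges' output for a distributed request */
            public static class DistribRangeFacet {
              public NamedList<Object> rangeFacet;
              public void mergeContributionFromShard(NamedList<Object> rangeFromShard) {
                // fold one shard's range counts into rangeFacet ...
              }
            }
          }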

          why a "Map<String, RangeFacetValue>" for rangeCounts instead of a NamedList?
          it seems like a really bad idea for these to not be returned in a deterministic order – it should be in the same order as the top level facet.range results (which is the order of the facet.range params in the request)
          no idea if this HashMap usage currently causes problems in the shard merging, but it smells fishy – and either way the user should be able to count on getting the range facets back in the same order they asked for them

          This is now a LinkedHashMap and its keys/values are added in the same order as the NamedList from a shard's response. Though I don't agree that users should count on the order, because this is not a list but a complex object in all our responses. But I digress.

          I didn't dig into the full ramifications of this, but you've got PivotFacetValue constructing RangeFacetValue objects and passing the response key in as the "fieldName" argument to the constructor – those aren't the same thing. Either the method call here is wrong, or the constructor arg should be renamed – either way, every usage of this "fieldName" constructor arg (and its internal usage) should be sanity checked to ensure it's used consistently everywhere

          This is no longer an issue since we use the DistribRangeFacet class which has no need for a key at all.

          are there tests of using a diff response key from the fieldName when hanging ranges under pivots? (both single node and cloud?)

          Yes, see DistributedFacetPivotSmallTest.testFacetPivotRange() and SolrExampleTests.testPivotFacetsRanges()

          in PivotFacetValue.mergeContributionFromShard, the dozen or so lines of looping over each shard's response to either construct a new RangeFacetValue or call RangeFacetValue.mergeContributionFromShard seems like it could/should be refactored into a static helper method (in the RangeFacetValue class)

          That bit of code has been refactored into DistribRangeFacet.mergeFacetRangesFromShardResponse method.

          why is PivotFacetValue.createFromNamedList conditionally calling RangeFacetValue.mergeContributionFromShard?

          Laziness on my part, really. Now that this bit of code has been refactored into DistribRangeFacet.mergeFacetRangesFromShardResponse, I use the same here as well. The extra null check wouldn't hurt here.

          Since the input to PivotFacetHelper.mergeQueryCounts is already in the correct response format (ie: no special objects need to be constructed to model the simple counts), can't it just return "querycounts" if "receivingMap" is null? (instead of looping & adding)

          Good point, done.

          With the current patch, something is wrong with how localparams are dealt with when trying to use facet.ranges inside of pivots. These queries are examples of doing range queries on the same field with diff gap params, and (independently) doing a pivot

          Fixed. I am not sure why that happened but once I refactored the code to do all the parsing and calculation in prepare, I could not reproduce the issue.

          I haven't seen this piece of code yet, but it sounds like it only prevents the "tags" from being parsed over and over again – from what I've seen so far, RangeFacetProcessor.getFacetRangeCounts(ParsedParams,NamedList) which delegates to RangeFacetProcessor.getFacetRangeCounts(SchemaField,RangeEndpointCalculator,ParsedParams)
          is still going to cause a new RangeEndpointCalculator to be constructed, and the exact same start/end/include/etc... Strings to be parsed over and over and over again, for every pivot value, at every level. These things should really just be parsed once in prepare(), put into the request context, and reused at every level of every pivot.

          This is also done as I noted earlier in my comment. The class RangeFacetRequest is the one to be looked at.

          Hide
          Shalin Shekhar Mangar added a comment -

          Here's a new patch (SOLR-6353-4212.patch) which applies after the SOLR-6686 commit.

          Hide
          Hoss Man added a comment -

          Updated patch with a few small changes, details mentioned below.

          Comments in no particular order...


          I reiterate my earlier comment about javadocs – it's really hard to review a patch this size with so many new methods that don't have any javadocs at all. Every time I see a method mentioned in the new code, I have to look it up to jog my memory about what exactly it does, and I wind up re-reading/skimming each method over and over because there are no javadocs. (I can't imagine how any developer will ever be able to make sense of any of this code down the road w/o more method-level docs)

          On the flip side: PivotFacetHelper.mergeQueryCounts does have javadocs, but it doesn't look like they match what the method does – I'm pretty sure either the javadocs are wrong, or the method has a bug in it and isn't working as documented.


          I further refactored the code into three separate classes to divide the responsibilities clearly.

          Cool ... This organization makes a lot more sense than the previous patches.


          Fixed. I am not sure why that happened but once I refactored the code to do all the parsing and calculation in prepare, I could not reproduce the issue.

          Well, the best way to be sure it's fixed is to add a test showing that it works.

          I've updated your patch to add a request like this to DistributedFacetPivotLargeTest.

          In order to get it to work, I had to also modify the Map<String,RangeFacetRequest> created by FacetComponent.prepare to be a LinkedHashMap (see more comments on this below) because otherwise it was a huge pain in the ass to try and write code to do anything with the response in a clean way.


          Though I do not agree about the user counting on the same order because this is not a list but a complex object in all our responses. But I digress.

          I don't really understand the basis for your comment.

          My argument is simple:

          • Given:
            • (top level) Range Facet results have always been returned as a NamedList, in the same order as specified by the facet.range params that requested them.
            • This order is modeled in SolrJ's QueryResponse class as List<RangeFacet> getFacetRanges()
          • Therefore:
            • We should be consistent and return Range Facet results with the same predictable order when they are nested under pivots.
            • This is already true in the SolrJ modeling in your patch, which adds List<RangeFacet> getFacetRanges() to PivotField.

          If nothing else, having the RangeFacets returned in the same order for both the top level results and the per-pivot results is the only sane/simple way for users to compute ratios between the top level range counts and the per-pivot value counts when dealing with many/variable ranges – otherwise they have to sort them themselves to correlate.
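
          To make the ratio use case concrete, here's an illustrative SolrJ fragment that leans on that consistent ordering (the client/query setup is assumed, PivotField.getFacetRanges() is the accessor this patch adds per the discussion above, and for simplicity it assumes mincount=0 so the bucket lists line up):

          QueryResponse rsp = client.query(query);
          List<RangeFacet> topLevel = rsp.getFacetRanges();
          for (PivotField pivot : rsp.getFacetPivot().get("manu_id_s,cat")) {
            List<RangeFacet> perPivot = pivot.getFacetRanges();
            for (int i = 0; i < topLevel.size(); i++) {
              // same index == same facet.range param, so buckets can be compared directly
              List<RangeFacet.Count> topCounts = topLevel.get(i).getCounts();
              List<RangeFacet.Count> pivotCounts = perPivot.get(i).getCounts();
              for (int b = 0; b < topCounts.size(); b++) {
                int top = topCounts.get(b).getCount();
                double ratio = (top == 0) ? 0.0 : ((double) pivotCounts.get(b).getCount()) / top;
                // ... use the ratio however the client needs it ...
              }
            }
          }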


          As I mentioned above regarding my test additions, I fixed the consistent ordering problem by changing the Map<String,RangeFacetRequest> created by FacetComponent.prepare to also be a LinkedHashMap. But I still don't understand the purpose of this being a Map at all (where the keys are the original values of the facet.range params – not to be confused with the tag-keyed Map of RangeFacetRequests discussed below).

          As far as i can tell all usages of this Map<String,RangeFacetRequest> either:

          • iterate over every entry in the Map and only look at the values.
          • iterate over every facet.range param string (in order), and use it to "get()" the RangeFacetRequest object that corresponds with that param string.

          ...wouldn't both of those usages be much simpler if instead of this Map there was just a List<RangeFacetRequest> that could be iterated over when the caller cares about looping over all RangeFacetRequest instances?

          Is there some other purpose for having this Map<String,RangeFacetRequest> that I'm not understanding?

          (see below for comments about creating a reused Map<String,List<RangeFacetRequest>> for when dealing with tags ... that's one of the current "loop over all RangeFacetRequest" usages that seems wrong to me anyway)


          I think you should revive your earlier idea of having a "FacetContext" object to put in the request context and use it to track the RangeFacetRequests and FacetBase (FACET_QUERY) objects. (Either that, or just use the existing FacetInfo class for this since that's pretty much its current purpose. Is the reason you didn't already do that because FacetInfo isn't currently in the single node code path?)

          Either way (with or w/o a new FacetContext class, with or w/o reusing FacetInfo) the code to access the Map/List of all RangeFacetRequests and FacetBase objects should really be encapsulated in some way – if nothing else to reduce the amount of casting when consumers have to pull them out of req.getContext(...), but also to help future-proof us against attempts to access those structures before they've been initialized by FacetComponent.prepare. (ie: throw IllegalStateException if !req.getContext().containsKey(WHATEVER_CONTEXT_KEY))

          example...

          class FacetComponent {
            // ...
            public static List<RangeFacetRequest> getAllRangeFacetRequestsFromContext(SolrQueryRequest req) {
              List<RangeFacetRequest> result = (List<RangeFacetRequest>) req.getContext().get(MAGIC_CONTEXT_KEY);
              if (null == result) {
                throw new IllegalAccessException("RangeFacetRequests can't be accessed from context they are initalized");
              }
              return result;
            }
            // etc...
          

          ...or better still, with the FacetContext container...

          class FacetContext {
            public static FacetContext getFacetContext(SolrQueryRequest req) {
              FacetContext result = (FacetContext) req.getContext().get(MAGIC_CONTEXT_KEY);
              if (null == result) {
                throw new IllegalAccessException("FacetContext can't be accessed before it's initalized in request context");
              }
              return result;
            }
            private final List<RangeFacetRequest> allRangeFacets; // init in constructor
            private final List<FacetBase> allQueryFacets; // init in constructor
            public List<RangeFacetRequest> getAllRangeFacetRequests() { ... }
            public List<FacetBase> getAllQueryFacets() { ... }
           
            // // (see next comment about PivotFacetProcessor.getTaggedRangeFacets)
            // private final Map<String,List<RangeFacetRequest>> taggedRangeFacets; // init in constructor
            // private final Map<String,List<FacetBase>> taggedQueryFacets; // init in constructor
            // public List<RangeFacetRequest> getRangeFacetRequestsForTag(String tag) { ... }
            // public List<FacetBase> getAllQueryFacetsForTag(String tag) { ... }
          }
          

          PivotFacetProcessor.getTaggedRangeFacets (and getTaggedQueryFacets) seem really inefficient to me...

          • unless I'm misreading these, every facet.pivot param is going to cause the list(s) of all facet.range RangeFacetRequest objects (and all facet.query FacetBase objects) to be iterated over to ask "do you match this tag"
            • in the case of facet.pivot refinement requests, that's going to mean a lot of looping over the same lists over and over.
          • why doesn't this follow the same pattern as PivotFacetProcessor.getTaggedStatsFields (and StatsInfo.getStatsFieldsByTag)?
            • when the RangeFacetRequest (or facet.query FacetBase objects) are constructed, why isn't there a Map<String,List<RangeFacetRequest>> (and Map<String,List<FacetBase>>) keyed off of the "tag" values, returning each RangeFacetRequest (and each FacetBase) built, so that getTaggedRangeFacets and getTaggedQueryFacets can just be fast/simple lookups on these Maps instead of constantly looping over the same lists repeatedly?

          (this "tag->RangeFacetRequest" mapping could easily be accessed via the same FacetContext mentioned above, see commented out part of that example code)


          The way RangeFacetProcessor deals with FacetRangeOther seems really confusing and sketchy. It took me a while to understand why those Exceptions were "safe to ignore"

          Rather than rely on "expected exceptions" when looping over the List of FacetRanges, wouldn't the code be a lot easier to understand if we changed FacetRange to look something like...

          /** null if this range doesn't correspond to a {@link FacetRangeOther} */
          public final FacetRangeOther other;
          // ...other existing vars...
          private FacetRange(FacetRangeOther other, String name, ...) {
            ...
          }
          public FacetRange(String name,...) {
            this(null, name, ...);
          }
          public FacetRange(FacetRangeOther other, ...) {
            this(other, other.toString(), ...);
          }
          

          ...And then simplified the response-building code in RangeFacetProcessor to look something like...

          final NamedList<Object> res = new SimpleOrderedMap<>();
          final NamedList<Integer> counts = new NamedList<>();
          res.add("counts", counts); // built up below
          
          // explicitly return the gap.  compute this early so we are more
          // likely to catch parse errors before attempting math
          res.add("gap", rfr.getGapObj());
          
          // explicitly return the start and end so all the counts
          // (including before/after/between) are meaningful - even if mincount
          // has removed the neighboring ranges
          res.add("start", rfr.getStartObj());
          res.add("end", rfr.getEndObj());
          
          for (RangeFacetRequest.FacetRange range : rfr.getFacetRanges()) {
            final int count = rangeCount(rfr, range);
            if (null == range.getFacetRangeOther()) {
              // normal range, added to count depending on mincount conditions...
              if (count >= rfr.getMinCount()) {
                counts.add(range.name, count);
              }
            } else {
              // special 'other' count, added to higher level res
              res.add(range.name, count);
            }
          }
          
          return res;
          

          ?

          If I'm missing something, and there's a reason the current dual loops over all FacetRanges w/exception checking are better, then at a bare minimum those empty catch blocks need to make it a lot clearer why they are there and why those exceptions are expected.


              // TODO: slight optimization would prevent double-parsing of any localParams
              Query qobj = QParser.getParser(parsed.facetValue, null, req).getQuery();
          

          I added a nocommit here to ensure we don't forget to file a jira to track this.

          It's not just an issue of localparams, but also the parsing & construction of the Query object – doing that once per request (just like the way you've changed the range faceting to do all the bucket calculation once) could give us some serious performance savings when facet.query params are hung of of many and/or large pivot trees. (should just mean a pretty trivial new subclass of FacetBase that also has a public Query getQuery() accessor)
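
          A rough shape of such a subclass, purely as a sketch (the FacetBase constructor signature and the class name here are assumptions):

          static class QueryFacet extends FacetBase {
            private final Query query;

            QueryFacet(ResponseBuilder rb, String facetStr) throws SyntaxError {
              super(rb, FacetParams.FACET_QUERY, facetStr);
              // parse the query string (including any localparams) once, up front,
              // instead of re-parsing it for every pivot value at every level
              this.query = QParser.getParser(facetStr, null, rb.req).getQuery();
            }

            public Query getQuery() {
              return query;
            }
          }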


          RangeFacetRequest implementation weirdness...

          • It's really weird that none of the public accessor methods (getStart, getEnd, getGap, getGapObj, etc...) have any javadocs, even though many of the protected variables they return do have jdocs (and the info in those jdocs seems pretty important to the method callers – like the distinction between end vs endObj).
          • The method signature & jdocs for createCalculator look dangerous – it says it will return a calculator based on the "rangeFacetRequest" argument, but it's a non-static method and in a couple places (schemaField, facetOn) it ignores the "rangeFacetRequest" arg and uses properties of "this" (and in other places vice versa) – suggesting either this method should be static so it doesn't accidentally use "this", or it should be changed to not take any arguments and always refer to "this" (which seems to be the intention based on the places I see this method being called currently.)
          • There's a weird bit of "format and immediately reparse" logic in RangeFacetRequest + RangeEndpointCalculator that makes no sense to me...
            • RangeEndpointCalculator knows the type specific "end" after looping over the ranges in computeRanges()
            • RangeEndpointCalculator uses "formatValue(end)" to build "endStr"
            • RangeEndpointCalculator.getComputedEnd() returns "endStr"
            • RangeFacetRequest calls getComputedEnd() and then passes that String to RangeEndpointCalculator.getValue(String) to parse it back into a type specific object
            • why not just have a public T getComputedEnd() method return the original (type specific) "end" object the calculator already figured out?
          • likewise: the type specific start value...
            • RangeEndpointCalculator.computeRanges() already figures out final T start = getValue(rfr.getStart());
            • so why make RangeFacetRequest ask the calculator to reparse the original start string with calculator.getValue(start); ?
            • why not just have a public T getStart() method on the calculator to return that value it already parsed?
          • likewise: gap
            • should change RangeEndpointCalculator.getGap(String) to just getGap()
            • won't save us any parsing round trips like the start / end changes suggested above, but will keep the API consistent
            • for the same reason the old code had this comment...
              // explicitly return the gap.  compute this early so we are more 
              // likely to catch parse errors before attempting math
              

              ...RangeEndpointCalculator should call & remember the value of parseGap() early in its constructor, and then getGap() can just be a trivial accessor method (sketched below).
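
          That is, something along these lines (field and method names follow the bullets above, not necessarily the patch; exception handling elided):

          private final Object gap;  // parsed once, early, so parse errors surface before any math

          public RangeEndpointCalculator(final RangeFacetRequest rfr) {
            // ... existing constructor setup ...
            this.gap = parseGap(rfr.getGap());
          }

          public Object getGap() {
            return gap;
          }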


          PivotFacetProcessor.addPivotQueriesAndRanges concerns...

          • this method was doing a lot of null checking against the facetQueries and facetRanges Lists – even though there was never any possibility of those lists being null
          • (partly because of these null checks that always passed) things like SimpleFacets instances and RangeFacetProcessors were getting constructed for every Pivot value (even on refinement) even when there are no ranges or facet queries hanging off of a pivot.

          So in the updated patch I also cleaned this method up a bit to assert the lists are non-null and keep the logic minimal when they are empty. (had to try it myself to be certain I wasn't missing anything)

          I also tightened up the error handling to be more specific about where the SyntaxErrors are coming from.

          Shalin Shekhar Mangar added a comment -

          On the flip side: PivotFacetHelper.mergeQueryCounts does have javadocs, but it doesn't look like they match what the method does – I'm pretty sure either the javadocs are wrong, or the method has a bug in it and isn't working as documented.

          Fixed. This method was refactored and I forgot to update the javadocs.

          I've updated your patch to add a request like this to DistributedFacetPivotLargeTest.

          Thanks!

          ...wouldn't both of those usages be much simpler if, instead of this Map, there was just a List<RangeFacetRequest> that could be iterated over when the caller cares about looping over all RangeFacetRequest instances?

          I changed the Map to a List.

          I think you should revive your earlier idea of having a "FacetContext" object to put in the request context and use it to track the RangeFacetRequests and FacetBase (FACET_QUERY) objects.

          I have created a new FacetContext class which holds all parsed range facets and query facets and also has a map of tag vs list of parsed facets such that we can do quick lookups by tag.
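          (For illustration only, here is a rough sketch of how such a tag lookup map could be built while the facets are parsed; it assumes the usual java.util imports, and getTags() is assumed to expose the parsed {!tag=...} values. It is not necessarily what the patch does.)

            // Sketch: index every parsed range facet by each {!tag=...} value it declares,
            // so pivot processing can look facets up by tag instead of rescanning the full list.
            private static Map<String, List<RangeFacetRequest>> indexByTag(List<RangeFacetRequest> allRangeFacets) {
              Map<String, List<RangeFacetRequest>> byTag = new HashMap<>();
              for (RangeFacetRequest rfr : allRangeFacets) {
                for (String tag : rfr.getTags()) {     // getTags() assumed for this sketch
                  List<RangeFacetRequest> forTag = byTag.get(tag);
                  if (forTag == null) {
                    forTag = new ArrayList<>();
                    byTag.put(tag, forTag);
                  }
                  forTag.add(rfr);
                }
              }
              return byTag;
            }
            // the same loop, with FacetBase instead of RangeFacetRequest, would cover facet.query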

          (Either that, or just use the existing FacetInfo class for this, since that's pretty much its current purpose. Is the reason you didn't already do that because FacetInfo isn't currently in the single node code path?)

          Yeah, that’s pretty much it. The current FacetInfo is exclusively used for merging distributed results which makes it a bit inflexible for our requirements.

          The way RangeFacetProcessor deals with FacetRangeOther seems really confusing and sketchy. It took me a while to understand why those Exceptions were "safe to ignore"

          I refactored as you suggested. This is much cleaner, thank you!

          I added a nocommit here to ensure we don't forget to file a jira to track this.
          It's not just an issue of localparams, but also the parsing & construction of the Query object – doing that once per request (just like the way you've changed the range faceting to do all the bucket calculation once) could give us some serious performance savings when facet.query params are hung off of many and/or large pivot trees. (should just mean a pretty trivial new subclass of FacetBase that also has a public Query getQuery() accessor)

          I opened SOLR-7753

          It's really weird that none of the public accessor methods (getStart, getEnd, getGap, getGapObj, etc...) have any javadocs, even though many of the protected variables they return do have jdocs (and the info in those jdocs seems pretty important to the method callers – like the distinction between end vs endObj).

          Done. Initially I had kept the fields as public final but later I changed them to private and added accessor methods. Didn’t remember to move the javadocs. I have added copious amounts of javadocs to each one of them now.

          The method signature & jdocs for createCalculator look dangerous

          You’re right. I changed it to a private method which operates on “this”.

          There's a weird bit of "format and immediately reparse" logic in RangeFacetRequest + RangeEndpointCalculator that makes no sense to me

          Fixed, as you suggested.
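          (As a standalone illustration of the "parse once, expose typed accessors" shape being discussed: this is a simplified sketch, not the actual RangeEndpointCalculator.)

            // Simplified sketch only: parse the endpoints once, remember the typed values,
            // and let callers read them back without any format-and-reparse round trip.
            abstract class EndpointCalculatorSketch<T extends Comparable<T>> {
              private final T start;
              private final Object gap;   // parsed eagerly so parse errors surface before any math
              private T computedEnd;      // filled in once the ranges have been computed

              EndpointCalculatorSketch(String startStr, String gapStr) {
                this.start = parseValue(startStr);
                this.gap = parseGap(gapStr);
              }

              protected abstract T parseValue(String rawValue);
              protected abstract Object parseGap(String rawGap);
              protected void setComputedEnd(T end) { this.computedEnd = end; }

              public T getStart() { return start; }
              public Object getGap() { return gap; }
              public T getComputedEnd() { return computedEnd; }
            }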

          PivotFacetProcessor.addPivotQueriesAndRanges concerns...
          this method was doing a lot of null checking against the facetQueries and facetRanges Lists – even though there was never any possibility of those lists being null
          (partly because of these null checks that always passed) things like SimpleFacets instances and RangeFacetProcessors were getting constructed for every Pivot value (even on refinement) even when there are no ranges or facet queries hanging off of a pivot.
          So in the updated patch I also cleaned this method up a bit to assert the lists are non-null and keep the logic minimal when they are empty. (had to try it myself to be certain I wasn't missing anything)
          I also tightened up the error handling to be more specific about where the SyntaxErrors are coming from.

          Thank you, looks really good.

          I think this is ready. Please give it another look and let me know if it's good to go in.

          Shalin Shekhar Mangar added a comment -

          I found a test failure with the last patch. I'll upload a new patch shortly.

          Shalin Shekhar Mangar added a comment -

          This patch passes all tests and precommit.

          The test failure that I saw earlier was in TestDistributedSearch, where a facet request is made with the same facet range repeated twice. The FacetComponent deduplicates all facet parameters before execution, but I was not doing that in the prepare method, where all the range facets were being parsed. I fixed this by moving the deduplication step into prepare and putting the modified parameter object back into the request.
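          (Roughly, a sketch of that dedup-then-replace step; it assumes the usual java.util and org.apache.solr.common.params imports, and is not the literal patch code.)

            // Sketch: collapse repeated facet.range / facet.query values while preserving order,
            // then put the modified params back on the request so later phases see the same view.
            ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
            for (String name : new String[] {FacetParams.FACET_RANGE, FacetParams.FACET_QUERY}) {
              String[] vals = params.getParams(name);
              if (vals != null) {
                Set<String> unique = new LinkedHashSet<>(Arrays.asList(vals));
                params.set(name, unique.toArray(new String[unique.size()]));
              }
            }
            rb.req.setParams(params);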

          Shalin Shekhar Mangar added a comment -

          Two changes in this patch:

          1. I moved the DocSet calculation down to the rangeCount method in RangeFacetProcessor, but that led to repeated calculation of the DocSet if any grouping or exclusions were involved. In this patch, the doc set is calculated once in RangeFacetProcessor.getFacetRangeCounts and passed down to the rangeCount method (see the sketch after this list).
          2. I changed FacetContext.getFacetContext method to throw IllegalStateException instead of IllegalAccessException.
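          (A sketch of the shape of the first change, with simplified signatures: computeDocSet and the rangeCount signature shown here are assumptions for illustration, not the exact methods in the patch.)

            // Sketch only: compute the (grouping/exclusion aware) base DocSet once per facet.range
            // and reuse it for every bucket, instead of recomputing it inside each rangeCount call.
            DocSet base = computeDocSet(docsOrig, rfr.getExcludeTags());   // done once per range facet
            NamedList<Integer> counts = new NamedList<>();
            for (RangeFacetRequest.FacetRange range : rfr.getFacetRanges()) {
              counts.add(range.name, rangeCount(base, rfr, range));        // base is simply passed down
            }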
          Hoss Man added a comment -

          In this patch, the doc set is calculated once in RangeFacetProcessor.getFacetRangeCounts and passed down to the rangeCount method.

          ...good catch.

          I changed FacetContext.getFacetContext method to throw IllegalStateException instead of IllegalAccessException.

          yeah, my bad – sorry about that.


          The current patch looks really solid to me; only one thing I think we definitely need to clean up (and a few less important nitpicks/concerns)...

          I have created a new FacetContext class which holds all parsed range facets and query facets and also has a map of tag vs list of parsed facets such that we can do quick lookups by tag.

          (Either that, or just use the existing FacetInfo class for this since that's pretty much it's current purpose. Is the reason you didn't already do that because FacetInfo isn't currently in the single node code path?)

          Yeah, that’s pretty much it. The current FacetInfo is exclusively used for merging distributed results which makes it a bit inflexible for our requirements.

          ...can you please add some specifics to the FacetContext and FacetInfo class jdocs that point to each other and explain/compare/contrast the differences (i.e. explain when/why each is used, so people trying to make sense of various bits of code get the distinction)


          Nit-Picks...

          • Looking at how FacetContext.setFacetContext is used, I'm wondering if a cleaner API would be to make the FacetContext constructor private, and replace setFacetContext with something like...
            public static void initFacetContext(ResponseBuilder rb) {
              if (rb.getContext().containsKey(MAGIC_KEY)) {
                throw new IllegalStateException("Context must only be initialized once per request...");
              }
              // ...Parse params into things like RangeFacetRequest & build Lists & Maps for context
            }
            

            ...that seems like it would be safer and guard against accidental misuse of redundant "new FacetContext" or "FacetContext.setFacetContext(...)" calls by devs in the future

          • With the new FacetContext object, PivotFacetProcessor.getTaggedQueryFacets and PivotFacetProcessor.getTaggedRangeFacets seem fairly unnecessary – if we just move the "if null use emptyList()" logic into the FacetContext equivalents, can't we just refactor them into oblivion?
          Shalin Shekhar Mangar added a comment -

          ...can you please add some specifics to the FacetContext and FacetInfo class jdocs that point to each other and explain/compare/contrast the differences (i.e. explain when/why each is used, so people trying to make sense of various bits of code get the distinction)

          Yeah, absolutely. I added the following:

          /**
          * Encapsulates facet ranges and facet queries such that their parameters
          * are parsed and cached for efficient re-use.
          * <p/>
          * An instance of this class is initialized and kept in the request context via the static
          * method {@link org.apache.solr.handler.component.FacetComponent.FacetContext#initContext(ResponseBuilder)} and
          * can be retrieved via {@link org.apache.solr.handler.component.FacetComponent.FacetContext#getFacetContext(SolrQueryRequest)}
          * <p/>
          * This class is used exclusively in a single-node context (i.e. non distributed requests or an individual shard
          * request). Also see {@link org.apache.solr.handler.component.FacetComponent.FacetInfo} which is
          * dedicated exclusively for merging responses from multiple shards and plays no role during computation of facet
          * counts in a single node request.
          *
          * <b>This API is experimental and subject to change</b>
          *
          * @see org.apache.solr.handler.component.FacetComponent.FacetInfo
          */
          public static class FacetContext {
          ...
          }
          
          /**
          * This class is used exclusively for merging results from each shard
          * in a distributed facet request. It plays no role in the computation
          * of facet counts inside a single node.
          *
          * A related class {@link org.apache.solr.handler.component.FacetComponent.FacetContext}
          * exists for assisting computation inside a single node.
          *
          * <b>This API is experimental and subject to change</b>
          *
          * @see org.apache.solr.handler.component.FacetComponent.FacetContext
          */
          public static class FacetInfo {
          ...
          }
          

          Looking at how FacetContext.setFacetContext is used, I'm wondering if a cleaner API would be to make the FacetContext constructor private, and replace setFacetContext with something like...

          Good idea. This patch has:

          /**
          * Initializes FacetContext using request parameters and saves it in the request
          * context which can be retrieved via {@link #getFacetContext(SolrQueryRequest)}
          *
          * @param rb the ResponseBuilder object from which the request parameters are read
          *           and to which the FacetContext object is saved.
          */
          public static void initContext(ResponseBuilder rb)  {
          ...
          }
          
          private FacetContext(List<RangeFacetRequest> allRangeFacets, List<FacetBase> allQueryFacets) {
          ...
          }
          
          /**
          * Return the {@link org.apache.solr.handler.component.FacetComponent.FacetContext} instance
          * cached in the request context.
          *
          * @param req the {@link SolrQueryRequest}
          * @return the cached FacetContext instance
          * @throws IllegalStateException if no cached FacetContext instance is found in the request context
          */
          public static FacetContext getFacetContext(SolrQueryRequest req) throws IllegalStateException {
          ...
          }
          

          with the new FacetContext object, PivotFacetProcessor.getTaggedQueryFacets and PivotFacetProcessor.getTaggedRangeFacets seem fairly unnecessary – if we just move the "if null use emptyList()" logic into the FacetContext equivalents, can't we just refactor them into oblivion?

          Done. The null check is moved inside FacetContext methods and the methods are inlined into PivotFacetProcessor.process().
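          (i.e. something along these lines; this is a sketch, with the map field name and java.util.Collections import assumed:)

            // Sketch: the null check lives inside FacetContext, so PivotFacetProcessor.process()
            // can iterate the result directly without any defensive checks of its own.
            public List<RangeFacetRequest> getRangeFacetRequestsForTag(String tag) {
              List<RangeFacetRequest> byTag = taggedRangeFacets.get(tag);
              return byTag == null ? Collections.<RangeFacetRequest>emptyList() : byTag;
            }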

          I'll commit this tomorrow morning my time unless there are any objections.

          Hoss Man added a comment -

          I'll commit this tomorrow morning my time unless there are any objections.

          +1

          ASF subversion and git services added a comment -

          Commit 1689802 from shalin@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1689802 ]

          SOLR-4212: SOLR-6353: Let facet queries and facet ranges hang off of pivots

          ASF subversion and git services added a comment -

          Commit 1689839 from shalin@apache.org in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1689839 ]

          SOLR-4212: SOLR-6353: Let facet queries and facet ranges hang off of pivots

          ASF subversion and git services added a comment -

          Commit 1689840 from shalin@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1689840 ]

          SOLR-4212: SOLR-6353: Added attribution in changes.txt

          ASF subversion and git services added a comment -

          Commit 1689841 from shalin@apache.org in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1689841 ]

          SOLR-4212: SOLR-6353: Added attribution in changes.txt

          Shalin Shekhar Mangar added a comment -

          Thanks Steve and Hoss!

          Shalin Shekhar Mangar added a comment -

          Bulk close for 5.3.0 release


            People

            • Assignee: Shalin Shekhar Mangar
            • Reporter: Steve Molloy
            • Votes: 5
            • Watchers: 10
