I still haven't had a chance to really dig into the implementation details of the patch, but I wanted to spend some time testing things out from a user perspective...
One of the first things I noticed is that the refinement requests seem to be extra verbose. For example, given this user request (using the example data, with a 2 shard cloud setup):
http://localhost:8983/solr/select?q=*:*&sort=id+desc&rows=2&facet=true&facet.pivot=cat,inStock&facet.limit=3&facet.pivot=manu_id_s,inStock
This is what the refinement requests in the logs of each shard looked like...
3434041 [qtp1282186295-19] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/select params={manu_id_s,inStock_8__terms=samsung&facet=true&sort=id+desc&facet.limit=3&manu_id_s,inStock_9__terms=viewsonic&distrib=false&cat,inStock_1__terms=search&wt=javabin&version=2&rows=0&manu_id_s,inStock_6__terms=maxtor&manu_id_s,inStock_7__terms=nor&NOW=1399584452682&shard.url=http://127.0.1.1:8983/solr/collection1/&df=text&cat,inStock_2__terms=software&q=*:*&manu_id_s,inStock_3__terms=canon&manu_id_s,inStock_4__terms=ati&facet.pivot.mincount=-1&isShard=true&cat,inStock_0__terms=hard+drive&facet.pivot={!terms%3D$cat,inStock_0__terms}cat,inStock&facet.pivot={!terms%3D$cat,inStock_1__terms}cat,inStock&facet.pivot={!terms%3D$cat,inStock_2__terms}cat,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_3__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_4__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_5__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_6__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_7__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_8__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_9__terms}manu_id_s,inStock&manu_id_s,inStock_5__terms=eu} hits=14 status=0 QTime=3
3424918 [qtp1282186295-16] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/select params={cat,inStock_10__terms=memory&manu_id_s,inStock_15__terms=dell&facet=true&manu_id_s,inStock_12__terms=apple&sort=id+desc&facet.limit=3&manu_id_s,inStock_13__terms=asus&manu_id_s,inStock_16__terms=uk&distrib=false&wt=javabin&manu_id_s,inStock_14__terms=boa&version=2&rows=0&NOW=1399584452682&shard.url=http://127.0.1.1:7574/solr/collection1/&df=text&manu_id_s,inStock_11__terms=corsair&q=*:*&facet.pivot.mincount=-1&isShard=true&facet.pivot={!terms%3D$cat,inStock_10__terms}cat,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_11__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_12__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_13__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_14__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_15__terms}manu_id_s,inStock&facet.pivot={!terms%3D$manu_id_s,inStock_16__terms}manu_id_s,inStock} hits=18 status=0 QTime=2
Or if we prune that down to just the interesting params (as far as pivot faceting goes)...
shard1
facet.pivot.mincount=-1
cat,inStock_0__terms=hard+drive
cat,inStock_1__terms=search
cat,inStock_2__terms=software
manu_id_s,inStock_3__terms=canon
manu_id_s,inStock_4__terms=ati
manu_id_s,inStock_5__terms=eu
manu_id_s,inStock_6__terms=maxtor
manu_id_s,inStock_7__terms=nor
manu_id_s,inStock_8__terms=samsung
manu_id_s,inStock_9__terms=viewsonic
facet.pivot={!terms=$cat,inStock_0__terms}cat,inStock
facet.pivot={!terms=$cat,inStock_1__terms}cat,inStock
facet.pivot={!terms=$cat,inStock_2__terms}cat,inStock
facet.pivot={!terms=$manu_id_s,inStock_3__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_4__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_5__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_6__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_7__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_8__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_9__terms}manu_id_s,inStock
shard2
facet.pivot.mincount=-1
cat,inStock_10__terms=memory
manu_id_s,inStock_11__terms=corsair
manu_id_s,inStock_12__terms=apple
manu_id_s,inStock_13__terms=asus
manu_id_s,inStock_14__terms=boa
manu_id_s,inStock_15__terms=dell
manu_id_s,inStock_16__terms=uk
facet.pivot={!terms=$cat,inStock_10__terms}cat,inStock
facet.pivot={!terms=$manu_id_s,inStock_11__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_12__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_13__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_14__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_15__terms}manu_id_s,inStock
facet.pivot={!terms=$manu_id_s,inStock_16__terms}manu_id_s,inStock
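For anyone who wants to repeat this pruning on their own shard logs, here is a rough Python sketch. It assumes the text logged after params= is a plain &-separated query string wrapped in braces, as in the log lines above; the function name pivot_params is made up for illustration and is not part of Solr.

```python
from urllib.parse import parse_qsl


def pivot_params(logged_params):
    """Keep only the pivot-related (name, value) pairs from a logged params string.

    Assumption: logged_params looks like the "{a=b&c=d}" text that follows
    "params=" in the SolrCore log lines quoted above.
    """
    # Drop the single pair of braces the logger wraps around the params.
    if logged_params.startswith("{") and logged_params.endswith("}"):
        logged_params = logged_params[1:-1]
    # parse_qsl keeps the original order and decodes %3D -> "=" and "+" -> " ".
    pairs = parse_qsl(logged_params, keep_blank_values=True)
    return [
        (name, value)
        for name, value in pairs
        if name.startswith("facet.pivot") or name.endswith("__terms")
    ]
```

Note that the values come back decoded, so hard+drive is returned as "hard drive".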
I believe that what's going on here is basically:
top level params are being used for the individual terms that need refined (which is smart, helps eliminate risk of terms needing special escaping with local params)
the top level param names for these terms that need refined use a per-(user)request "global" counter to ensure that they are unique (+1)
the top level term param names also include the facet.pivot spec they are needed for – this seems redundant since the counter is clearly global (even across multiple "facet.pivot" specs)
these top level term param names are then added only to the shard requests where refinement is actually needed for those terms (+1) and are referenced as variables in facet.pivot commands using the "terms" local param (which the shards evidently look for to know when this is a refinement request)
because many terms may need refinement, that means each user specified facet.pivot=X,Y param results in many shard params of facet.pivot={!terms=$N}X,Y
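To make the points above concrete, here is a small Python sketch that reproduces the param naming seen in the logs. It is only a reconstruction from the observed requests, not the patch's actual code; the function name current_style_params and the terms_by_pivot input shape are assumptions for illustration.

```python
from itertools import count


def current_style_params(terms_by_pivot, counter):
    """Build refinement params the way the logged shard requests appear to be built.

    terms_by_pivot: hypothetical mapping of pivot spec -> terms needing refinement
                    on one shard, e.g. {"cat,inStock": ["hard drive", "search"]}.
    counter:        one counter shared across all shards of a single user request,
                    which is why shard2's names continue where shard1's stopped.
    """
    params = []
    for spec, terms in terms_by_pivot.items():
        for term in terms:
            # One top level param per term, named after the pivot spec and counter.
            name = f"{spec}_{next(counter)}__terms"
            params.append((name, term))
            # One facet.pivot per term, dereferencing that param via $name.
            params.append(("facet.pivot", f"{{!terms=${name}}}{spec}"))
    return params
```

Calling it once per shard with the same counter gives two params for every term, which is where the verbosity comes from.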
I realize that local params don't play nice with multi-valued params at all, let alone make it easy to use a single variable to refer to a multi-valued param. But wouldn't it be simpler (and less verbose over the wire) to just ignore Solr's built-in param variable dereferencing and instead generate 1 unique param name to use for all the terms we care about (for each unique pivot spec), and then refer to that name once in a local param for a single facet.pivot param (which the pivot facet code would then go and explicitly fetch from the top level SolrParams as a multi-value)?
The result being that instead of the refinement requests shown above, the refinement requests for each shard could be something much simpler like...
shard1_proposed
facet.pivot.mincount=-1
_fpt_1=hard+drive
_fpt_1=search
_fpt_1=software
_fpt_2=canon
_fpt_2=ati
_fpt_2=eu
_fpt_2=maxtor
_fpt_2=nor
_fpt_2=samsung
_fpt_2=viewsonic
facet.pivot={!fpt=1}cat,inStock
facet.pivot={!fpt=2}manu_id_s,inStock
shard2_proposed
facet.pivot.mincount=-1
_fpt_1=memory
_fpt_2=corsair
_fpt_2=apple
_fpt_2=asus
_fpt_2=boa
_fpt_2=dell
_fpt_2=uk
facet.pivot={!fpt=1}cat,inStock
facet.pivot={!fpt=2}manu_id_s,inStock
(where _fpt_ is just a short prefix for "facet pivot terms" that i pulled out of my ass)
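A minimal Python sketch of the proposed scheme, covering both sides: building the shard params, and fetching the multi-valued terms back out by name. The _fpt_ prefix and fpt local param come from the proposal above; the function names and input shapes are hypothetical, and a real implementation would read from SolrParams rather than a raw query string.

```python
from urllib.parse import urlencode, parse_qs

PREFIX = "_fpt_"  # "facet pivot terms" prefix from the proposal; not an existing Solr param


def proposed_params(pivot_specs, terms_by_pivot):
    """Coordinator side: one multi-valued term param and one facet.pivot per pivot spec.

    pivot_specs:    the user's facet.pivot values, in request order.
    terms_by_pivot: hypothetical mapping of pivot spec -> terms to refine on this shard.
    """
    term_params, pivot_params = [], []
    # The id depends only on the spec's position, so it is the same on every shard.
    for pivot_id, spec in enumerate(pivot_specs, start=1):
        terms = terms_by_pivot.get(spec, [])
        if not terms:
            continue  # nothing to refine for this spec on this shard
        term_params.extend((f"{PREFIX}{pivot_id}", term) for term in terms)
        pivot_params.append(("facet.pivot", f"{{!fpt={pivot_id}}}{spec}"))
    return term_params + pivot_params


def terms_for(query_string, pivot_id):
    """Shard side: explicitly fetch every value of the term param for one pivot."""
    return parse_qs(query_string).get(f"{PREFIX}{pivot_id}", [])
```

For the shard1 request above, this comes to 12 pivot-related params (10 terms plus 2 facet.pivot) instead of 20, and the param count per pivot spec no longer doubles with every term.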
Another thing I noticed is that with my 2 shard exampledocs setup, the following URL seems to send the pivot faceting into an infinite loop of refinement requests (note the typo: there's a space embedded in a field name manu_id_s != manu_+id_s )...
http://localhost:8983/solr/select?q=*:*&sort=id+desc&rows=2&facet=true&facet.pivot=cat,manu_+id_s,inStock&facet.limit=3
...not clear what's going on there, but definitely something that needs fixed before committing ("garbage in -> garbage out" is one thing, "garbage in -> crash your cluster" is another)
the multi-level refinement is sooooooo sweet.
Based on SOLR-792, it looked like there was some traction in getting distributed pivoting in the trunk codebase beyond the functional prototype. This feature has a lot of value within my company where we perform 50 separate queries where one would suffice if we had distributed pivot support.