Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.5, 6.0
    • Component/s: faceting
    • Labels:

      Description

      For a sample inventory (note the nested documents) like this:
      <doc>
        <field name="id">10</field>
        <field name="type_s">parent</field>
        <field name="BRAND_s">Nike</field>
        <doc>
          <field name="id">11</field>
          <field name="COLOR_s">Red</field>
          <field name="SIZE_s">XL</field>
        </doc>
        <doc>
          <field name="id">12</field>
          <field name="COLOR_s">Blue</field>
          <field name="SIZE_s">XL</field>
        </doc>
      </doc>

      Faceting results must contain:
      Red(1)
      XL(1)
      Blue(1)

      for a "q=*" query.
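One way to read the expected counts: XL appears on two children but is reported as XL(1), which suggests each child value is counted once per matching parent rather than once per child. A minimal Python sketch of that reading follows. The in-memory `blocks` structure and the `child_facets` helper are hypothetical illustrations, not part of Solr.

```python
from collections import Counter

# Hypothetical in-memory copy of the sample inventory: one parent, two children.
blocks = [
    {
        "parent": {"id": "10", "type_s": "parent", "BRAND_s": "Nike"},
        "children": [
            {"id": "11", "COLOR_s": "Red", "SIZE_s": "XL"},
            {"id": "12", "COLOR_s": "Blue", "SIZE_s": "XL"},
        ],
    }
]


def child_facets(blocks, field):
    """Count each child field value once per parent block, not once per child."""
    counts = Counter()
    for block in blocks:
        # A set collapses values repeated by several children of the same parent.
        values = {child[field] for child in block["children"] if field in child}
        counts.update(values)
    return dict(counts)


color_counts = child_facets(blocks, "COLOR_s")
size_counts = child_facets(blocks, "SIZE_s")
print(color_counts, size_counts)
```

Counting per child instead would report XL(2), which is not what the description asks for.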

      PS: The inventory example is taken from this blog post: http://blog.griddynamics.com/2013/09/solr-block-join-support.html

      1. cluster.jpg
        61 kB
        Vijay Sekhri
      2. service_baseline.png
        35 kB
        Vijay Sekhri
      3. service_new_baseline.jpg
        54 kB
        Vijay Sekhri
      4. solr_baseline.jpg
        192 kB
        Vijay Sekhri
      5. solr_new_baseline.jpg
        191 kB
        Vijay Sekhri
      6. SOLR-5743.patch
        79 kB
        Mikhail Khludnev
      7. SOLR-5743.patch
        78 kB
        Mikhail Khludnev
      8. SOLR-5743.patch
        78 kB
        Mikhail Khludnev
      9. SOLR-5743.patch
        77 kB
        Dr Oleg Savrasov
      10. SOLR-5743.patch
        75 kB
        Mikhail Khludnev
      11. SOLR-5743.patch
        76 kB
        Mikhail Khludnev
      12. SOLR-5743.patch
        75 kB
        Mikhail Khludnev
      13. SOLR-5743.patch
        62 kB
        Dr Oleg Savrasov
      14. SOLR-5743.patch
        59 kB
        Dr Oleg Savrasov
      15. SOLR-5743.patch
        59 kB
        Dr Oleg Savrasov
      16. SOLR-5743.patch
        47 kB
        Dr Oleg Savrasov
      17. SOLR-5743.patch
        38 kB
        Dr Oleg Savrasov
      18. SOLR-5743.patch
        37 kB
        Dr Oleg Savrasov
      19. SOLR-5743.patch
        36 kB
        Dr Oleg Savrasov
      20. SOLR-5743.patch
        73 kB
        Dr Oleg Savrasov

        Activity

        Dr Oleg Savrasov added a comment -

        I'm preparing Lucene Revolution talk http://lucenerevolution.uservoice.com/forums/254256-internals-track/suggestions/5995621-faceting-with-lucene-blockjoinquery which addresses the feature. Your votes would be much appreciated.

        Dr Oleg Savrasov added a comment -

        Attached is an initial implementation that meets the functional requirements. It adds a new BlockJoinFacetComponent, which expects a ToParentBlockJoinQuery in the search request. Facets are calculated for the fields given by the child.facet.field parameter. Only DocValues fields are supported.

        ash fo added a comment -

        The patch modifies two XML files that do not exist in the source tree:

        solr/core/src/test-files/solr/collection1/conf/schema-blockjoinfacetcomponent.xml
        solr/core/src/test-files/solr/collection1/conf/solrconfig-blockjoinfacetcomponent.xml

        Could you please explain where I can find those files? The patch modifies them rather than adding them as whole files, so when I apply the patch those files are simply skipped.

        Thank you

        Dr Oleg Savrasov added a comment -

        I created the files by copying and modifying existing configurations. It looks like my IDE processed the changes incorrectly. Sorry about that. Please find the updated patch attached. Should you have any issues, please let me know.

        ash fo added a comment -

        Thanks. I applied the patch, but passing "child.facet.field=xxxxxxx" still doesn't do anything. Here is my query:

        http://localhost:8080/solr/nested_collecion2/select?q=*%3A*&fq=content_type%3AparentDocument&fl=id&wt=json&indent=true&facet=true&child.facet.field=retid

        And this is what I get back; Solr does not seem to recognize the 'child.facet.field' parameter:

        {
          "responseHeader":{
            "status":0,
            "QTime":1,
            "params":{
              "facet":"true",
              "fl":"id",
              "indent":"true",
              "q":"*:*",
              "child.facet.field":"retid",
              "wt":"json",
              "fq":"content_type:parentDocument"}},
          "response":{"numFound":998,"start":0,"docs":[
            {"id":"1554855923"},
            {"id":"1556730933"},
            {"id":"1437257890"},
            {"id":"1463296684"},
            {"id":"1143793641"},
            {"id":"1168208507"},
            {"id":"1201399772"},
            {"id":"1162769709"},
            {"id":"1199906811"},
            {"id":"1296203203"}]
          },
          "facet_counts":{
            "facet_queries":{},
            "facet_fields":{},
            "facet_dates":{},
            "facet_ranges":{},
            "facet_intervals":{}}}

        The retid field has docValues="true" as well:

        <field name="retid" type="int" indexed="true" stored="true" docValues="true"/>

        Is there anything else that needs to be done?

        Thanks

        Dr Oleg Savrasov added a comment -

        To use the proposed component, you need to configure it in solrconfig.xml and define a search handler that uses it, for example:

        <searchComponent name="blockJoinFacet" class="org.apache.solr.handler.component.BlockJoinFacetComponent">

        </searchComponent>

        <requestHandler name="/blockJoinFacetRH" class="org.apache.solr.handler.component.SearchHandler">
          <arr name="last-components">
            <str>blockJoinFacet</str>
          </arr>
        </requestHandler>

        Please note that only string docValues fields can currently be used for faceting (the int type can be covered later), so you need to update the configuration of the relevant fields in schema.xml, for example:

        <field name="COLOR_s" type="string" indexed="true" stored="true" docValues="true"/>
        <field name="SIZE_s" type="string" indexed="true" stored="true" docValues="true"/>

        Then, after indexing a set of hierarchical documents such as

        <doc>
          <field name="id">10</field>
          <field name="type_s">parent</field>
          <field name="BRAND_s">Nike</field>
          <doc>
            <field name="id">11</field>
            <field name="type_s">child</field>
            <field name="COLOR_s">Red</field>
            <field name="SIZE_s">XL</field>
          </doc>
          <doc>
            <field name="id">12</field>
            <field name="type_s">child</field>
            <field name="COLOR_s">Blue</field>
            <field name="SIZE_s">XL</field>
          </doc>
        </doc>

        you need to pass the required ToParentBlockJoinQuery to the configured request handler, for example

        http://localhost:8983/solr/collection1/blockJoinFacetRH?q={!parent+which%3D%22type_s%3Aparent%22}type_s%3Achild&wt=json&indent=true&facet=true&child.facet.field=COLOR_s&child.facet.field=SIZE_s

        which yields the desired result:

        {
          "responseHeader":{"status":0,"QTime":1},
          "response":{"numFound":1,"start":0,"docs":[
            {"id":"10","type_s":"parent","BRAND_s":"Nike","_version_":1491642108914696192}]
          },
          "facet_counts":{
            "facet_queries":{},
            "facet_fields":{},
            "facet_dates":{},
            "facet_ranges":{},
            "facet_intervals":{},
            "facet_fields":[
              "COLOR_s",[
                "Blue",1,
                "Red",1],
              "SIZE_s",[
                "XL",1]]}}

        Please take the latest patch; it contains a fix for a caching issue that was just found.
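For anyone assembling the request programmatically, here is a minimal Python sketch that builds the same kind of URL with the standard library, so that the local-params syntax in q is percent-encoded and child.facet.field can be repeated. The host, collection, and handler path are simply the ones from the example in this comment and are assumptions about your setup.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Assumed endpoint: host, collection, and handler name come from the example above.
base = "http://localhost:8983/solr/collection1/blockJoinFacetRH"

# A list of pairs (rather than a dict) lets child.facet.field appear twice.
params = [
    ("q", '{!parent which="type_s:parent"}type_s:child'),
    ("wt", "json"),
    ("indent", "true"),
    ("facet", "true"),
    ("child.facet.field", "COLOR_s"),
    ("child.facet.field", "SIZE_s"),
]

# urlencode percent-encodes braces, quotes, and colons, and turns spaces into '+'.
url = base + "?" + urlencode(params)
print(url)

# Decoding the query string again recovers the original parameter values.
decoded = parse_qs(urlsplit(url).query)
```

The encoding differs cosmetically from the hand-written URL (for example, the braces are escaped too), but it decodes to the same parameters.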

        ash fo added a comment -

        Thank you, I finally got it working. Is it possible to include integer and float fields in this patch as well? Two of my child fields are integer and float (retailer id and price), and I need to facet on them too.

        Dr Oleg Savrasov added a comment -

        After investigating, I found that float and int types work fine for multivalued fields, i.e. they should be configured like this:

        <field name="RETAILER_ID" type="int" indexed="true" stored="true" docValues="true" multiValued="true"/>
        <field name="PRICE" type="float" indexed="true" stored="true" docValues="true" multiValued="true"/>

        The unit test in the patch has been extended to cover int and float types.
        I'll try to find out whether it's possible to make it work for multiValued="false".

        ash fo added a comment -

        Thank you.

        Could you please also include 'child.facet.query'? People often want to know, for example, how many offers fall within a specific price range, something like this:

        &child.facet.query=price:[1 TO 100]

        ash fo added a comment -

        It seems that the patch isn't working with SolrCloud. It works with a single instance, but in cloud mode with multiple nodes and shards it just doesn't work. Is there a way to get this working with multiple nodes/shards? Thank you.

        Dr Oleg Savrasov added a comment -

        Solr Cloud support has been implemented

        Dr Oleg Savrasov added a comment -

        Please check out the latest patch; SolrCloud support is implemented there. Note that to make it work you need to make some configuration changes.

        1. If you have a /select SearchHandler definition, you should add blockJoinFacet as a last component there, like

        <requestHandler name="/select" class="solr.SearchHandler">
          .....
          <arr name="last-components">
            <str>blockJoinFacet</str>
          </arr>
        </requestHandler>

        2. If you don't have a /select SearchHandler definition, you should configure your custom BlockJoinFacet search handler with the shards.qt parameter, which should reference the search handler's name. For example:

        <requestHandler name="/blockJoinFacetRH" class="org.apache.solr.handler.component.SearchHandler">
          <lst name="defaults">
            <str name="shards.qt">blockJoinFacetRH</str>
          </lst>
          <arr name="last-components">
            <str>blockJoinFacet</str>
          </arr>
        </requestHandler>

        Dr Oleg Savrasov added a comment -

        Video from the Lucene Revolution talk is available here http://www.youtube.com/watch?v=Su5SHc_uJw8

        Jacob Carter added a comment -

        I've applied this patch to Solr 5.0.0, and with an index containing around 400k parent documents and 1.5 million child documents it takes over a minute to return the values of a child facet and their counts. Is this performance to be expected at present, or have I potentially misconfigured my instance?

        Dr Oleg Savrasov added a comment -

        Performance improvements are still under investigation. I don't have much time these days, so I cannot promise a solution soon, but we keep working on it.

        Dr Oleg Savrasov added a comment -

        I don't think we need to introduce another special child.facet.query parameter here.
        It looks like the same result can be achieved by specifying an appropriate ToParentQuery in the facet.query parameter.
        For example, facet.query={!parent which=type_s:parent}price:[1 TO 100].
        Note, however, that in this case the facet.query result could count child documents that are not matched by the search query.
        For example, there could be a parent document with two children. One child has COLOR_s:Red and price:200, while the other has COLOR_s:Blue and price:50.
        If you request q={!parent which=type_s:parent}COLOR_s:Red
        and facet.query={!parent which=type_s:parent}price:[1 TO 100], this document is going to be counted.
        Sometimes that's OK, but if you want to eliminate this effect, you need to add the child document filter from q to facet.query.
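To make the effect concrete, here is a small Python simulation of the two-children example. It is only an in-memory model under simplifying assumptions: `parents_of` is a hypothetical stand-in for the to-parent join, and no Solr code is involved.

```python
# Hypothetical in-memory model of the example: one parent ("p1") with two children.
children = [
    {"parent": "p1", "COLOR_s": "Red", "price": 200},
    {"parent": "p1", "COLOR_s": "Blue", "price": 50},
]


def parents_of(matching_children):
    """Mimic a to-parent join: map matching children to the set of their parents."""
    return {child["parent"] for child in matching_children}


def is_red(child):
    return child["COLOR_s"] == "Red"


def is_cheap(child):
    return 1 <= child["price"] <= 100


# q: parents that have at least one Red child.
hits = parents_of(c for c in children if is_red(c))

# facet.query joined independently of q: any cheap child qualifies the parent.
loose = hits & parents_of(c for c in children if is_cheap(c))

# facet.query that also carries the child filter from q: one child must satisfy both.
strict = hits & parents_of(c for c in children if is_cheap(c) and is_red(c))

print(len(loose), len(strict))  # prints "1 0"
```

The parent is counted in the loose case because the Blue child satisfies the price range, even though only the Red child matched q.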
        The best way to do this is to introduce a new HTTP parameter, say qq=COLOR_s:Red, and reference it from both q and facet.query, i.e.
        q={!parent which=type_s:parent v=$qq}&facet.query={!parent which=type_s:parent}+price:[1 TO 100] +{!v=$qq}&qq=type_s:child&facet=true
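A minimal Python sketch of assembling such a request with a shared qq parameter follows. The value of `child_filter` is an assumption chosen to match the COLOR_s:Red example in the prose, and the query strings are copied from this comment rather than verified against a running Solr.

```python
from urllib.parse import urlencode, parse_qs

# Assumed child-level filter, shared by q and facet.query through $qq.
child_filter = "COLOR_s:Red"

params = {
    "q": "{!parent which=type_s:parent v=$qq}",
    "facet": "true",
    "facet.query": "{!parent which=type_s:parent}+price:[1 TO 100] +{!v=$qq}",
    "qq": child_filter,
}

# urlencode escapes the literal '+' operators as %2B; sent raw, a '+' in a
# query string would be decoded as a space on the server side.
query_string = urlencode(params)
print(query_string)

# Decoding the string again recovers the original values, '+' included.
decoded = parse_qs(query_string)
```

Changing the child filter then only requires changing qq; q and facet.query stay untouched.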

        Jim Musil added a comment -

        Curious, how would you handle this if a user searches for "pink shoes" or "large gloves"?

        Dr Oleg Savrasov added a comment -

        We call requests of this kind, which mix and match fields from different related entities, a "deep search". To handle such requests we need to compose a Boolean query, which provides the linguistic matching, with a Block Join query, which allows returning the top-level document when the match happens on a nested document. This topic is worth its own JIRA (or a few of them). Here, we are focusing on faceting rather than matching.

        Dr Oleg Savrasov added a comment -

        Attached is a performance improvement patch, prepared for the lucene_solr_5_2 branch. On my local test data it makes the proposed component about 25 times faster. Please note that applying the SOLR-7730 patch as well is recommended, since it also yields significant performance benefits.

        Dr Oleg Savrasov added a comment -

        The proposed component has been reworked to use the algorithm described here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-reverse-nested-aggregation.html. As a result, the code became more elegant and about twice as fast as the previous version.

        Dr Oleg Savrasov added a comment -

        Unit test is extended to cover single value flow

        Ishan Chattopadhyaya added a comment -

        IMHO this is an important issue to fix, and the patch looks good to me (based on initial look, and due to the tests included in the patch). It would be very good to have some committer attention here.

        Mikhail Khludnev added a comment - - edited

        Ishan Chattopadhyaya I'd like to commit it. I just want to confirm that there is no veto from anyone.
        I'd also appreciate feedback from colleagues on the recent patch, especially about its performance. Jacob Carter, would you comment on that?

        To summarize: if we decide to go ahead, I'll add it to defaultComponents. After that, users will be able to get aggregated facets for child fields alongside the usual ones:

        q={!parent ...}...&facet=true&child.facet.field=COLOR
        

        Here is a brief description of the use case: https://www.mail-archive.com/solr-user@lucene.apache.org/msg115732.html

        Mikhail Khludnev added a comment -

        Colleagues! I need your advice.
        This patch disables query result caching (which requires making NO_CHECK_QCACHE public) and forces the query to be executed every time (only if the params are present, of course).
        It calculates facets during the search via a DelegatingCollector. This is quite different from what Solr usually does, and it requires relaxing encapsulation to access ToParentBlockJoinQuery.BlockJoinScorer.swapChildDocs(int[]). To accommodate this while preserving encapsulation, we can either add a public accessor class to o.a.l.search.join, or give the method default (package-private) access and add a class in the o.a.l.search.join package to the Solr codebase (100% ugly).
        As an alternative, we can move closer to the regular Solr approach: calculate childDocset and run faceting over it. Please share your opinion; otherwise I'll go to IRC and repeat the question.

        Mikhail Khludnev added a comment - - edited

        Revamped the patch SOLR-5743.patch. Caveat, bitwise ticks! Now it provides both approaches:

        • BlockJoinFacetComponent - forces searching via NO_CHECK_QCACHE and obtains child matches via BlockJoinScorer.swapChildDocs(int[]); see ChildTrackingCollector in the patch.
        • BlockJoinFacetDocSetComponent - works more like Solr, with top-level doc sets.

        I'd like to include both components in 5.5, disabled by default, to let users experiment.

        Remaining TODOs:
        • exclude parent docs from faceting
        • now it's hardcoded to mincount=1, either set to 0 or copypaste mincount params logic and will be
        • improve the simple test to handle edge cases with fields and hits.

        Any concerns?

        Mikhail Khludnev added a comment -

        Tweaked SOLR-5743.patch.
        BlockJoinFacetDistribTest found a discrepancy in the shards response with facet=false.
        single node or shards with facet=true

        {responseHeader={status=0,QTime=133},response={numFound=11,start=0,docs=[]},
        facet_counts={facet_fields={COLOR_s={black=6,fuchsia=8,magenta=2},SIZE_s={3=4,4=3,5=2,6=1,l=1,m=3,maxi=3,xl=3,xml=3,xxl=1,xxxl=1}}}}
        

        shards without facet=true

        {responseHeader={status=0,QTime=64},child_facet_fields={COLOR_s={black=6,fuchsia=8,magenta=2},SIZE_s={3=4,4=3,m=3,maxi=3,xl=3,xml=3,5=2,6=1,l=1,xxl=1,xxxl=1}},response={numFound=11,start=0,maxScore=0.0,docs=[]}}
        

        junit

        junit.framework.AssertionFailedError: .child_facet_fields!=response (unordered or missing)
        	at 
        ...
        org.apache.solr.BaseDistributedSearchTestCase.compareSolrResponses(BaseDistributedSearchTestCase.java:893)
        	at 
        ...
        org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:571)
        	at org.apache.solr.search.join.BlockJoinFacetDistribTest.testBJQFacetComponent(BlockJoinFacetDistribTest.java:127)
        
        Dr Oleg Savrasov added a comment -

        Fix for the distributed test failure

        Mikhail Khludnev added a comment -

        I'm going to commit SOLR-5743.patch if there is no Christmas freeze.

        Mikhail Khludnev added a comment -

        introducing ToParentBlockJoinQuery.ChildrenMatchesScorer to make javadoc happier

        Mikhail Khludnev added a comment -

        now javadoc is perfect

        ASF subversion and git services added a comment -

        Commit 1721644 from mkhl@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1721644 ]

        SOLR-5743: introducing BlockJoinFacet*Component which are acting on child.facet.field request parameters

        ASF subversion and git services added a comment -

        Commit 1721652 from mkhl@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1721652 ]

        SOLR-5743: merging: introducing BlockJoinFacet*Component which are acting on child.facet.field request parameters

        Vijay Sekhri added a comment - - edited

        Hi Mikhail, Dr. Oleg
        This feature requires a ToParentBlockJoinQuery such as

         q={!parent which=<allParents>}<someChildren> 

        The child part of a ToParentBlockJoinQuery has to search on fields present in the child documents. In the real world, the parent document holds most of the common fields and the child documents hold only the fields that differ. For example, just like BRAND_s, fields such as description_s, name_s, title_s and partnumber_s exist in the parent document only. Because they are the same for all child documents, one would not repeat them in each child but keep them in the parent only. The child documents would carry attributes like COLOR_s and SIZE_s, as those differ.

        For any real search, one would query fields like BRAND_s, description_s, name_s, title_s, partnumber_s, etc. to return the appropriate documents. However, those fields are present only in the parent docs.

        So searching them like

         q={!parent which=type_s:parent}BRAND_s:Nike&facet=true&child.facet.field=COLOR_s 

        does not work, because BRAND_s:Nike matches the parent document. It also fails with this error:
        child query must only match non-parent docs, but parent docID=2 matched childScorer=class org.apache.lucene.search.TermScorer

        One could search on fields from child like this without any problem.

         q={!parent%20which=type_s:parent}COLOR_s:Blue&facet=true&child.facet.field=COLOR_s 

        To use this feature, do we have to copy all the common fields (and there can be thousands of them) into every child document and search on those copies? For example, copying the BRAND_s field like this:

        [{
         "id": 10,
         "type_s": "parent",
         "BRAND_s": "Nike",
         "_childDocuments_": [{
           "id": 11,
           "COLOR_s": "Red",
           "SIZE_s": "XL",
           "BRAND_s": "Nike"
         },
         {
           "id": 12,
           "COLOR_s": "Blue",
           "SIZE_s": "XL",
           "BRAND_s": "Nike"
         }]
        }]
        

        This way the query works

        q={!parent which=type_s:parent}BRAND_s:Nike&facet=true&child.facet.field=COLOR_s
        

        Or is there some other way to still facet on the child fields (SIZE_s), aggregate the counts on the parent docs (id:10), and search on the common fields from the parent docs (BRAND_s)?

        Mikhail Khludnev added a comment -

        Hold on.
        I wonder why you can't intersect it with a parent-level filter:

        q={!parent%20which=type_s:parent}COLOR_s:Blue&facet=true&child.facet.field=COLOR_s&fq=BRAND_s:Nike
        

        In this case no copying is necessary. Make sure you have checked the examples from the blog.
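When such a request is built in client code, the parameter values need percent-encoding. The following is a minimal sketch using only the JDK, under the assumption that the client assembles the query string by hand; the class and method names (`BlockJoinFacetRequest`, `queryString`) are hypothetical and not part of Solr or SolrJ. The parameter names `q`, `fq`, `facet` and `child.facet.field` are the ones used in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, for illustration only; not a Solr or SolrJ class.
final class BlockJoinFacetRequest {

    // Percent-encode one parameter value for use in a query string.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // The child-level query is wrapped in {!parent}; the parent-level
    // restriction goes into fq, so no parent field is copied to children.
    static String queryString(String parentFilter, String childQuery,
                              String parentLevelFq, String childFacetField) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "{!parent which=" + parentFilter + "}" + childQuery);
        params.put("fq", parentLevelFq);
        params.put("facet", "true");
        params.put("child.facet.field", childFacetField);

        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            joined.add(p.getKey() + "=" + encode(p.getValue()));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(
            queryString("type_s:parent", "COLOR_s:Blue", "BRAND_s:Nike", "COLOR_s"));
    }
}
```

The filter only restricts which parents are returned; it does not contribute to scoring.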

        Vijay Sekhri added a comment - - edited

        I have already read the blog a few times. Thank you for writing it, Mikhail. I am not sure it covers the search use cases. Searching and filtering are two different use cases.
        For example, say you have these in your solrconfig as one of the request handlers. As you can see, it searches a lot of fields and boosts based on which fields match. Plus, if you declare the pf parameter for proximity and the mm parameter for minimum should match, relevancy kicks in. I am not sure how all of this can still be used with a mere filter. Searching returns relevant docs, accounting for boosts. Filtering only removes docs that do not match the criteria.

        <str name="qf">
            primaryLnames^5.0 partnumber^11.0 itemnumber^11.0 description^0.5 fullmfpartno^5.0 mfpartno^5.0 xref^10.0 storeOriginSearchable^3.0 nameSearchable^10.0 brandSearchable^5.0 searchPhrase^1.0 searchableAttributesSearchable^1.0
        </str>
        <str name="pf">
            primaryLnames^0.5 nameSearchable^1.0 description^0.1 storeOriginSearchable^0.3 brandSearchable^0.5 xref^1.1 searchableAttributesSearchable^0.1
        </str>
        <str name="fl">*</str>
        <str name="mm">
            2&lt;-1 5&lt;-2 6&lt;-50%
        </str>
        			
        		

        Let me know if there is a way to still search, not filter, and still use ToParentBlockJoinQuery. Real-world scenarios return parent docs ordered by relevancy and boost criteria.

        Mikhail Khludnev added a comment -

        Let me know if there is a way to still search, not filter and still use ToParentBlockJoinQuery .

        q=+BRAND_s:Nike +_query_:"{!parent which=type_s:parent}+COLOR_s:Red +SIZE_s:XL"
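The shape of this query, a scoring parent-level clause plus a block join clause nested through `_query_`, can be composed programmatically. This is a rough sketch only: the class `NestedParentQuery` and its methods are hypothetical names, and the escaping covers just backslashes and double quotes so that the child query can sit inside the quoted nested clause.

```java
// Hypothetical helper, for illustration only; not a Solr or SolrJ class.
final class NestedParentQuery {

    // Escape backslashes and double quotes, then wrap the text in quotes
    // so it can be used as the value of a _query_: clause.
    static String quoteForNestedQuery(String raw) {
        return "\"" + raw.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // parentClause is scored as a normal query clause on the parent docs;
    // childQuery is evaluated on children and joined up to their parents.
    static String build(String parentClause, String parentFilter, String childQuery) {
        String blockJoin = "{!parent which=" + parentFilter + "}" + childQuery;
        return "+" + parentClause + " +_query_:" + quoteForNestedQuery(blockJoin);
    }

    public static void main(String[] args) {
        System.out.println(
            build("BRAND_s:Nike", "type_s:parent", "+COLOR_s:Red +SIZE_s:XL"));
    }
}
```

With the arguments shown in `main`, the result is the same string as the value of `q` above.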
        
        Vijay Sekhri added a comment -

        Thank you, Mikhail. I had tried that before and it did not work. Now I have found out why: apparently, if defType=dismax is set in the requestHandler, that kind of sibling-clause query does not work. After removing it, the query works as expected. Thank you again.

        Erick Erickson added a comment -

        Should we close this and add trunk to the fixed versions?

        Mikhail Khludnev added a comment -

        Cassandra Targett, would you mind having a look at the wiki? I would appreciate feedback about content and format as well. Thanks!

        Cassandra Targett added a comment -

        Hey Mikhail Khludnev, I'll take a look today - thanks!

        Cassandra Targett added a comment -

        Please check out the changes I made and let me know if my edits made any information incorrect. I tried to study this issue a bit for the background, but might have misunderstood something.

        Vijay Sekhri added a comment -

        Hi Mikhail,
        I benchmarked this feature to determine its efficiency and performance.
        In our stress environment I have roughly 57 million documents in the Solr index, spread over 10 shards of around 5.7 million documents each. Each shard has one leader and one replica.
        Like in this figure.

        There is a SolrJ service that connects to the Solr cluster. The service is hosted on 8 hosts, each running 3 JVM instances, so in total 24 round-robin instances of the SolrJ service are running and issuing queries to the Solr cluster.
        The Solr version is 5.3.1.

        Here is the baseline:
        With a load of 50 requests per second to the SolrJ service, the average response time in the service is 290 milliseconds. In the Solr cluster this translates into an average QTime of 22 milliseconds.
        Here is the picture of average response times at service

        Here is the picture of average response Qtime of the solr

        Then I converted most of the documents to a parent-child relationship. In total there were 27 million new child documents, so the total document count increased from 57 million to 83 million. I converted all the queries to the parent-child format in the SolrJ service layer. With the same load, the average response time in the service increased to 1.3 seconds and the average QTime increased to 500 milliseconds.
        The Solr version is 5.4 trunk with your code in it.

        Here is the picture of average response times at service with parent child

        Here is the picture of average response Qtime of the solr with parent child

        With the same load, the overall performance was 10 times slower in the Solr layer and 3 times slower in the SolrJ service layer.

        BTW, I only tested with org.apache.solr.search.join.BlockJoinFacetComponent. Do you think that org.apache.solr.search.join.BlockJoinDocSetFacetComponent would be faster?

        Vijay
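
As a quick check of the figures quoted in this comment, the slowdown factors can be recomputed from the stated averages alone (290 ms to 1.3 s in the service, 22 ms to 500 ms QTime). The sketch below is plain arithmetic; the class name `SlowdownFactors` is made up for illustration, and it assumes the quoted averages are exact, which chart readings may not be.

```java
// Recomputes slowdown factors from the averages quoted in the comment.
final class SlowdownFactors {

    // Ratio of the new average latency to the baseline average latency.
    static double factor(double baselineMillis, double newMillis) {
        return newMillis / baselineMillis;
    }

    public static void main(String[] args) {
        // Service layer: 290 ms baseline, 1.3 s with parent-child documents.
        System.out.printf("service: %.1fx%n", factor(290, 1300));
        // Solr layer: QTime of 22 ms baseline, 500 ms with parent-child documents.
        System.out.printf("solr:    %.1fx%n", factor(22, 500));
    }
}
```

Taken at face value, these averages give roughly 4.5x for the service and roughly 22.7x for Solr, so the rounded multipliers in the comment are presumably based on different readings of the charts.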

        Mikhail Khludnev added a comment -

        Vijay, here are a few notes:

        1. 290 ms in the service vs. a QTime of 22 ms: either I'm missing something, or there is room for performance engineering that is not even search specific. That is off-topic here, though.
        2. I wonder how you can compare performance across different indexes, and how to interpret the results: they might indicate either an inefficient algorithm or a high cost of the data model. To evaluate the former, you can compare block join facet performance with child-only queries and child field facet counting, i.e. it is worth comparing the performance of:
          q={!parent%20which=type_s:parent}COLOR_s:Blue&facet=true&child.facet.field=COLOR_s
          

          with

          q=COLOR_s:Blue&facet=true&facet.field=COLOR_s
          

          Comparing these numbers can give evidence about aggregation efficiency (almost, see below).

        3. BlockJoinDocSetFacetComponent should be faster for rarely changed indexes. Note: BlockJoinFacetComponent disables the query result cache, and this might also impact benchmarking results.
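
The comparison suggested in note 2 could be driven by a small timing harness like the one below. It is only a sketch: `QueryTimer` and `send` are hypothetical names, and `send` is a placeholder that does nothing, to be replaced with whatever client call actually issues the request. The two query strings are the ones from note 2.

```java
// Hypothetical timing harness, for illustration only.
final class QueryTimer {

    // Runs the request `warmup` times unmeasured, then returns the mean
    // wall-clock time in milliseconds over `runs` measured invocations.
    static double averageMillis(Runnable request, int warmup, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            request.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            request.run();
        }
        return (System.nanoTime() - start) / 1_000_000.0 / runs;
    }

    // Placeholder: a real benchmark would send the query to Solr here.
    static void send(String queryString) {
        // intentionally empty in this sketch
    }

    public static void main(String[] args) {
        String blockJoin =
            "q={!parent which=type_s:parent}COLOR_s:Blue&facet=true&child.facet.field=COLOR_s";
        String childOnly = "q=COLOR_s:Blue&facet=true&facet.field=COLOR_s";

        System.out.println("block join: " + averageMillis(() -> send(blockJoin), 10, 100) + " ms");
        System.out.println("child only: " + averageMillis(() -> send(childOnly), 10, 100) + " ms");
    }
}
```

Since one of the components disables the query result cache, as noted in item 3, the two series are probably best run under the same cache conditions.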
        Vijay Sekhri added a comment -

        Hi Mikhail,
        I realized there were two reasons why the performance was bad.
        a) For a lot of queries (internally generated by Solr for the different shards) your code was throwing an NPE. Our service layer caught the exception and issued another query, which added to the overall response times (QTime). The NPE did not happen on all queries. However, whenever it happened, it degraded performance because of the repeated queries. This is where it was happening:

        14:00:20,751 ERROR [org.apache.solr.servlet.HttpSolrCall] (http-/10.235.43.43:8580-82) null:java.lang.NullPointerException
                at org.apache.solr.search.join.BlockJoinFacetCollector.incrementFacets(BlockJoinFacetCollector.java:100)
                at org.apache.solr.search.join.BlockJoinFacetCollector.collect(BlockJoinFacetCollector.java:87)
        

        at this line

         final int[] docNums = blockJoinScorer.swapChildDocs(childDocs);
        

        because sometimes the blockJoinScorer object was null. This happened about half of the time; the other half it was fine.

        So I changed the code

        		
            if(blockJoinScorer == null) {
                //System.out.println("blockJoinScorer is NULL");
                return;
            }
        

        and reran my load, and it brought the response time back down to 60 milliseconds from 200 milliseconds.

        b) All my queries were doing a wildcard match like this
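
The guard described above can be illustrated in isolation. The following is a sketch with stand-in types only: `GuardedCollectorSketch` and `ChildDocSource` are hypothetical and are not the Lucene or Solr classes from the stack trace. Unlike a silent return, it counts how often the guard fires, which may help to find out which requests reach the collector without a block join scorer.

```java
// Stand-in types for illustration; not the actual BlockJoinFacetCollector.
final class GuardedCollectorSketch {

    // Stand-in for the scorer that exposes the matched child doc ids.
    interface ChildDocSource {
        int[] childDocs();
    }

    private ChildDocSource scorer; // stays null if no block join scorer was found
    private int counted;           // child docs aggregated so far
    private int skipped;           // parent hits skipped because the scorer was null

    void setScorer(ChildDocSource scorer) {
        this.scorer = scorer;
    }

    void collect(int parentDoc) {
        if (scorer == null) {
            skipped++; // nothing to aggregate for this parent; record it instead of failing
            return;
        }
        counted += scorer.childDocs().length;
    }

    int counted() {
        return counted;
    }

    int skipped() {
        return skipped;
    }
}
```

Whether skipping is functionally correct depends on why the scorer is missing, so a non-zero `skipped` counter is a signal worth logging rather than ignoring.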

        		
        q={!parent%20which=type_s:parent}id:*_child
        

        and I changed that to

        		
        q={!parent%20which=type_s:parent}type_s:child
        

        This further brought the QTimes down to 30 milliseconds. Granted, that is a bit higher than the baseline, but it is acceptable. Please let me know what to do about that NPE in the code. I am not sure whether what I did is functionally correct.

        -regards

        Mikhail Khludnev added a comment -

        Vijay,
        this NPE is a twin of SOLR-8643 and SOLR-8644 (I'll comment on them soon too). It might be caused by a specific form of queries in SolrCloud, though. Could you please post the few log lines that follow the stack trace, to catch which queries in particular cause the NPE?
        And yes, BlockJoinDocSetFacetComponent shouldn't be affected by this scorer routine.

        Vijay Sekhri added a comment -

        Hi Mikhail,
        It could be related to a stats query that does not even use any ToParentBlockJoin syntax. Here is one example:

        15:07:56,736 INFO  [org.apache.solr.core.SolrCore.Request] (http-/10.235.43.43:8580-143) [core1]  webapp=/solr path=/select 
        params={shards.qt=searchStandard&tie=0.01&stats=true&distrib=false&q.alt=*:*&originIP=10.235.52.131&collection=search1&shards.tolerant=true&version=2&NOW=1454360876733&shard.url=http://solrx331p.qa.ch3.s.com:8580/solr/core1/|http://solrx351p.qa.ch3.s.com:8580/solr/core1/&fl=id&fl=score&bf=%0a++++++++++++&timeAllowed=10000&qt=searchStandard&fsv=true&fq=catalogs:(("10104"))&fq=searchableAttributes:(("Metal%3DTri+color"))&fq=brand:("Black+Hills+Gold")&fq=discount:("70")&fq=primaryCategory:("10104_3_Jewelry_Diamonds_Rings")&mm=%0a++++++++++++++++2<-1+5<-2+6<-50%25%0a++++++++++++&hasOrigCategories=1&qf=%0a+++++++++++++++++primaryLnames^5.0+partnumber^11.0+itemnumber^11.0+fullmfpartno^5.0+mfpartno^5.0+xref^10.0+storeOriginSearchable^3.0+nameSearchable^10.0+brandSearchable^5.0++searchPhrase^1.0++searchableAttributesSearchable^1.0++++%0a++++++++++++&wt=javabin&rows=0&pf=%0a+++++++++++++++primaryLnames^0.5+nameSearchable^1.0+storeOriginSearchable^0.3+brandSearchable^0.5++xref^1.1+searchableAttributesSearchable^0.1%0a++++++++++++&shards.purpose=516&start=0&q=white+diamonds+diamonds+elizabeth+taylor+body+lotion&bot=true&stats.field=price_10151_f&isShard=true&ps=100} hits=0 status=0 QTime=0
        
        
        15:07:56,758 ERROR [org.apache.solr.handler.RequestHandlerBase] (http-/10.235.43.43:8580-26) java.lang.NullPointerException
                at org.apache.solr.search.join.BlockJoinFacetCollector.incrementFacets(BlockJoinFacetCollector.java:100)
                at org.apache.solr.search.join.BlockJoinFacetCollector.collect(BlockJoinFacetCollector.java:87)
                at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:1153)
                at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:350)
                at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:273)
                at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
        
        
        

        If you want, I can revert the code and run some load again to capture more of these queries.
        Vijay

        Mikhail Khludnev added a comment -

        This NPE is pretty strange. It is a gap in error handling, but the request log shows status=0, which couldn't happen if an exception had occurred. These query parameters can't enable the BlockJoinFacet component. The actual query parameters causing this NPE should follow the stack trace.

        Vijay Sekhri added a comment -

        Mikhail,
        For issues like these and some others, should I open a separate Jira for manageability? I have also observed that facet.prefix is not honored on child.facet.field. Let me know and I can open a Jira.
        Thanks

        Mikhail Khludnev added a comment -

        Vijay,
        Sure, you can open one, but personally I prefer to postpone any such extensions until we merge the child.facet engine into JSON facets.

        Vijay Sekhri added a comment -

        I created a new JIRA and also attached a rudimentary patch that takes care of the NPE and honors facet.prefix.
        https://issues.apache.org/jira/secure/attachment/12792872/SOLR-8834.patch
        https://issues.apache.org/jira/browse/SOLR-8834

        Vijay

        Alisa Zhila added a comment -

        Hi Mikhail and Oleg,
        Thank you for introducing this new feature and describing it in your blog (http://blog.griddynamics.com/search/label/~Mikhail%20Khludnev).

        I am wondering whether BlockJoin faceting supports the output parameters "limit" and "mincount". Unfortunately, I could not find any mention of them in the wiki (https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting).

        My experiments with trying to use the common facet.limit and facet.mincount syntax failed:
        /bjqfacet?q={!parent%20which=type_s:doc}type_s:doc.enriched.text.keywords&facet=true&child.facet.field=text_t&child.facet.limit=10&child.facet.mincount=5&rows=0&fq={!parent%20which=type_s:doc}type_s:doc.userData%20%2BSubject_t:california&wt=json&indent=true

        {
        "responseHeader":{"status":0,"QTime":1},
        "response":{"numFound":19,"start":0,"docs":[]},
        "facet_counts":{
        "facet_queries":{},
        "facet_fields":{
        "text_t":[
        "128x",1,
        "18xx",1,
        ...
        "ab",2,
        "access",5,
        "account",1,
        "accounts",1,
        "action",2,
        "address",1,
        "addressee",1,
        "afternoon",3,
        "agreement",2,
        ...
        "wsj",1,
        "year",2,
        "yoder",2,
        "york",1]}}

        As you can see, the buckets are sorted alphabetically and the response returns all of them.

        Are limit and mincount implemented for BlockJoin faceting? If so, can their usage be described in the wiki?

        Thank you!

        Mikhail Khludnev added a comment -

        Alisa,

        My personal preference is not to implement limit and mincount here, but to merge this logic into JSON Facets instead. I don't know how to do that yet, though.
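
Since limit and mincount are not applied by this component, one possible stopgap is to trim the returned buckets on the client side. The sketch below is only an illustration and not part of the patch: the function name trim_facets and its parameters are hypothetical, and it assumes the flat [term, count, term, count, ...] list shown in the response above. It drops buckets below mincount, orders the rest by descending count (ties broken alphabetically), and keeps the first limit entries, treating a negative limit as "no limit".

```python
from typing import List, Tuple, Union


def trim_facets(
    flat: List[Union[str, int]],
    mincount: int = 0,
    limit: int = -1,
) -> List[Tuple[str, int]]:
    """Apply mincount and limit to a flat [term, count, ...] facet list.

    Hypothetical client-side helper; the server still computes and
    returns every bucket, so this only reduces what the caller sees.
    """
    # Pair each term with the count that follows it.
    pairs = list(zip(flat[0::2], flat[1::2]))
    # Drop buckets whose count is below the threshold.
    kept = [(term, count) for term, count in pairs if count >= mincount]
    # Highest count first; alphabetical order breaks ties.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    # A negative limit means "return everything that survived".
    return kept if limit < 0 else kept[:limit]


# A few buckets copied from the response above.
sample = ["ab", 2, "access", 5, "account", 1, "afternoon", 3, "agreement", 2]
top = trim_facets(sample, mincount=2, limit=3)
print(top)  # [('access', 5), ('afternoon', 3), ('ab', 2)]
```

This does not reduce the work done on the server or the size of the response, so it is only practical for fields with a modest number of distinct terms.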


          People

          • Assignee:
            Mikhail Khludnev
            Reporter:
            abipc
          • Votes:
            28
            Watchers:
            30


              Development