Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.5, 6.0
    • Component/s: faceting
    • Labels:

      Description

      For a sample inventory (note the nested documents) like this:
      <doc>
      <field name="id">10</field>
      <field name="type_s">parent</field>
      <field name="BRAND_s">Nike</field>
      <doc>
      <field name="id">11</field>
      <field name="COLOR_s">Red</field>
      <field name="SIZE_s">XL</field>
      </doc>
      <doc>
      <field name="id">12</field>
      <field name="COLOR_s">Blue</field>
      <field name="SIZE_s">XL</field>
      </doc>
      </doc>

      Faceting results must contain -
      Red(1)
      XL(1)
      Blue(1)

      for a "q=*" query.
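A minimal sketch of the counting semantics requested above, in plain Python rather than Solr code. It assumes each facet value is counted once per matching parent, which is what the expected counts imply (XL(1) even though two children are XL). The in-memory data layout and the `child_facets` helper are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical in-memory copy of the sample inventory above.
inventory = [
    {
        "id": "10", "type_s": "parent", "BRAND_s": "Nike",
        "children": [
            {"id": "11", "COLOR_s": "Red", "SIZE_s": "XL"},
            {"id": "12", "COLOR_s": "Blue", "SIZE_s": "XL"},
        ],
    },
]

def child_facets(parents, field):
    """Count, per child field value, how many parents have at least one child with it."""
    counts = Counter()
    for parent in parents:
        # A set de-duplicates values within one parent, so two XL children count once.
        values = {child[field] for child in parent["children"] if field in child}
        counts.update(values)
    return dict(counts)

print(child_facets(inventory, "COLOR_s"))
print(child_facets(inventory, "SIZE_s"))
```

Counting distinct values per parent, rather than per child, is the design choice that separates this from ordinary field faceting over the child documents.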

      PS: The inventory example is taken from this blog post: http://blog.griddynamics.com/2013/09/solr-block-join-support.html

      1. cluster.jpg
        61 kB
        Vijay Sekhri
      2. service_baseline.png
        35 kB
        Vijay Sekhri
      3. service_new_baseline.jpg
        54 kB
        Vijay Sekhri
      4. solr_baseline.jpg
        192 kB
        Vijay Sekhri
      5. solr_new_baseline.jpg
        191 kB
        Vijay Sekhri
      6. SOLR-5743.patch
        79 kB
        Mikhail Khludnev
      7. SOLR-5743.patch
        78 kB
        Mikhail Khludnev
      8. SOLR-5743.patch
        78 kB
        Mikhail Khludnev
      9. SOLR-5743.patch
        77 kB
        Dr Oleg Savrasov
      10. SOLR-5743.patch
        75 kB
        Mikhail Khludnev
      11. SOLR-5743.patch
        76 kB
        Mikhail Khludnev
      12. SOLR-5743.patch
        75 kB
        Mikhail Khludnev
      13. SOLR-5743.patch
        62 kB
        Dr Oleg Savrasov
      14. SOLR-5743.patch
        59 kB
        Dr Oleg Savrasov
      15. SOLR-5743.patch
        59 kB
        Dr Oleg Savrasov
      16. SOLR-5743.patch
        47 kB
        Dr Oleg Savrasov
      17. SOLR-5743.patch
        38 kB
        Dr Oleg Savrasov
      18. SOLR-5743.patch
        37 kB
        Dr Oleg Savrasov
      19. SOLR-5743.patch
        36 kB
        Dr Oleg Savrasov
      20. SOLR-5743.patch
        73 kB
        Dr Oleg Savrasov

          Activity

          Dr Oleg Savrasov added a comment -

          I'm preparing a Lucene Revolution talk that addresses this feature: http://lucenerevolution.uservoice.com/forums/254256-internals-track/suggestions/5995621-faceting-with-lucene-blockjoinquery Your votes would be much appreciated.

          Dr Oleg Savrasov added a comment -

          This is an initial implementation that meets the functional requirements. There is a new BlockJoinFacetComponent, which expects a ToParentBlockJoinQuery in the search request. Facets are calculated for the fields defined by the child.facet.field parameter. Only DocValues fields are supported.

          ash fo added a comment -

          This patch modifies two XML files that do not exist in the source:

          solr/core/src/test-files/solr/collection1/conf/schema-blockjoinfacetcomponent.xml
          solr/core/src/test-files/solr/collection1/conf/solrconfig-blockjoinfacetcomponent.xml

          Could you please explain where I can find those files? The patch modifies them rather than adding them as whole files, so when I apply the patch it basically skips them.

          Thank you

          Dr Oleg Savrasov added a comment -

          I created the files by copying and modifying existing configurations. It looks like my IDE processed the changes incorrectly. Sorry about that. Please find the updated patch attached. Should you have any issues, please let me know.

          ash fo added a comment -

          Thanks, I applied the patch, but passing "child.facet.field=xxxxxxx" still doesn't do anything. Here is my query:

          http://localhost:8080/solr/nested_collecion2/select?q=*%3A*&fq=content_type%3AparentDocument&fl=id&wt=json&indent=true&facet=true&child.facet.field=retid

          This is what I get back; basically, Solr doesn't recognize the 'child.facet.field' parameter:

          {
            "responseHeader":{
              "status":0,
              "QTime":1,
              "params":{
                "facet":"true",
                "fl":"id",
                "indent":"true",
                "q":"*:*",
                "child.facet.field":"retid",
                "wt":"json",
                "fq":"content_type:parentDocument"}},
            "response":{"numFound":998,"start":0,"docs":[
              {"id":"1554855923"},
              {"id":"1556730933"},
              {"id":"1437257890"},
              {"id":"1463296684"},
              {"id":"1143793641"},
              {"id":"1168208507"},
              {"id":"1201399772"},
              {"id":"1162769709"},
              {"id":"1199906811"},
              {"id":"1296203203"}]
            },
            "facet_counts":{
              "facet_queries":{},
              "facet_fields":{},
              "facet_dates":{},
              "facet_ranges":{},
              "facet_intervals":{}}}

          The retid field has docValues="true" too:

          <field name="retid" type="int" indexed="true" stored="true" docValues="true"/>

          Is there anything else that needs to be done?

          Thanks

          Dr Oleg Savrasov added a comment -

          To use the proposed component, you need to configure it in solrconfig.xml and introduce a search handler that uses it, for example:

          <searchComponent name="blockJoinFacet" class="org.apache.solr.handler.component.BlockJoinFacetComponent">
          </searchComponent>

          <requestHandler name="/blockJoinFacetRH" class="org.apache.solr.handler.component.SearchHandler">
            <arr name="last-components">
              <str>blockJoinFacet</str>
            </arr>
          </requestHandler>

          Please note that only string docValues fields can be used for faceting (the int type may be covered later), so you need to update the relevant field configuration in schema.xml, for example:

          <field name="COLOR_s" type="string" indexed="true" stored="true" docValues="true"/>
          <field name="SIZE_s" type="string" indexed="true" stored="true" docValues="true"/>

          Then, after indexing a set of hierarchical documents like

          <doc>
          <field name="id">10</field>
          <field name="type_s">parent</field>
          <field name="BRAND_s">Nike</field>
          <doc>
          <field name="id">11</field>
          <field name="type_s">child</field>
          <field name="COLOR_s">Red</field>
          <field name="SIZE_s">XL</field>
          </doc>
          <doc>
          <field name="id">12</field>
          <field name="type_s">child</field>
          <field name="COLOR_s">Blue</field>
          <field name="SIZE_s">XL</field>
          </doc>
          </doc>

          you need to pass the required ToParentBlockJoinQuery to the configured request handler, for example:

          http://localhost:8983/solr/collection1/blockJoinFacetRH?q={!parent+which%3D%22type_s%3Aparent%22}type_s%3Achild&wt=json&indent=true&facet=true&child.facet.field=COLOR_s&child.facet.field=SIZE_s

          and it yields the desired result:

          {
            "responseHeader":{"status":0,"QTime":1},
            "response":{"numFound":1,"start":0,"docs":[
              {"id":"10","type_s":"parent","BRAND_s":"Nike","_version_":1491642108914696192}]
            },
            "facet_counts":{
              "facet_queries":{},
              "facet_fields":{},
              "facet_dates":{},
              "facet_ranges":{},
              "facet_intervals":{},
              "facet_fields":[
                "COLOR_s",[
                  "Blue",1,
                  "Red",1],
                "SIZE_s",[
                  "XL",1]]}}

          Please take the latest patch; it contains a fix for a caching issue that was just found.
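The request above has to be percent-encoded by hand, which is easy to get wrong. As a sketch only, the same URL can be assembled with Python's standard `urllib.parse`. The host, core name, and handler path are copied from the example and are assumptions about a local setup; `build_url` is a hypothetical helper, not part of the patch.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Host, core, and handler path are copied from the example URL; adjust for your setup.
BASE = "http://localhost:8983/solr/collection1/blockJoinFacetRH"

def build_url(child_query, facet_fields):
    """Build a block-join facet request URL with percent-encoded local params."""
    params = [
        ("q", '{!parent which="type_s:parent"}' + child_query),
        ("wt", "json"),
        ("indent", "true"),
        ("facet", "true"),
    ]
    # A repeated key yields one child.facet.field parameter per field.
    params += [("child.facet.field", f) for f in facet_fields]
    return BASE + "?" + urlencode(params)

url = build_url("type_s:child", ["COLOR_s", "SIZE_s"])
print(url)

# Round trip: decoding the query string gives back the original values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0])
print(decoded["child.facet.field"])
```

`urlencode` also escapes the braces and the exclamation mark, which the hand-written URL leaves as-is; both forms decode to the same query string.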

          ash fo added a comment -

          Thank you, finally I got it working. Is it possible to include the integer and float fields in this patch as well? Two of my child fields are integer and float (retailer id and price) and I need to facet on them too.

          Dr Oleg Savrasov added a comment -

          After investigating, I've found that the float and int types work fine for multivalued fields, i.e. they should be configured like this:

          <field name="RETAILER_ID" type="int" indexed="true" stored="true" docValues="true" multiValued="true"/>
          <field name="PRICE" type="float" indexed="true" stored="true" docValues="true" multiValued="true"/>

          The unit test in the patch has been extended to cover the int and float types.
          I'll try to find out whether it's possible to make it work for multiValued="false".

          ash fo added a comment -

          Thank you.

          Could you please also include 'child.facet.query'? People often want to know how many offers, for example, fall in a specific price range, something like this:

          &child.facet.query=price:[1 TO 100]

          ash fo added a comment -

          It seems that the patch doesn't work with SolrCloud. With a single instance it works, but in cloud mode with multiple nodes and shards it just doesn't work. Is there a way to get this working with multiple nodes/shards? Thank you.

          Dr Oleg Savrasov added a comment -

          SolrCloud support has been implemented.

          Dr Oleg Savrasov added a comment -

          Please check out the latest patch, which implements SolrCloud support. Please note that you need to make some configuration changes for it to work.

          1. If you have a /select SearchHandler definition, you should add blockJoinFacet as a last component there, like this:

          <requestHandler name="/select" class="solr.SearchHandler">
            .....
            <arr name="last-components">
              <str>blockJoinFacet</str>
            </arr>
          </requestHandler>

          2. If you don't have a /select SearchHandler definition, you should configure your custom BlockJoinFacet search handler with the shards.qt parameter, which should reference the search handler's name. For example:

          <requestHandler name="/blockJoinFacetRH" class="org.apache.solr.handler.component.SearchHandler">
            <lst name="defaults">
              <str name="shards.qt">blockJoinFacetRH</str>
            </lst>
            <arr name="last-components">
              <str>blockJoinFacet</str>
            </arr>
          </requestHandler>

          Dr Oleg Savrasov added a comment -

          The video from the Lucene Revolution talk is available here: http://www.youtube.com/watch?v=Su5SHc_uJw8

          Jacob Carter added a comment -

          I've applied this patch to Solr 5.0.0. With an index containing around 400k parent documents and 1.5 million child documents, it takes over a minute to return the values of a child facet and their counts. Is this performance to be expected at present, or have I potentially misconfigured my instance?

          Dr Oleg Savrasov added a comment -

          Performance improvements are still under investigation at the moment. I don't have too much time these days so I cannot promise that I'll come up with some solution soon. But we keep working on it.

          Dr Oleg Savrasov added a comment -

          I don't think we need to introduce another special child.facet.query parameter here.
          It looks like it's possible to achieve the same result by specifying an appropriate ToParentBlockJoinQuery in the facet.query parameter.
          For example, facet.query={!parent which=type_s:parent}price:[1 TO 100].
          But please note that in this case the facet.query result could count child documents that are not matched by the search query.
          For example, there could be a parent document with two children: one child has COLOR_s:Red and price:200, while the other has COLOR_s:Blue and price:50.
          If you request q={!parent which=type_s:parent}COLOR_s:Red and facet.query={!parent which=type_s:parent}price:[1 TO 100], this document is going to be counted.
          Sometimes that's OK, but if you want to eliminate this effect, you need to add the child document filter from q to facet.query.
          The best way to do it is to introduce a new HTTP parameter, say qq=COLOR_s:Red, and reference it from both q and facet.query, i.e.

          q={!parent which=type_s:parent v=$qq}&facet.query={!parent which=type_s:parent}+price:[1 TO 100] +{!v=$qq}&qq=type_s:child&facet=true
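The over-counting effect described in this comment can be reproduced with a small plain-Python model. This is only an illustration of the logic, not Solr code: the two child records mirror the Red/200 and Blue/50 example, the parent id `p1` is made up, and `parents_of` is a hypothetical stand-in for what `{!parent ...}` does.

```python
# Hypothetical data matching the example in this comment: one parent, two children.
children = [
    {"parent": "p1", "COLOR_s": "Red", "price": 200},
    {"parent": "p1", "COLOR_s": "Blue", "price": 50},
]

def parents_of(child_docs, predicate):
    """Emulate {!parent ...}: parents having at least one child that satisfies predicate."""
    return {c["parent"] for c in child_docs if predicate(c)}

def is_red(c):
    return c["COLOR_s"] == "Red"

def is_cheap(c):
    return 1 <= c["price"] <= 100

# q={!parent ...}COLOR_s:Red
hits = parents_of(children, is_red)

# facet.query={!parent ...}price:[1 TO 100], intersected with the search hits.
loose = hits & parents_of(children, is_cheap)

# facet.query with the child filter from q repeated inside it.
strict = hits & parents_of(children, lambda c: is_cheap(c) and is_red(c))

print(len(loose), len(strict))  # prints "1 0"
```

The loose count is 1 because the Blue child satisfies the price range even though only the Red child matched the search; requiring both conditions on the same child brings the count to 0.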

          Jim Musil added a comment -

          Curious, how would you handle this if a user searches for "pink shoes" or "large gloves"?

          Dr Oleg Savrasov added a comment -

          We call this kind of request, which mixes and matches fields from different related entities, a "deep search". To handle such requests we need to compose a Boolean query, which provides linguistic matching, with a Block Join query, which allows returning the top-level document when the match happens on a nested document. This topic is worth its own JIRA (or a few of them). Here, we are focusing on faceting rather than matching.

          Dr Oleg Savrasov added a comment -

          This is a performance improvement patch prepared for the lucene_solr_5_2 branch. On my local test data it makes the proposed component about 25 times faster. Please note that it's recommended to apply the SOLR-7730 patch as well, since it also yields significant performance benefits.

          Dr Oleg Savrasov added a comment -

          The proposed component has been reworked to use the algorithm described at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-reverse-nested-aggregation.html. As a result, the code became more elegant and about 2 times faster than the previous version.

          Dr Oleg Savrasov added a comment -

          The unit test has been extended to cover the single-valued flow.

          Ishan Chattopadhyaya added a comment -

          IMHO this is an important issue to fix, and the patch looks good to me (based on initial look, and due to the tests included in the patch). It would be very good to have some committer attention here.

          Mikhail Khludnev added a comment - - edited

          Ishan Chattopadhyaya, I'd like to commit it. I just want to confirm that there is no veto from anyone.
          I'd also appreciate it if colleagues left feedback on the recent patch, especially about its performance. Jacob Carter, would you comment on that?

          To summarize: if we decide to go ahead, I'll add it to defaultComponents, after which users will be able to get aggregated facets for child fields alongside the usual ones:

          q={!parent ...}...&facet=true&child.facet.field=COLOR
          

          Here is the brief use case description https://www.mail-archive.com/solr-user@lucene.apache.org/msg115732.html

          Mikhail Khludnev added a comment -

          Colleagues! I need your advice.
          This patch disables query result caching (which requires making NO_CHECK_QCACHE public) and enforces query execution every time (only if the params are present, of course).
          It calculates facets along with the search via a DelegatingCollector, which is quite different from what Solr usually does. It also requires relaxing encapsulation to access ToParentBlockJoinQuery.BlockJoinScorer.swapChildDocs(int[]). To accommodate this while keeping encapsulation, we can add a public accessor class to o.a.l.search.join, or make it default (package-private) and add a class in the o.a.l.search.join package to the Solr codebase (100% ugly).
          As an alternative, we can move closer to the regular Solr approach: calculate childDocset and run faceting over it. Please share your opinion; otherwise I'll go to IRC and repeat the question.

          Mikhail Khludnev added a comment - - edited

          Revamped the patch SOLR-5743.patch. Caveat, bitwise ticks! Now it provides both approaches:

          • BlockJoinFacetComponent - enforces searching via NO_CHECK_QCACHE and obtains child matches via BlockJoinScorer.swapChildDocs(int[]); see ChildTrackingCollector in the patch.
          • BlockJoinFacetDocSetComponent - works more like regular Solr, with top-level doc sets.

          I think we should include both components in 5.5, disabled by default, to let users experiment.

          Remaining TODOs:

          • exclude parent docs from faceting
          • now it's hardcoded to mincount=1; either set it to 0 or copy-paste the mincount params logic and will be
          • improve the simple test to handle edge cases with fields and hits.

          Any concerns?
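To make the doc-set approach more concrete, here is a rough sketch in plain Python. It is not the patch's code. It assumes the usual Lucene block layout, in which the children of a block are indexed immediately before their parent, and it assumes the matched child doc set is already known. The toy index and `facet_over_child_docset` are hypothetical.

```python
from collections import Counter

# Toy index in doc-id order: children first, then their parent (block layout).
index = [
    {"COLOR_s": "Red"},    # doc 0, child of doc 2
    {"COLOR_s": "Blue"},   # doc 1, child of doc 2
    {"type_s": "parent"},  # doc 2
    {"COLOR_s": "Red"},    # doc 3, child of doc 5
    {"COLOR_s": "Red"},    # doc 4, child of doc 5
    {"type_s": "parent"},  # doc 5
]

def facet_over_child_docset(docs, child_docset, field):
    """Count each field value once per parent block, using only children in child_docset."""
    counts = Counter()
    seen_in_block = set()
    for doc_id, doc in enumerate(docs):
        if doc.get("type_s") == "parent":
            # The parent closes the block: flush the distinct values seen among its children.
            counts.update(seen_in_block)
            seen_in_block = set()
        elif doc_id in child_docset and field in doc:
            seen_in_block.add(doc[field])
    return dict(counts)

print(facet_over_child_docset(index, {0, 1, 3, 4}, "COLOR_s"))
```

Because the pass is a single scan in doc-id order, it needs no per-parent lookup, which is one reason a doc-set based variant fits Solr's usual faceting model.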

          Mikhail Khludnev added a comment -

          Tweaked SOLR-5743.patch.
          BlockJoinFacetDistribTest found a discrepancy in the shard response with facet=false.
          Single node or shards with facet=true

          {responseHeader={status=0,QTime=133},response={numFound=11,start=0,docs=[]},
          facet_counts={facet_fields={COLOR_s={black=6,fuchsia=8,magenta=2},SIZE_s={3=4,4=3,5=2,6=1,l=1,m=3,maxi=3,xl=3,xml=3,xxl=1,xxxl=1}}}}
          

          shards without facet=true

          {responseHeader={status=0,QTime=64},child_facet_fields={COLOR_s={black=6,fuchsia=8,magenta=2},SIZE_s={3=4,4=3,m=3,maxi=3,xl=3,xml=3,5=2,6=1,l=1,xxl=1,xxxl=1}},response={numFound=11,start=0,maxScore=0.0,docs=[]}}
          

          junit

          junit.framework.AssertionFailedError: .child_facet_fields!=response (unordered or missing)
          	at 
          ...
          org.apache.solr.BaseDistributedSearchTestCase.compareSolrResponses(BaseDistributedSearchTestCase.java:893)
          	at 
          ...
          org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:571)
          	at org.apache.solr.search.join.BlockJoinFacetDistribTest.testBJQFacetComponent(BlockJoinFacetDistribTest.java:127)
          
          Dr Oleg Savrasov added a comment -

          Fix for the distributed test failure

          Mikhail Khludnev added a comment -

          I'm going to commit SOLR-5743.patch if there is no Christmas freeze.

          Mikhail Khludnev added a comment -

          introducing ToParentBlockJoinQuery.ChildrenMatchesScorer to make javadoc happier

          Mikhail Khludnev added a comment -

          now javadoc is perfect

          ASF subversion and git services added a comment -

          Commit 1721644 from mkhl@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1721644 ]

          SOLR-5743: introducing BlockJoinFacet*Component which are acting on child.facet.field request parameters

          ASF subversion and git services added a comment -

          Commit 1721652 from mkhl@apache.org in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1721652 ]

          SOLR-5743: merging: introducing BlockJoinFacet*Component which are acting on child.facet.field request parameters

          Vijay Sekhri added a comment - - edited

          Hi Mikhail, Dr. Oleg
          The requirement to use this feature is to have ToParentBlockJoinQuery like

           q={!parent which=<allParents>}<someChildren> 

          To use the ToParentBlockJoinQuery, the query needs to search on fields present in the child documents. In the real world, the parent document would have most of the common fields and the child documents would have only the fields that differ. For example, just like BRAND_s, fields such as description_s, name_s, title_s, partnumber_s, etc. would be in the parent document only. As they are the same for all the child documents, one would not repeat them in each child but keep them only in the parent. The child documents would hold attributes like COLOR_s and SIZE_s, as these differ.

          Now for any real search, one would search on fields like BRAND_s, description_s, name_s, title_s, partnumber_s, etc. to return appropriate documents. However, those fields are only present in parent docs.

          So searching them like

           q={!parent which=type_s:parent}BRAND_s:Nike&facet=true&child.facet.field=COLOR_s 

          does not work because BRAND_s:Nike matches the parent document. It also gives this error:
          child query must only match non-parent docs, but parent docID=2 matched childScorer=class org.apache.lucene.search.TermScorer

          One could search on fields from child like this without any problem.

           q={!parent%20which=type_s:parent}COLOR_s:Blue&facet=true&child.facet.field=COLOR_s 

          To use this feature, do we have to copy all the common fields (and thousands of such fields) back into the child documents, repeating them for every child, and search on those fields? For example, copying the BRAND_s field like this:

          [{
            "id": 10,
            "type_s": "parent",
            "BRAND_s": "Nike",
            "_childDocuments_": [{
              "id": 11,
              "COLOR_s": "Red",
              "SIZE_s": "XL",
              "BRAND_s": "Nike"
            },
            {
              "id": 12,
              "COLOR_s": "Blue",
              "SIZE_s": "XL",
              "BRAND_s": "Nike"
            }]
          }]
          

          This way the query works

          q={!parent which=type_s:parent}BRAND_s:Nike&facet=true&child.facet.field=COLOR_s
          

          Or is there some other way to still facet on the child fields (SIZE_s), aggregate the counts on the parent docs (id:10), and still search on the common fields from the parent docs (BRAND_s)?
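To make the aggregation being discussed concrete, here is a toy Python model of it, not Solr code: for each value of a child field it counts the distinct parents that have at least one matching child. The data is the sample inventory from the issue description, the `_childDocuments_` key follows the JSON above, and the function name and the `child_matches` parameter are hypothetical.

```python
from collections import defaultdict

# Sample inventory from the issue description: one parent with two children.
blocks = [
    {
        "id": 10, "type_s": "parent", "BRAND_s": "Nike",
        "_childDocuments_": [
            {"id": 11, "COLOR_s": "Red", "SIZE_s": "XL"},
            {"id": 12, "COLOR_s": "Blue", "SIZE_s": "XL"},
        ],
    },
]

def child_facet_counts(blocks, field, child_matches=lambda child: True):
    """Count, per value of `field`, the distinct parents with a matching child."""
    parents_per_value = defaultdict(set)
    for parent in blocks:
        for child in parent["_childDocuments_"]:
            if child_matches(child) and field in child:
                # A set keeps each parent once, even if several children share a value.
                parents_per_value[child[field]].add(parent["id"])
    return {value: len(ids) for value, ids in parents_per_value.items()}

color_counts = child_facet_counts(blocks, "COLOR_s")
size_counts = child_facet_counts(blocks, "SIZE_s")
```

Both children are XL, yet XL is counted once because the count is per parent, which matches the Red(1), XL(1), Blue(1) expectation in the description.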

          Mikhail Khludnev added a comment -

          hold on..
          I wonder why you can't intersect it with a parent-level filter:

          q={!parent%20which=type_s:parent}COLOR_s:Blue&facet=true&child.facet.field=COLOR_s&fq=BRAND_s:Nike
          

          in this case no copying is necessary. Make sure you checked examples from the blog
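A minimal sketch of encoding these request parameters from a client, using only Python's standard library. The host and request handler path are left out because they depend on the setup; the parameter names and values are the ones from the query above.

```python
from urllib.parse import urlencode, parse_qs

# Child-level block join query plus a parent-level filter, as in the query above.
params = [
    ("q", "{!parent which=type_s:parent}COLOR_s:Blue"),
    ("fq", "BRAND_s:Nike"),
    ("facet", "true"),
    ("child.facet.field", "COLOR_s"),
]

# urlencode percent-encodes reserved characters and writes spaces as '+',
# which is equivalent to the %20 used in the hand-written query string.
query_string = urlencode(params)

# Decoding the string again recovers the original values.
decoded = parse_qs(query_string)
```

Letting the library do the encoding avoids hand-escaping the local-params prefix.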

          Vijay Sekhri added a comment - - edited

          I have seen the blog a few times already. Thank you for writing it, Mikhail. I am not sure it covers the search use cases. Searching and filtering are two different use cases.
          For example, say you have these settings in your solrconfig for one of the request handlers. As you can see, it searches in a lot of fields and boosts based on which fields match. Plus, if you declare the pf parameter for proximity and the mm parameter for minimum should match, relevancy kicks in. I am not sure how all of this can still be used with a mere filter. Searching returns relevant docs, accounting for boosts. Filtering only removes docs that do not match the criteria.

          <str name="qf">
              primaryLnames^5.0 partnumber^11.0 itemnumber^11.0 description^0.5 fullmfpartno^5.0 mfpartno^5.0 xref^10.0 storeOriginSearchable^3.0 nameSearchable^10.0 brandSearchable^5.0 searchPhrase^1.0 searchableAttributesSearchable^1.0
          </str>
          <str name="pf">
              primaryLnames^0.5 nameSearchable^1.0 description^0.1 storeOriginSearchable^0.3 brandSearchable^0.5 xref^1.1 searchableAttributesSearchable^0.1
          </str>
          <str name="fl">*</str>
          <str name="mm">
              2&lt;-1 5&lt;-2 6&lt;-50%
          </str>
          			
          		

          Let me know if there is a way to still search, not filter and still use ToParentBlockJoinQuery . Real world scenarios would return parent docs based on some order of relevancy and boost criteria.
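For readers unfamiliar with the mm value in the configuration above, the following is a simplified Python model of how a conditional min-should-match spec is usually described in the Solr documentation. It is a sketch, not Solr's implementation: it handles only the conditional `n<value` form, and truncating the percentage toward zero is an assumption about the rounding rule.

```python
def min_should_match(spec, clause_count):
    """Return how many of `clause_count` optional clauses must match.

    Only the conditional form "n<value n<value ..." is handled here.
    """
    result = clause_count  # at or below the first bound, every clause is required
    for condition in spec.split():
        bound, _, value = condition.partition("<")
        if clause_count <= int(bound):
            break  # this condition and the following ones do not apply
        if value.endswith("%"):
            # Assumption: the percentage is truncated toward zero.
            delta = int(clause_count * int(value[:-1]) / 100)
        else:
            delta = int(value)
        # A negative value means "all but that many"; a positive one is absolute.
        result = clause_count + delta if delta < 0 else delta
    return max(0, min(result, clause_count))

MM_SPEC = "2<-1 5<-2 6<-50%"
required = {n: min_should_match(MM_SPEC, n) for n in (1, 2, 3, 5, 6, 7, 8)}
```

Under these assumptions, queries with one or two terms need all of them to match, three to five terms tolerate one miss, six tolerate two, and longer queries tolerate half.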

          Mikhail Khludnev added a comment -

          Let me know if there is a way to still search, not filter and still use ToParentBlockJoinQuery .

          q=+BRAND_s:Nike +_query_:"{!parent which=type_s:parent}+COLOR_s:Red +SIZE_s:XL"
          
          Vijay Sekhri added a comment -

          Thank you, Mikhail. I had already tried that and it did not work. Now I found out why it was not working earlier. Apparently, if you have defType=dismax in the requestHandler, that type of sibling clause query does not work. Removing it makes the query work as expected. Thank you again.
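A small Python sketch of assembling that kind of query string, with parent-level clauses next to a nested block join clause. The helper name is hypothetical and it does not escape quotes inside clauses. Passing defType=lucene explicitly is an assumption on my side about how to select the standard parser per request when the handler defaults to dismax; it is not something verified in this thread.

```python
def parent_and_child_query(parent_clauses, child_clauses, parent_filter="type_s:parent"):
    """Build '+parent +_query_:"{!parent which=...}+child ..."' (no quote escaping)."""
    child_part = " ".join("+" + clause for clause in child_clauses)
    nested = '_query_:"{!parent which=%s}%s"' % (parent_filter, child_part)
    return " ".join(["+" + clause for clause in parent_clauses] + ["+" + nested])

q = parent_and_child_query(["BRAND_s:Nike"], ["COLOR_s:Red", "SIZE_s:XL"])

# Assumed request parameters; defType=lucene names the standard query parser.
params = {"q": q, "defType": "lucene", "facet": "true", "child.facet.field": "COLOR_s"}
```

The generated `q` value is the same string as in Mikhail's example, so parent fields take part in scoring while the child constraints stay inside the block join clause.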

          Erick Erickson added a comment -

          Should we close this and add trunk to the fixed versions?

          Mikhail Khludnev added a comment -

          Cassandra Targett, would you mind having a look at the wiki? I would appreciate feedback about content and format as well. Thanks!

          Cassandra Targett added a comment -

          Hey Mikhail Khludnev, I'll take a look today - thanks!

          Cassandra Targett added a comment -

          Please check out the changes I made and let me know if my edits made any information incorrect. I tried to study this issue a bit for the background, but might have misunderstood something.

          Vijay Sekhri added a comment -

          Hi Mikhail,
          I did benchmark testing of this feature to determine its efficiency and performance.
          In our stress environment I have roughly 57 million documents in the Solr index, spread over 10 shards with each shard hosting around 5.7 million documents. Each shard has one leader and one replica.
          Like in this figure.

          There is a SolrJ service that connects to the Solr cluster. It is hosted on 8 hosts, each running 3 JVM instances, so in total 24 round-robin instances of the SolrJ service are running and issuing queries to the Solr cluster.
          Solr version is 5.3.1

          Here is the baseline
          With a load of 50 requests per second to the SolrJ service, the average response time in the service is 290 milliseconds. The same load results in an average QTime of 22 milliseconds in the Solr cluster.
          Here is the picture of average response times at service

          Here is the picture of average response Qtime of the solr

          Now I converted most of the documents to a parent-child relationship. In total there were 27 million new child documents, so the total document count increased from 57 million to 83 million. I converted all the queries into the parent-child format in the SolrJ service layer. Now, with the same load, the average response time in the service increased to 1.3 seconds and the average QTime increased to 500 milliseconds.
          The Solr version is 5.4 trunk with your code in it.

          Here is the picture of average response times at service with parent child

          Here is the picture of average response Qtime of the solr with parent child

          Overall, with the same load, performance was 10 times slower in the Solr layer and 3 times slower in the SolrJ service layer.

          BTW, I only tested with org.apache.solr.search.join.BlockJoinFacetComponent. Do you think that org.apache.solr.search.join.BlockJoinDocSetFacetComponent would be faster?

          Vijay

          Mikhail Khludnev added a comment -

          Vijay, here are a few notes:

          1. 290 ms in the service vs. a QTime of 22 ms: either I'm missing something, or there is room for performance engineering that is not even search-specific. That's off-topic here, though.
          2. I wonder how you compare performance on different indexes, and how to interpret the results: they may point either to an inefficient algorithm or to the high cost of the data model. To evaluate the former, you can compare the block join facet performance with child-only queries and child field facet counting, i.e. it's worth comparing the performance of:
            q={!parent%20which=type_s:parent}COLOR_s:Blue&facet=true&child.facet.field=COLOR_s
            

            with

            q=COLOR_s:Blue&facet=true&facet.field=COLOR_s
            

            Comparing these numbers gives evidence about aggregation efficiency (almost, see below).

          3. BlockJoinDocSetFacetComponent should be faster for rarely changed indexes. Note: BlockJoinFacetComponent disables the query result cache, and this might also impact benchmarking results.
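One way to turn the comparison suggested in note 2 into a number is to collect QTime samples for both query forms and compare their medians. The Python sketch below assumes the samples have already been gathered; the function name is hypothetical and the sample values are made up for illustration, not measurements from this issue.

```python
from statistics import median

def aggregation_overhead(block_join_qtimes_ms, plain_child_qtimes_ms):
    """Ratio of the median QTime of the block join facet query to the plain child facet query."""
    return median(block_join_qtimes_ms) / median(plain_child_qtimes_ms)

# Hypothetical QTime samples in milliseconds, for illustration only.
overhead = aggregation_overhead([40, 44, 52], [20, 22, 25])
```

A ratio close to 1 would suggest the parent-level aggregation itself is cheap and the slowdown comes from elsewhere, such as the larger index or disabled caching.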
          Vijay Sekhri added a comment -

          Hi Mikhail,
          I realized there were 2 reasons why the performance was bad.
          a) For a whole lot of queries (internally generated by Solr to different shards), your code was throwing an NPE. That made our service layer catch the exception and issue another query, which added to the overall response times (QTime). The NPE was not happening on all queries, though. However, whenever it happened, it degraded performance because of the multiple queries. This is where it was happening:

          14:00:20,751 ERROR [org.apache.solr.servlet.HttpSolrCall] (http-/10.235.43.43:8580-82) null:java.lang.NullPointerException
                  at org.apache.solr.search.join.BlockJoinFacetCollector.incrementFacets(BlockJoinFacetCollector.java:100)
                  at org.apache.solr.search.join.BlockJoinFacetCollector.collect(BlockJoinFacetCollector.java:87)
          

          at this line

           final int[] docNums = blockJoinScorer.swapChildDocs(childDocs);
          

          because sometimes the blockJoinScorer object would be null. Again, this would happen about half of the time; the other half it would be fine.

          So I changed the code

          		
              if (blockJoinScorer == null) {
                  // Workaround: the block join scorer is null here, so skip facet counting.
                  return;
              }
          

          and reran my load. This brought the response time back down from 200 milliseconds to 60 milliseconds.

          b) All my queries were doing a wildcard match like this

          		
          q={!parent%20which=type_s:parent}id:*_child
          

          and I changed that to

          		
          q={!parent%20which=type_s:parent}type_s:child
          

          This further brought the QTimes down to 30 milliseconds. Granted, that is a bit higher than the baseline, but it is acceptable. Please let me know what to do about that NPE in the code. I am not sure whether what I did is functionally correct.

          -regards

          Mikhail Khludnev added a comment -

          Vijay,
          this NPE is a twin of SOLR-8643 and SOLR-8644 (I'll comment on them soon too), though it might be caused by a specific form of queries in SolrCloud. Could you please post a few of the log lines that follow the stack trace, to catch which queries in particular cause the NPE?
          And yes, BlockJoinDocSetFacetComponent shouldn't be impacted by this scorer routine.

          Vijay Sekhri added a comment -

          Hi Mikhail,
          It could be related to a stats query that does not even have any ToParentBlockJoin syntax. Here is one example:

          15:07:56,736 INFO  [org.apache.solr.core.SolrCore.Request] (http-/10.235.43.43:8580-143) [core1]  webapp=/solr path=/select 
          params={shards.qt=searchStandard&tie=0.01&stats=true&distrib=false&q.alt=*:*&originIP=10.235.52.131&collection=search1&shards.tolerant=true&version=2&NOW=1454360876733&shard.url=http://solrx331p.qa.ch3.s.com:8580/solr/core1/|http://solrx351p.qa.ch3.s.com:8580/solr/core1/&fl=id&fl=score&bf=%0a++++++++++++&timeAllowed=10000&qt=searchStandard&fsv=true&fq=catalogs:(("10104"))&fq=searchableAttributes:(("Metal%3DTri+color"))&fq=brand:("Black+Hills+Gold")&fq=discount:("70")&fq=primaryCategory:("10104_3_Jewelry_Diamonds_Rings")&mm=%0a++++++++++++++++2<-1+5<-2+6<-50%25%0a++++++++++++&hasOrigCategories=1&qf=%0a+++++++++++++++++primaryLnames^5.0+partnumber^11.0+itemnumber^11.0+fullmfpartno^5.0+mfpartno^5.0+xref^10.0+storeOriginSearchable^3.0+nameSearchable^10.0+brandSearchable^5.0++searchPhrase^1.0++searchableAttributesSearchable^1.0++++%0a++++++++++++&wt=javabin&rows=0&pf=%0a+++++++++++++++primaryLnames^0.5+nameSearchable^1.0+storeOriginSearchable^0.3+brandSearchable^0.5++xref^1.1+searchableAttributesSearchable^0.1%0a++++++++++++&shards.purpose=516&start=0&q=white+diamonds+diamonds+elizabeth+taylor+body+lotion&bot=true&stats.field=price_10151_f&isShard=true&ps=100} hits=0 status=0 QTime=0
          
          
          15:07:56,758 ERROR [org.apache.solr.handler.RequestHandlerBase] (http-/10.235.43.43:8580-26) java.lang.NullPointerException
                  at org.apache.solr.search.join.BlockJoinFacetCollector.incrementFacets(BlockJoinFacetCollector.java:100)
                  at org.apache.solr.search.join.BlockJoinFacetCollector.collect(BlockJoinFacetCollector.java:87)
                  at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:1153)
                  at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:350)
                  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:273)
                  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
          
          
          

          If you want I could revert back the code and run some load again to get more of these queries.
          Vijay

          Mikhail Khludnev added a comment -

          This NPE is pretty strange. It points to a gap in error handling, but the request log has status=0, which couldn't happen if an exception had occurred. These query parameters can't enable the BlockJoinFacet component. The actual query parameters causing this NPE should follow the stack trace.

          Vijay Sekhri added a comment -

          Mikhail,
          For issues like these and some others, should I open a separate Jira for manageability? I also observed that facet.prefix is not being honored on child.facet.field. Let me know and I can open a Jira.
          Thanks

          Mikhail Khludnev added a comment -

          Vijay,
          Sure, you can open one, but personally I'd prefer to postpone any such extensions until we merge the child.facet engine into JSON facets.

          Vijay Sekhri added a comment -

          I created a new JIRA and also attached a rudimentary patch that takes care of NPE and honors facet.prefix.
          https://issues.apache.org/jira/secure/attachment/12792872/SOLR-8834.patch
          https://issues.apache.org/jira/browse/SOLR-8834

          Vijay

          Alisa Zhila added a comment -

          Hi Mikhail and Oleg,
          Thank you for introducing this new feature and describing it in your blog (http://blog.griddynamics.com/search/label/~Mikhail%20Khludnev).

          I am wondering whether BlockJoin faceting supports the parameters for output "limit" and "mincount"? Unfortunately, I could not find any mentions in the wiki (https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting).

          My experiments with trying to use the common facet.limit and facet.mincount syntax failed:
          /bjqfacet?q={!parent%20which=type_s:doc}type_s:doc.enriched.text.keywords&facet=true&child.facet.field=text_t&child.facet.limit=10&child.facet.mincount=5&rows=0&fq={!parent%20which=type_s:doc}type_s:doc.userData%20%2BSubject_t:california&wt=json&indent=true

          {
          "responseHeader":{"status":0,"QTime":1},
          "response":{"numFound":19,"start":0,"docs":[]},
          "facet_counts":{
          "facet_queries":{},
          "facet_fields":{
          "text_t":[
          "128x",1,
          "18xx",1,
          ...
          "ab",2,
          "access",5,
          "account",1,
          "accounts",1,
          "action",2,
          "address",1,
          "addressee",1,
          "afternoon",3,
          "agreement",2,
          ...
          "wsj",1,
          "year",2,
          "yoder",2,
          "york",1]}}

          As you see, the buckets are sorted in alphabetical order and the response yields all of them.

          Are limit and mincount implemented for BlockJoin faceting? If so, could their usage be described in the wiki?

          Thank you!

          Mikhail Khludnev added a comment -

          Alisa,

          My personal preference is not to implement limit and mincount here, but to merge this logic into JSON Facets. However, I don't know how to do that yet.
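
Editorial note: since the response quoted earlier in this thread returns every bucket regardless of child.facet.limit and child.facet.mincount, one possible workaround is to trim the buckets on the client. The following is only a minimal sketch in Python, assuming the flat [term, count, term, count, ...] layout shown under "facet_fields" in that response. The function name trim_facet, its parameters, and the ordering rule (descending count, ties broken alphabetically) are illustrative assumptions, not part of Solr or of the patch.

```python
def trim_facet(flat, limit=10, mincount=1):
    """Client-side stand-in for limit/mincount on a flat facet list.

    `flat` is assumed to look like ["ab", 2, "access", 5, ...],
    i.e. terms and counts alternating, as in the response above.
    """
    # Pair each term with the count that follows it.
    pairs = list(zip(flat[0::2], flat[1::2]))
    # Drop buckets below the requested minimum count.
    kept = [(term, count) for term, count in pairs if count >= mincount]
    # Order by descending count; break ties alphabetically (an assumption).
    kept.sort(key=lambda tc: (-tc[1], tc[0]))
    # A negative limit is treated here as "no limit".
    return kept[:limit] if limit >= 0 else kept


# A few buckets copied from the response quoted above.
sample = ["ab", 2, "access", 5, "account", 1, "afternoon", 3, "agreement", 2]
top = trim_facet(sample, limit=2, mincount=2)
```

This still transfers every bucket over the wire, so it only tidies the output; it does not reduce the work done on the server.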


            People

            • Assignee:
              Mikhail Khludnev
              Reporter:
              abipc
            • Votes:
              28
              Watchers:
              30

                Development