Solr
  1. Solr
  2. SOLR-4905

Allow fromIndex parameter to JoinQParserPlugin to refer to a single-sharded collection that has a replica on all nodes

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1, 6.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      Using a non-SolrCloud setup, it is possible to perform cross core joins (http://wiki.apache.org/solr/Join). When testing with SolrCloud, however, neither the collection name, alias name (we have created aliases to SolrCloud collections), or the automatically generated core name (i.e. <collection>_shard1_replica1) work as the fromIndex parameter for a cross-core join.

      1. patch.txt
        3 kB
        Jack Lo
      2. SOLR-4905.patch
        12 kB
        Timothy Potter
      3. SOLR-4905.patch
        11 kB
        Timothy Potter

        Issue Links

          Activity

          Hide
          Mark Miller added a comment -

          I'm pretty sure Join only supports a single node.

          Show
          Mark Miller added a comment - I'm pretty sure Join only supports a single node.
          Hide
          Philip K. Warren added a comment -

          Would this work if there is only a single shard in the collection being joined?

          Show
          Philip K. Warren added a comment - Would this work if there is only a single shard in the collection being joined?
          Hide
          Yonik Seeley added a comment -

          You can do a cross-core join on any two cores in the same node (JVM).
          Join does not (yet) have cross-node support, and doing so would always be an order of magnitude slower in any case.

          Show
          Yonik Seeley added a comment - You can do a cross-core join on any two cores in the same node (JVM). Join does not (yet) have cross-node support, and doing so would always be an order of magnitude slower in any case.
          Hide
          Hoss Man added a comment -

          Switching from BUg to feature request.

          FWIW: this limitation has been documented in the wiki, but not easy to find ... i've attempted to remedy that by adding better cross linking between the SolrCloud and DistributeSearch wiki pages, as well as adding an explicit note on the Join page

          https://wiki.apache.org/solr/SolrCloud#Known_Limitations
          https://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations
          https://wiki.apache.org/solr/Join#Limitations

          Show
          Hoss Man added a comment - Switching from BUg to feature request. FWIW: this limitation has been documented in the wiki, but not easy to find ... i've attempted to remedy that by adding better cross linking between the SolrCloud and DistributeSearch wiki pages, as well as adding an explicit note on the Join page https://wiki.apache.org/solr/SolrCloud#Known_Limitations https://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations https://wiki.apache.org/solr/Join#Limitations
          Hide
          Philip K. Warren added a comment -

          In our current configuration, both of the cores are in the same JVM. Both are single-shard collections with a replica. I understand however that it is not likely to be implemented for SolrCloud without a better story for distributed search.

          Thanks for the information.

          Show
          Philip K. Warren added a comment - In our current configuration, both of the cores are in the same JVM. Both are single-shard collections with a replica. I understand however that it is not likely to be implemented for SolrCloud without a better story for distributed search. Thanks for the information.
          Hide
          Yonik Seeley added a comment -

          In our current configuration, both of the cores are in the same JVM. Both are single-shard collections with a replica.

          If that's the case, you should be able to get it to work using the actual core names. Is it not working?
          You could also try passing "distrib=false" to manually prevent distributed search from kicking in.

          Show
          Yonik Seeley added a comment - In our current configuration, both of the cores are in the same JVM. Both are single-shard collections with a replica. If that's the case, you should be able to get it to work using the actual core names. Is it not working? You could also try passing "distrib=false" to manually prevent distributed search from kicking in.
          Hide
          Chris Toomey added a comment -

          I've been experimenting with cross-core joins in SolrCloud and found that this problem does indeed occur when the collection being joined is created via the collections API. Solr returns the error "Cross-core join: no such core <fromIndex name>".

          But if the collection being joined is created at bootstrap time, cross-core join does work and doesn't give that error.

          So the bug must be somewhere in how the collection create command is implemented, such that the join plugin doesn't find created collection's core? But even if it worked as is, the fact that the collection create command names the created cores "<collectionName>_shard<x>_replica<y>" instead of "<collectionName>" would prevent this from working across multiple nodes (like 2 replicas) since the fromIndex name is different on each node. So it seems that naming convention should be changed, agreed? Should I file a separate bug on that?

          My use case is where the collection being joined is small and doesn't need to be sharded, but the "outer" collection may be large and need to be sharded. So even without distributed joins, this should work, since the collection being joined would be replicated to each node and hence both cores being joined would be on the same node (for each shard of the outer collection). And like I say, it does work when all the collections are configured/created at bootstrap time. The problem there is that there's only a single numShards param. which applies to every bootstrapped collection, so having the outer collection sharded and the inner one not isn't possible.

          Show
          Chris Toomey added a comment - I've been experimenting with cross-core joins in SolrCloud and found that this problem does indeed occur when the collection being joined is created via the collections API. Solr returns the error "Cross-core join: no such core <fromIndex name>". But if the collection being joined is created at bootstrap time, cross-core join does work and doesn't give that error. So the bug must be somewhere in how the collection create command is implemented, such that the join plugin doesn't find created collection's core? But even if it worked as is, the fact that the collection create command names the created cores "<collectionName>_shard<x>_replica<y>" instead of "<collectionName>" would prevent this from working across multiple nodes (like 2 replicas) since the fromIndex name is different on each node. So it seems that naming convention should be changed, agreed? Should I file a separate bug on that? My use case is where the collection being joined is small and doesn't need to be sharded, but the "outer" collection may be large and need to be sharded. So even without distributed joins, this should work, since the collection being joined would be replicated to each node and hence both cores being joined would be on the same node (for each shard of the outer collection). And like I say, it does work when all the collections are configured/created at bootstrap time. The problem there is that there's only a single numShards param. which applies to every bootstrapped collection, so having the outer collection sharded and the inner one not isn't possible.
          Hide
          Utkarsh Sengar added a comment - - edited

          @Chris I have solrcloud 4.4 running with 3 shards and 2 cores. A cross-core join does not work even when cores are created during the bootstrap time.

          This is my query:

           http://SOLR_SERVER/solr/merchant/select?q={!join from=merchantId to=merchantId fromIndex=deals}apple  

          This query returns no documents, full response with debugQuery=true: http://apaste.info/uHOw

          But both of my cores have a common merchantId when I query for "apple". So I think this problem is a general problem in solrcloud.

          Show
          Utkarsh Sengar added a comment - - edited @Chris I have solrcloud 4.4 running with 3 shards and 2 cores. A cross-core join does not work even when cores are created during the bootstrap time. This is my query: http://SOLR_SERVER/solr/merchant/select?q={!join from=merchantId to=merchantId fromIndex=deals}apple This query returns no documents, full response with debugQuery=true: http://apaste.info/uHOw But both of my cores have a common merchantId when I query for "apple". So I think this problem is a general problem in solrcloud.
          Hide
          Jack Lo added a comment - - edited

          I have noticed this issue have been lying around for a year, seems like nobody is bother to use JOIN in solrcloud, so I decide to tackle this myself.

          A small patch has been uploaded here to allow fromIndex to specify a collection under cloud environment. Currently, it works if the fromIndex collection is a single shard having at least 1 replica on each node. I am planning to support multi-shard collection but not really sure how to do it given I am not that familiar with the internal mechanics of solr.

          Even if we support multishard, given the current implementation of JoinQParser, I think we can only support it when a collection with at least 1 replica of all shards physically residing on every node. We need all local IndexSearcher. If we need full solrcloud join support, I think we need to revamp JoinQParser or make something on a higher level to gather terms collection from remote shards on StandardRequestHandler.

          By the way, I noticed we haven't use JoinUtil in LUCENE, is there a reason to not use it, their implementation seems to be more cleaner than the one in SOLR right now, I have no idea how JoinQParser works, especally the getdocset stage.

          Show
          Jack Lo added a comment - - edited I have noticed this issue have been lying around for a year, seems like nobody is bother to use JOIN in solrcloud, so I decide to tackle this myself. A small patch has been uploaded here to allow fromIndex to specify a collection under cloud environment. Currently, it works if the fromIndex collection is a single shard having at least 1 replica on each node. I am planning to support multi-shard collection but not really sure how to do it given I am not that familiar with the internal mechanics of solr. Even if we support multishard, given the current implementation of JoinQParser, I think we can only support it when a collection with at least 1 replica of all shards physically residing on every node. We need all local IndexSearcher. If we need full solrcloud join support, I think we need to revamp JoinQParser or make something on a higher level to gather terms collection from remote shards on StandardRequestHandler. By the way, I noticed we haven't use JoinUtil in LUCENE, is there a reason to not use it, their implementation seems to be more cleaner than the one in SOLR right now, I have no idea how JoinQParser works, especally the getdocset stage.
          Hide
          Timothy Potter added a comment -

          Good stuff here ... reminds me of a Map-side JOIN in the Map/Reduce world by having a smaller dataset locally available for all shards of a larger data set. I'm cleaning up the patch and adding some unit tests.

          Show
          Timothy Potter added a comment - Good stuff here ... reminds me of a Map-side JOIN in the Map/Reduce world by having a smaller dataset locally available for all shards of a larger data set. I'm cleaning up the patch and adding some unit tests.
          Hide
          Timothy Potter added a comment -

          Here's an updated patch with a few things cleaned up from the previous one, such as checking replica health before querying. Patch also includes a basic unit test.

          Show
          Timothy Potter added a comment - Here's an updated patch with a few things cleaned up from the previous one, such as checking replica health before querying. Patch also includes a basic unit test.
          Hide
          Timothy Potter added a comment -

          Renaming this ticket to be more clear about the solution it is solving.

          Show
          Timothy Potter added a comment - Renaming this ticket to be more clear about the solution it is solving.
          Hide
          Timothy Potter added a comment -

          Updated patch that removes the change to core container in the previous patch and checks if the fromIndex is a collection alias. I wasn't sure how to handle aliases that point to multiple collections, so that produces an error now. I think this is ready to commit unless there are any other comments?

          Show
          Timothy Potter added a comment - Updated patch that removes the change to core container in the previous patch and checks if the fromIndex is a collection alias. I wasn't sure how to handle aliases that point to multiple collections, so that produces an error now. I think this is ready to commit unless there are any other comments?
          Hide
          ASF subversion and git services added a comment -

          Commit 1656622 from Timothy Potter in branch 'dev/trunk'
          [ https://svn.apache.org/r1656622 ]

          SOLR-4905: Allow fromIndex parameter to JoinQParserPlugin to refer to a single-sharded collection that has a replica on all nodes

          Show
          ASF subversion and git services added a comment - Commit 1656622 from Timothy Potter in branch 'dev/trunk' [ https://svn.apache.org/r1656622 ] SOLR-4905 : Allow fromIndex parameter to JoinQParserPlugin to refer to a single-sharded collection that has a replica on all nodes
          Hide
          ASF subversion and git services added a comment -

          Commit 1657701 from Timothy Potter in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1657701 ]

          SOLR-4905: Allow fromIndex parameter to JoinQParserPlugin to refer to a single-sharded collection that has a replica on all nodes

          Show
          ASF subversion and git services added a comment - Commit 1657701 from Timothy Potter in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1657701 ] SOLR-4905 : Allow fromIndex parameter to JoinQParserPlugin to refer to a single-sharded collection that has a replica on all nodes
          Hide
          Timothy Potter added a comment -

          Bulk close after 5.1 release

          Show
          Timothy Potter added a comment - Bulk close after 5.1 release
          Hide
          Mikhail Khludnev added a comment -

          Timothy Potter, I want to got further, but not too far..
          Joel Bernstein wrote in https://lucidworks.com/blog/2013/06/13/solr-cloud-document-routing/

          Certain Solr features such as grouping’s ngroups feature and joins require documents to be co-located in the same core or vm.

          Is it worth to raise a jira to allow q-time join for collection with shards collocated via compositeId or implicit router with router.field? Concerns and suggestions are welcome!

          Show
          Mikhail Khludnev added a comment - Timothy Potter , I want to got further, but not too far.. Joel Bernstein wrote in https://lucidworks.com/blog/2013/06/13/solr-cloud-document-routing/ Certain Solr features such as grouping’s ngroups feature and joins require documents to be co-located in the same core or vm. Is it worth to raise a jira to allow q-time join for collection with shards collocated via compositeId or implicit router with router.field ? Concerns and suggestions are welcome!
          Hide
          Yonik Seeley added a comment -

          Is it worth to raise a jira to allow q-time join for collection with shards collocated

          Not sure I understand... doesn't this already work?

          Show
          Yonik Seeley added a comment - Is it worth to raise a jira to allow q-time join for collection with shards collocated Not sure I understand... doesn't this already work?
          Hide
          Mikhail Khludnev added a comment -

          Does it?
          When I create join collection with two shards, I got

          ...
               "q": "{!join from=manu_id_s to=id fromIndex=join}name:one",
          ...
            },
            "error": {
              "msg": "SolrCloud join: multiple shards not yet supported join",
              "code": 400
            }
          

          The purpose of this jira SOLR-4905 was to allow it for fully replicated (collocated) single shard collection.

          Show
          Mikhail Khludnev added a comment - Does it? When I create join collection with two shards , I got ... "q" : "{!join from=manu_id_s to=id fromIndex=join}name:one" , ... }, "error" : { "msg" : "SolrCloud join: multiple shards not yet supported join" , "code" : 400 } The purpose of this jira SOLR-4905 was to allow it for fully replicated (collocated) single shard collection.
          Hide
          Paul Blanchaert added a comment -

          Today, I came across the same issue as reported by Mikhail Khludnev ("SolrCloud join: multiple shards not yet supported join").

          I would expect that there is no issue to perform a cross-core join over sharded collections when the following conditions are met:
          1) both collections are sharded with the same replicationFactor and numShards
          2) router.field of the collection is set to the same "key-field"
          3) the cross-core join is based on that same (from/to) "key-field"

          The router.field setup ensures that documents with the same "key-field" are routed to the same node.
          So the combination based on the "key-field" should always be available within the same node.

          From a user perspective, I believe these assumptions seem to be the "normal" use-case in the cross-core join.
          So, when these assumptions are correct, the next question is probably how feasable it is to implement this logic in the JoinQParserPlugin?

          Thanks for your feedback.

          Show
          Paul Blanchaert added a comment - Today, I came across the same issue as reported by Mikhail Khludnev ("SolrCloud join: multiple shards not yet supported join"). I would expect that there is no issue to perform a cross-core join over sharded collections when the following conditions are met: 1) both collections are sharded with the same replicationFactor and numShards 2) router.field of the collection is set to the same "key-field" 3) the cross-core join is based on that same (from/to) "key-field" The router.field setup ensures that documents with the same "key-field" are routed to the same node. So the combination based on the "key-field" should always be available within the same node. From a user perspective, I believe these assumptions seem to be the "normal" use-case in the cross-core join. So, when these assumptions are correct, the next question is probably how feasable it is to implement this logic in the JoinQParserPlugin? Thanks for your feedback.
          Hide
          Mikhail Khludnev added a comment -

          Paul Blanchaert would you mind to raise a separate jira for this?

          Show
          Mikhail Khludnev added a comment - Paul Blanchaert would you mind to raise a separate jira for this?
          Hide
          Mikhail Khludnev added a comment -

          Trey Grainger responded to the list, reposting here

          Just to add another voice to the discussion, I have the exact same use case described by Paul and Mikhail that I'm working through a Proof of Concept for right now. I'd very much like to see the "single shard collection with a replica on all nodes" restriction removed.

          Show
          Mikhail Khludnev added a comment - Trey Grainger responded to the list, reposting here Just to add another voice to the discussion, I have the exact same use case described by Paul and Mikhail that I'm working through a Proof of Concept for right now. I'd very much like to see the "single shard collection with a replica on all nodes" restriction removed.
          Hide
          Paul Blanchaert added a comment -

          Hi Mikhail,
          I wanted to raise the issue today, but didn't find the time before the weekend.
          Also found an issue while debugging, will report next monday...
          Thanks

          Show
          Paul Blanchaert added a comment - Hi Mikhail, I wanted to raise the issue today, but didn't find the time before the weekend. Also found an issue while debugging, will report next monday... Thanks
          Hide
          Paul Blanchaert added a comment -

          SOLR-8297 created.

          Show
          Paul Blanchaert added a comment - SOLR-8297 created.

            People

            • Assignee:
              Timothy Potter
              Reporter:
              Philip K. Warren
            • Votes:
              2 Vote for this issue
              Watchers:
              15 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development