  1. Solr
  2. SOLR-9057 CloudSolrClient should be able to work w/o ZK url
  3. SOLR-10446

Http based ClusterStateProvider (CloudSolrClient needn't talk to ZooKeeper)

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.6, 7.0
    • Component/s: SolrJ
    • Labels:
      None

      Description

      An HTTP based ClusterStateProvider to remove the sole dependency of CloudSolrClient on ZooKeeper, and hence provide an optional way for CSC to access cluster state without requiring ZK.

      1. SOLR-10446.doc.patch
        0.8 kB
        Ishan Chattopadhyaya
      2. SOLR-10446.patch
        41 kB
        Ishan Chattopadhyaya
      3. SOLR-10446.patch
        32 kB
        Ishan Chattopadhyaya
      4. SOLR-10446.patch
        31 kB
        Ishan Chattopadhyaya
      5. SOLR-10446.patch
        29 kB
        Ishan Chattopadhyaya
      6. SOLR-10446.patch
        29 kB
        Ishan Chattopadhyaya
      7. SOLR-10446.patch
        19 kB
        Ishan Chattopadhyaya
      8. SOLR-10446.patch
        13 kB
        Ishan Chattopadhyaya
      9. SOLR-9057.patch
        13 kB
        Ishan Chattopadhyaya

          Activity

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Adding a WIP patch that introduces a Solr instance based ClusterStateProvider.

          TODO: Add an endpoint for Collection Aliases and call that from here. I'll add it to a separate ticket.

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Updated patch. Using the LISTALIASES command.

          shalinmangar Shalin Shekhar Mangar added a comment -

          You are only adding an optional way to use CSC without ZooKeeper, right? Pretty sure we don't want to eliminate the dependency on ZooKeeper completely.

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Yes, thanks, that's right! I meant that this is to remove the "sole dependency" on ZK (by providing another option that doesn't depend on ZK). I'll fix the wording (English is my 3rd language).

          noble.paul Noble Paul added a comment -
          • unused Map<String, Object> all = req.getParams().getAll(null); in CollectionsHandler
          • what is the endpoint for aliases in /v2?
          • use NamedList#asMap() instead of adding to Utils
          • in constructor, fetch the livenodes, instead of just relying on the seed nodes
          ichattopadhyaya Ishan Chattopadhyaya added a comment - - edited

          Updated patch as per Noble's review. This patch also contains SOLR-10447 patch.

          unused Map<String, Object> all = req.getParams().getAll(null); in CollectionsHandler

          Fixed

          what is the endpoint for aliases in /v2?

          /collections/aliases now

          use NamedList#asMap() instead of adding to Utils

          Fixed

          in constructor, fetch the livenodes, instead of just relying on the seed nodes

          Fixed

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Updated the patch with the following changes:

          1. Instead of keeping HttpSolrClients for every liveNode, now creating the HSCs on the fly.
          2. Caching the liveNodes and aliases for a configurable timeout, defaulting to 5 seconds. After this timeout, a fetch is done for liveNodes and aliases upon a request.
          3. LISTALIASES endpoint now registered at /cluster/aliases for V2 APIs.
          4. All unit tests in CloudSolrClientTest now randomly use either the ZK based cluster state provider or this new Solr based cluster state provider.
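
          The timeout-based caching from item 2 can be sketched roughly as below. This is only an illustration and not the code from the patch: the class name TimedCache, the fetcher callback, and the injectable clock are assumptions made for the example. Only the idea of refetching once a configurable timeout (5 seconds by default) has elapsed comes from the comment above.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Caches a single value and refetches it once it is older than a timeout. */
class TimedCache<T> {
  private final Supplier<T> fetcher;   // hypothetical: e.g. an HTTP call returning live nodes
  private final long timeoutNanos;
  private final LongSupplier clock;    // injectable (e.g. System::nanoTime) so tests need not sleep
  private T value;
  private long fetchedAt;
  private boolean loaded;

  TimedCache(Supplier<T> fetcher, long timeoutSeconds, LongSupplier clock) {
    this.fetcher = fetcher;
    this.timeoutNanos = TimeUnit.SECONDS.toNanos(timeoutSeconds);
    this.clock = clock;
  }

  /** Returns the cached value, fetching it first if it is missing or stale. */
  synchronized T get() {
    long now = clock.getAsLong();
    // Compare elapsed time (now - fetchedAt), never raw nanoTime values.
    if (!loaded || now - fetchedAt > timeoutNanos) {
      value = fetcher.get();
      fetchedAt = now;
      loaded = true;
    }
    return value;
  }
}
```

          In production use the clock would simply be System::nanoTime, for example new TimedCache<>(fetchLiveNodes, 5, System::nanoTime), where fetchLiveNodes is a placeholder for whatever call retrieves the live nodes.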
          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Renamed SolrClusterStateProvider to HttpClusterStateProvider.

          noble.paul Noble Paul added a comment -

          Why does the class HttpClusterStateProvider use deprecated methods?

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Updated patch; removed calls to deprecated HttpSolrClient constructors.

          noble.paul Noble Paul added a comment -
          • this has no error handling when there is no collection with that name available
          • code like below is bad
            Set<String> liveNodes = new HashSet((List<String>)(clusterStateMap.get("live_nodes")));
            if (liveNodes != null) {
              this.liveNodes = liveNodes;
              liveNodesTimestamp = System.nanoTime();
            }

            How can liveNodes be null?

          • What kind of exception are you trying to catch, and why? What is the point in hitting another server after you get a SolrException?

            try (HttpSolrClient client = new HttpSolrClient.Builder().
                withBaseSolrUrl(ZkStateReader.getBaseUrlForNodeName(nodeName, urlScheme)).
                withHttpClient(httpClient).build()) {
              ClusterState cs = fetchClusterState(client, collection);
              return cs.getCollectionRef(collection);
            } catch (SolrServerException | IOException e) {
              log.warn("Attempt to fetch cluster state from " +
                  ZkStateReader.getBaseUrlForNodeName(nodeName, urlScheme) + " failed.", e);
            }
          

          • NamedList#asMap() does a deep copy. Why do you even do it here?

            NamedList cluster = (SimpleOrderedMap) client.request(request).get("cluster");
            Map<String, Object> clusterStateMap = cluster.asMap(10); // contains live_nodes and collections
          
          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          this has no error handling when there is no collection with that name available

          The exception for a non-existent collection comes from CloudSolrClient when using ZkClientClusterStateProvider, whereas with the previous patch it came from HttpClusterStateProvider. I've now updated the patch to catch the RemoteSolrException, check the exception message (as returned from the CLUSTERSTATUS API), and return null when the collection doesn't exist, so that CloudSolrClient throws the same exception as it does when using ZkClientClusterStateProvider. Added a test for this (CloudSolrClientTest#testCollectionDoesntExist).
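
          The catch-and-return-null approach described here might look something like the following sketch. It is not the patch's code: RemoteException is a hypothetical stand-in for RemoteSolrException, StateFetcher stands in for the CLUSTERSTATUS call, and the message format checked below is an assumption for illustration.

```java
class CollectionLookup {
  /** Hypothetical stand-in for SolrJ's RemoteSolrException. */
  static class RemoteException extends RuntimeException {
    RemoteException(String message) {
      super(message);
    }
  }

  /** Hypothetical stand-in for a CLUSTERSTATUS request for one collection. */
  interface StateFetcher {
    String fetch(String collection);
  }

  /**
   * Returns the collection state, or null if the server reports that the
   * collection does not exist. Any other remote error is rethrown.
   */
  static String stateOrNull(StateFetcher fetcher, String collection) {
    try {
      return fetcher.fetch(collection);
    } catch (RemoteException e) {
      // Assumed message format; the real server message may differ.
      String notFound = "Collection: " + collection + " not found";
      if (e.getMessage() != null && e.getMessage().contains(notFound)) {
        return null; // lets the caller raise its usual "collection not found" error
      }
      throw e;
    }
  }
}
```

          Returning null rather than propagating the remote exception means the caller sees the same outcome regardless of which provider is in use, which is the behaviour the comment aims for. Matching on message text is fragile, so a structured error code would be preferable where one is available.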

          code like below is bad

          Removed the spurious null check.

          what kind of exception are you trying to catch and why? what is the point in hitting another server after you get a SolrException?

          I imagine a server that is struggling for some reason, throwing things like OOMs or timeouts or 404s. This server may not even be live at the moment. The point of hitting another server is that, hopefully, another live server will respond with the proper clusterstate/live_nodes.
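
          The "try the next server" reasoning can be illustrated with a small sketch. The names NodeFailover, NodeCall, and firstSuccessful are hypothetical and do not come from the patch; the sketch only shows the general pattern of moving on to another node when one fails with an I/O-level error.

```java
import java.io.IOException;
import java.util.List;

class NodeFailover {
  /** Hypothetical stand-in for a request sent to a single node. */
  interface NodeCall<T> {
    T apply(String nodeUrl) throws IOException;
  }

  /** Tries each node in order and returns the first successful result. */
  static <T> T firstSuccessful(List<String> nodeUrls, NodeCall<T> call) {
    IOException last = null;
    for (String url : nodeUrls) {
      try {
        return call.apply(url);
      } catch (IOException e) {
        // A real client would log a warning here before trying the next node.
        last = e;
      }
    }
    throw new IllegalStateException("No node in " + nodeUrls + " could be reached", last);
  }
}
```

          Only the failure of every node surfaces as an error to the caller; a single struggling node merely costs one extra request.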

          NamedList#asMap() does a deep copy . why do you even do it here?

          Ah, I didn't realize there exists a NamedList#asShallowMap() method. Switched to use that.

          jira-bot ASF subversion and git services added a comment -

          Commit 4df4c52c0cfb8b47a066a0495bd164f6a4c973de in lucene-solr's branch refs/heads/master from Ishan Chattopadhyaya
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4df4c52 ]

          SOLR-10447, SOLR-10447: LISTALIASES Collections API command; CloudSolrClient can be initialized using Solr URL

          SOLR-10447: Collections API now supports a LISTALIASES command to return a list of all collection aliases.

          SOLR-10446: CloudSolrClient can now be initialized using the base URL of a Solr instance instead of
          ZooKeeper hosts. This is possible through the use of newly introduced HttpClusterStateProvider.
          To fetch a list of collection aliases, this depends on LISTALIASES command, and hence this way of
          initializing CloudSolrClient would not work with older versions of Solr that doesn't support LISTALIASES.

          jira-bot ASF subversion and git services added a comment -

          Commit 7eedb81c4274bf1b9ad4f3b2e3ef6ae1b816469e in lucene-solr's branch refs/heads/branch_6x from Ishan Chattopadhyaya
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7eedb81 ]

          SOLR-10447, SOLR-10447: LISTALIASES Collections API command; CloudSolrClient can be initialized using Solr URL

          SOLR-10447: Collections API now supports a LISTALIASES command to return a list of all collection aliases.

          SOLR-10446: CloudSolrClient can now be initialized using the base URL of a Solr instance instead of
          ZooKeeper hosts. This is possible through the use of newly introduced HttpClusterStateProvider.
          To fetch a list of collection aliases, this depends on LISTALIASES command, and hence this way of
          initializing CloudSolrClient would not work with older versions of Solr that doesn't support LISTALIASES.

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          I have backported this to 6x; however, this won't work with Solr versions older than 6.6 due to unavailability of the LISTALIASES endpoint. If someone wants this to work with older Solr versions as well, additional work would be needed so that the aliases are obtained from /admin/zookeeper or the CLUSTERSTATUS command (which contains aliases for every collection).

          shalinmangar Shalin Shekhar Mangar added a comment -

          this won't work with Solr versions older than 6.6 due to unavailability of LISTALIASES endpoint.

          That should be okay, but can we detect a 404 on LISTALIASES and throw an exception with an appropriate message?

          noble.paul Noble Paul added a comment -

          This also means that this feature works with older Solr versions if the alias feature is not used. Should we document it as such?

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          That should be okay, but can we detect a 404 on LISTALIASES and throw an exception with an appropriate message?

          I am testing a fix for this. Would also test if "this feature works for older Solr, if the alias feature is not used."

          jira-bot ASF subversion and git services added a comment -

          Commit fd4125ea413d90497789a2dcceaece9174293bef in lucene-solr's branch refs/heads/master from Ishan Chattopadhyaya
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=fd4125e ]

          SOLR-10446: Making HttpClusterStateProvider work with server that doesn't have LISTALIASES

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          This feature will continue to work with Solr servers that don't have LISTALIASES. It will log a warning while fetching aliases, and it won't work if aliases are used in queries/requests.

          If someone wants support for aliases with older Solr versions, we would need to make this work using the cluster state's output which contains alias information as per another JIRA issue.
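
          A rough sketch of this degrade-gracefully behaviour is shown below. It is an illustration under stated assumptions, not the committed code: AliasFetcher, EndpointMissingException, and aliasesOrEmpty are hypothetical names, and the exception stands in for whatever signal (such as a 404) indicates that the server lacks LISTALIASES.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Supplier;
import java.util.logging.Logger;

class AliasFetcher {
  private static final Logger log = Logger.getLogger(AliasFetcher.class.getName());

  /** Hypothetical signal that the server has no LISTALIASES endpoint. */
  static class EndpointMissingException extends RuntimeException {
    EndpointMissingException(String message) {
      super(message);
    }
  }

  /**
   * Returns the alias-to-collection map, or an empty map (with a warning)
   * when the server does not support listing aliases.
   */
  static Map<String, String> aliasesOrEmpty(Supplier<Map<String, String>> listAliases) {
    try {
      return listAliases.get();
    } catch (EndpointMissingException e) {
      log.warning("Server does not support LISTALIASES; aliases will not be resolved: "
          + e.getMessage());
      return Collections.emptyMap();
    }
  }
}
```

          With an empty alias map, requests against real collection names still work, while requests that name an alias cannot be resolved, which matches the limitation stated above.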

          jira-bot ASF subversion and git services added a comment -

          Commit a3b9e8ebf4e88f567978c0fb0f2ed3c985e1a15e in lucene-solr's branch refs/heads/branch_6x from Ishan Chattopadhyaya
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a3b9e8e ]

          SOLR-10446: Making HttpClusterStateProvider work with server that doesn't have LISTALIASES

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Cassandra Targett, please review the documentation changes.

          ctargett Cassandra Targett added a comment -

          please review the documentation changes.

          +1 Ishan Chattopadhyaya, looks good. Thanks.

          jira-bot ASF subversion and git services added a comment -

          Commit 5c4f0a27a327dba22e121680a19c192a53b8d75e in lucene-solr's branch refs/heads/branch_6_6 from Ishan Chattopadhyaya
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5c4f0a2 ]

          SOLR-10446, SOLR-6736: Ref guide documentation

          jira-bot ASF subversion and git services added a comment -

          Commit f358c6834d3957b73690d73e49c021644c2f61fb in lucene-solr's branch refs/heads/branch_6x from Ishan Chattopadhyaya
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f358c68 ]

          SOLR-10446, SOLR-6736: Ref guide documentation

          jira-bot ASF subversion and git services added a comment -

          Commit 2eb324f9bae1553c9c68c4a740a4f865b0ec6da5 in lucene-solr's branch refs/heads/master from Ishan Chattopadhyaya
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2eb324f ]

          SOLR-10446, SOLR-6736: Ref guide documentation


            People

            • Assignee:
              ichattopadhyaya Ishan Chattopadhyaya
              Reporter:
              ichattopadhyaya Ishan Chattopadhyaya
            • Votes:
              0
              Watchers:
              4
