Details

      Description

      The collections API lets you add, delete and modify existing collections. At the moment the API does not let you get a list of current collections or view information about a specific collection.

      The workaround is to use the ZooKeeper API to get the list, which makes the Collections API harder to work with.

      Adding an action=LIST would significantly improve the function of this API.

      1. SOLR-5466.patch
        21 kB
        Shalin Shekhar Mangar
      2. SOLR-5466.patch
        15 kB
        Shalin Shekhar Mangar
      3. SOLR-5466.patch
        15 kB
        Shalin Shekhar Mangar
      4. SOLR-5466.patch
        13 kB
        Shalin Shekhar Mangar
      5. SOLR-5466.patch
        16 kB
        Erick Erickson
      6. SOLR-5466.patch
        14 kB
        Erick Erickson
      7. SOLR-5466.patch
        13 kB
        Vitaliy Zhovtyuk
      8. SOLR-5466.patch
        7 kB
        Varun Thacker

          Activity

          Varun Thacker added a comment -

          Initial patch. Need to improve the test case a bit

          Shalin Shekhar Mangar added a comment -

          Thanks Varun. I think we should repurpose this API to be more general. I think it should return collection properties as well as shard properties.

          For example:

          /admin/collections?action=STATUS
          

          The above returns a list of all collections in the cluster.

          /admin/collections?action=STATUS&collection=collection1
          

          The above returns info only about collection1 and its shards (including their properties)
          or

          /admin/collections?action=STATUS&collection=collection1&shard=shard1,shard2,shard3...
          

          The above call should return collection properties and shard properties for shard1,shard2 and shard3.

          I wish to remove the need to lookup against ZK directly for information that is inside cluster state.
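As a rough illustration of the three request shapes proposed above, the following sketch builds the URLs with Python's standard library. The base URL (host, port, and `/solr` prefix) and the helper name `status_url` are assumptions made for the example; only the `action`, `collection`, and `shard` parameters come from the proposal itself.

```python
from urllib.parse import urlencode


def status_url(base_url, collection=None, shards=None):
    """Build a STATUS request URL in the shape proposed above (hypothetical helper)."""
    params = {"action": "STATUS"}
    if collection is not None:
        params["collection"] = collection
        if shards:
            # Shards are passed as a single comma-separated value.
            params["shard"] = ",".join(shards)
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return f"{base_url}/admin/collections?{urlencode(params, safe=',')}"


BASE = "http://localhost:8983/solr"  # assumed host, port, and prefix

print(status_url(BASE))
print(status_url(BASE, "collection1"))
print(status_url(BASE, "collection1", ["shard1", "shard2", "shard3"]))
```

Omitting `collection` asks about every collection; the `shard` parameter is only added when a collection is named, matching the examples above.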

          Vitaliy Zhovtyuk added a comment -

          Added a STATUS operation for collections. It can be limited to a specific collection (via the collection parameter) or to specific shards of a collection (via a comma-separated shard parameter). All properties are retrieved from the cluster state without a request to the ZK host.
          The collection STATUS action is similar to the core admin STATUS call.

          I left the LISTCOLLECTIONS action as is, because users should have the option of getting collection status either from the cluster state or from the ZK host directly.

          Noble Paul added a comment -

          Can I also have a simple API to fetch the list of nodes for a given shard?

          Erick Erickson added a comment -

          Shalin:

          For some reason I got interested in this topic, can I help? I have about zero UI skills, but I can help shepherd it.... Let me know.

          Erick

          Erick Erickson added a comment -

          Updated patch that resolves some merge conflicts against trunk.

          I noticed that, while the test succeeds, it always throws this exception, both from a terminal and in IntelliJ. Is there a way we can clean this up?

          25125 T111 oasc.SolrException.log ERROR There was a problem trying to register as the leader:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections
          at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
          at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
          at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
          at org.apache.solr.common.cloud.SolrZkClient$3.execute(SolrZkClient.java:206)
          at org.apache.solr.common.cloud.SolrZkClient$3.execute(SolrZkClient.java:203)
          at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
          at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:203)
          at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:414)
          at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:383)
          at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:370)
          at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:112)
          at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:273)
          at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:164)
          at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:108)
          at org.apache.solr.cloud.LeaderElector.access$000(LeaderElector.java:55)
          at org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:137)
          at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
          at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

          Erick Erickson added a comment -

          OK, I'm officially confused. What are the guidelines for collection actions going through the Overseer? The LISTCOLLECTIONS action does, but the STATUS command doesn't, it's handled locally in CollectionsHandler. What's the right thing to do here?

          And if the answer is to go through the Overseer for all collection actions, can we prevent short-circuiting like this?

          Thanks

          Shalin Shekhar Mangar added a comment -

          I haven't looked at the latest patch yet, Erick, but the Overseer Collection Processor should be involved. I don't see a role for the Overseer directly in these APIs. This issue is for the API part only; we can have a follow-up issue for the UI work.

          Erick Erickson added a comment -

          Bah! Got confused which JIRA I was adding a comment to when I talked
          about the UI work, ignore that part.

          The UI stuff is, indeed, already a separate issue. I think the UI work
          will depend on this, I'll link it momentarily.

          OK, I can take a stab at moving the STATUS work over to the Overseer...

          Erick

          Erick Erickson added a comment -

          Hack at moving the status action over to the overseer.

          Shalin:
          Is this what you had in mind?

          Mark Miller added a comment -

          "is there a way we can clean this up?"

          It could probably be changed to an info event, and perhaps just print a message rather than the whole stack trace...it's expected on shutdown.

          Shalin Shekhar Mangar added a comment -

          "Shalin:
          Is this what you had in mind?"

          Yes, thanks Erick! I'm still going through the patch but this looks good. I just noticed one thing - OverseerCollectionProcessor.getCollectionStatus uses Arrays.binarySearch but the array isn't sorted so it won't work.
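The binary-search problem mentioned above is easy to reproduce. The sketch below is a Python analogue using the standard `bisect` module, not the Java code from the patch; the helper `binary_search` and the shard names are made up for illustration. It shows a binary search missing an element that is present because the input is not sorted.

```python
from bisect import bisect_left


def binary_search(items, target):
    """Return the index of target, or -1. Only correct when items is sorted."""
    i = bisect_left(items, target)
    return i if i < len(items) and items[i] == target else -1


# Shard names in the order a caller might pass them (made-up example).
requested = ["shard3", "shard1", "shard2"]

print(binary_search(requested, "shard1"))          # misses: input is unsorted
print(binary_search(sorted(requested), "shard1"))  # finds it after sorting
print("shard1" in set(requested))                  # membership test needs no ordering
```

Sorting first or switching to a set-membership check both avoid the problem; for a handful of shard names the membership check is the simpler option.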

          How about folding both STATUS and LISTCOLLECTIONS into a single status API? What do you think?

          Erick Erickson added a comment -

          Shalin:

          I really did no review of the code, just moved it over into the Overseer class and got it to compile against trunk.

          Offhand (and I really haven't thought about it much) I'd rather leave the two concepts separate. I'm thinking of a GUI tool to manipulate collections/shards, and it'd be more intuitive for anyone creating that UI to see a "list" action.

          Looking again (briefly), "listcollections" should probably be "list" (it's already in the collections API, we don't "createcollection" for instance).

          And, I started to untangle the fact that we have all the strings in OverseerCollectionProcessor, but also have a nice CollectionAction enum. And the commands are intermingled with parameters, it all seems rather confusing. Does it make sense to use the enum rather than the strings? Or somehow associate the two? Probably something for another JIRA though...

          Varun Thacker added a comment -

          I was thinking we could have something like what Shalin mentioned in a comment -

          For example:

          /admin/collections?action=STATUS
          

          The above should return all the status info from the cluster.

          /admin/collections?action=STATUS&collection=collection1
          

          The above returns info only about collection1 and its shards (including their properties)

          /admin/collections?action=STATUS&collection=collection1&shard=shard1,shard2,shard3...
          

          The above returns info only about collection1 and the specified shards (including their properties)

          along with something like

          /admin/collections?action=STATUS&fl=name
          

          This would list only collection names

          or

          /admin/collections?action=STATUS&fl=name,router,replicationFactor
          

          This would list only collection names and other details specified.

          Basically leverage the "fl" syntax ( we could call it something else also ) to ask for only specific information like -
          -name
          -slices
          -activeSlices
          -router
          -shards
          -maxShardsPerNode
          -replicationFactor

          etc.
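To make the "fl" idea concrete, here is a minimal sketch of how such filtering might behave. Everything in it is an assumption for illustration: the helper `select_fields`, the list-of-dicts layout, and the sample property values are not taken from Solr or from any patch on this issue.

```python
def select_fields(collections, fl):
    """Keep only the properties named in a comma-separated fl string."""
    wanted = [f.strip() for f in fl.split(",") if f.strip()]
    # One dict per collection, so each value stays tied to its collection.
    return [{f: c[f] for f in wanted if f in c} for c in collections]


# Made-up sample data, not real cluster state.
collections = [
    {"name": "collection1", "router": "compositeId",
     "replicationFactor": 2, "maxShardsPerNode": 1},
    {"name": "collection2", "router": "implicit",
     "replicationFactor": 1, "maxShardsPerNode": 2},
]

print(select_fields(collections, "name"))
print(select_fields(collections, "name,router,replicationFactor"))
```

Returning one record per collection is one way to keep each property associated with its collection; unknown field names are silently skipped in this sketch.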

          "And, I started to untangle the fact that we have all the strings in OverseerCollectionProcessor, but also have a nice CollectionAction enum. And the commands are intermingled with parameters, it all seems rather confusing. Does it make sense to use the enum rather than the strings? Or somehow associate the two? Probably something for another JIRA though..."

          Even I found this confusing. We should use either one right?

          Erick Erickson added a comment -

          "Even I found this confusing. We should use either one right?"

          That's what I think, but I also tried a quick hack at it (well, not so quick it took several hours) and tests immediately started failing so of course I messed something up. So I think untangling all that is best put in another JIRA so we don't hold up this functionality for what is essentially cleanup.

          About the STATUS command. I see what you mean, it just feels overly complex. I'd just go for the STATUS returning the clusterstate. Here's why:
          1> whatever specifics we put in there are going to require that we maintain it. Take this example:
          STATUS&fl=name,router,replicationFactor

          If I have several collections, I now have to define a syntax for what I return to associate the replicationFactor with Collection1, Collection2, etc. Would it be better just to give them the cluster state and let it go at that?

          I think my driving question is whether there's a need to do this that we're responding to or just doing it because we can. If the latter, I'm neutral to - on it...

          Shalin Shekhar Mangar added a comment -

          Patch updated to trunk with some changes:

          1. Removed incorrect use of Arrays.binarySearch in OCP.getCollectionStatus
          2. Renamed 'status' to 'clusterstatus'. This is necessary because SOLR-5466 added a 'status' API for request status. Renamed all variables and enum fields accordingly.
          Shalin Shekhar Mangar added a comment -
          1. Added _route_ parameter which can be used in place of shard. The given route key will be used to determine the shard info to be returned. This will be useful to know which shard a given route key resolves to.
          2. Fixed TestCollectionAPI which was incorrectly expecting collections to be returned in a certain order.

          I'm going to add shard address in the response. Most clients would like to know the base url of the shard. It is not easy to know that info from the node_name, core_name etc returned by the cluster state. I'll also add cluster properties, aliases and roles.

          Shalin Shekhar Mangar added a comment -

          "And, I started to untangle the fact that we have all the strings in OverseerCollectionProcessor, but also have a nice CollectionAction enum. And the commands are intermingled with parameters, it all seems rather confusing. Does it make sense to use the enum rather than the strings? Or somehow associate the two? Probably something for another JIRA though"

          Yeah, it is a mess. I'll open an issue to clean this up.

          Varun Thacker - I like the fl syntax but I think it may be hard to maintain. At least this way all properties inside cluster state are automatically returned. We can always add the filtering feature later if we want.

          Shalin Shekhar Mangar added a comment -

          Fixed a bug in OCP.getCollectionStatus which removed shards from the parent cluster state.

          Hi Vitaliy Zhovtyuk - Be very careful about the cluster state information. It is not protected and if you remove something from collection properties, it will not be visible anymore to other classes in Solr. Always make copies if you are modifying it.
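The "always make copies" advice can be sketched as follows. This is a Python analogue with a plain dictionary standing in for the shared cluster state; the function `collection_status` and the sample data are hypothetical and are not the code from the patch.

```python
import copy


def collection_status(cluster_state, collection, shards=None):
    """Return status for one collection, optionally limited to some shards."""
    # Copy first: cluster_state is shared, so it must not be modified here.
    info = copy.deepcopy(cluster_state[collection])
    if shards is not None:
        info["shards"] = {
            name: props for name, props in info["shards"].items() if name in shards
        }
    return info


# Made-up shared state standing in for the real cluster state.
shared_state = {
    "collection1": {
        "replicationFactor": 2,
        "shards": {"shard1": {"state": "active"}, "shard2": {"state": "active"}},
    }
}

result = collection_status(shared_state, "collection1", ["shard1"])
print(sorted(result["shards"]))
print(sorted(shared_state["collection1"]["shards"]))
```

The filtered result contains only the requested shard, while the shared state still holds both. Returning immutable objects by default, as suggested here, would make this kind of accidental modification impossible rather than relying on each caller to copy.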

          We should re-factor cluster state and associated classes to return immutable objects by default. I shall open an issue.

          Shalin Shekhar Mangar added a comment -

          Adds aliases and collection properties. The last patch failed because custom objects like DocCollection and Slice cannot be serialized. This patch uses JsonWriter to convert ClusterState into a generic serializable object. I think this is good to go.
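The general pattern, converting custom objects into plain maps and lists before serializing, can be sketched like this. It is a Python analogue only: the dataclasses below merely borrow the names DocCollection and Slice as stand-ins, their fields are invented, and Python's `json` module is used in place of the serialization path Solr actually uses.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Slice:  # stand-in type with invented fields
    name: str
    state: str


@dataclass
class DocCollection:  # stand-in type with invented fields
    name: str
    slices: list = field(default_factory=list)


coll = DocCollection("collection1", [Slice("shard1", "active")])

# Serializing the custom object directly fails.
try:
    json.dumps(coll)
    direct_ok = True
except TypeError:
    direct_ok = False

# Converting to plain dicts and lists first makes it serializable.
plain = asdict(coll)
encoded = json.dumps(plain, sort_keys=True)

print(direct_ok)
print(encoded)
```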

          ASF subversion and git services added a comment -

          Commit 1582734 from shalin@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1582734 ]

          SOLR-5466: A new List collections and cluster status API which clients can use to read collection and shard information instead of reading data directly from ZooKeeper

          ASF subversion and git services added a comment -

          Commit 1582736 from shalin@apache.org in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1582736 ]

          SOLR-5466: A new List collections and cluster status API which clients can use to read collection and shard information instead of reading data directly from ZooKeeper

          Shalin Shekhar Mangar added a comment -

          This will be released with Solr 4.8.

          Thanks everyone!

          Uwe Schindler added a comment -

          Close issue after release of 4.8.0


            People

            • Assignee:
              Shalin Shekhar Mangar
              Reporter:
              Dave Seltzer
            • Votes:
              3
              Watchers:
              13
