Details

    Description

      The collections API lets you add, delete and modify existing collections. At the moment the API does not let you get a list of current collections or view information about a specific collection.

      The workaround is to use the ZooKeeper API to get the list. This makes the Collections API harder to work with.

      Adding an action=LIST would significantly improve the function of this API.

      Attachments

        1. SOLR-5466.patch
          21 kB
          Shalin Shekhar Mangar
        2. SOLR-5466.patch
          15 kB
          Shalin Shekhar Mangar
        3. SOLR-5466.patch
          15 kB
          Shalin Shekhar Mangar
        4. SOLR-5466.patch
          13 kB
          Shalin Shekhar Mangar
        5. SOLR-5466.patch
          16 kB
          Erick Erickson
        6. SOLR-5466.patch
          14 kB
          Erick Erickson
        7. SOLR-5466.patch
          13 kB
          Vitaliy Zhovtyuk
        8. SOLR-5466.patch
          7 kB
          Varun Thacker

        Issue Links

          Activity

            varun Varun Thacker added a comment -

            Initial patch. Need to improve the test case a bit


            shalin Shalin Shekhar Mangar added a comment -

            Thanks Varun. I think we should repurpose this API to be more general. I think it should return collection properties as well as shard properties.

            For example:

            /admin/collections?action=STATUS

            The above returns a list of all collections in the cluster.

            /admin/collections?action=STATUS&collection=collection1

            The above returns info only about collection1 and its shards (including their properties).

            or

            /admin/collections?action=STATUS&collection=collection1&shard=shard1,shard2,shard3...

            The above call should return collection properties and shard properties for shard1, shard2 and shard3.

            I wish to remove the need to look up against ZK directly for information that is inside cluster state.
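The three request shapes proposed above differ only in which optional parameters are present. A minimal sketch of building them follows. The class and method names (`StatusUrlSketch`, `statusPath`) are hypothetical and not part of Solr, the `STATUS` action name is the one proposed in this comment rather than a confirmed final name, and collection and shard names are assumed to be URL-safe so no encoding is applied.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Solr: builds the request paths proposed above.
final class StatusUrlSketch {

    // collection == null means "all collections"; an empty shard list means "all shards".
    // Assumption: names are URL-safe, so they are appended without encoding.
    static String statusPath(String collection, List<String> shards) {
        StringBuilder path = new StringBuilder("/admin/collections?action=STATUS");
        if (collection != null) {
            path.append("&collection=").append(collection);
            if (!shards.isEmpty()) {
                // Shards are passed as one comma-separated parameter value.
                path.append("&shard=").append(String.join(",", shards));
            }
        }
        return path.toString();
    }

    public static void main(String[] args) {
        System.out.println(statusPath(null, Collections.<String>emptyList()));
        System.out.println(statusPath("collection1", Collections.<String>emptyList()));
        System.out.println(statusPath("collection1", Arrays.asList("shard1", "shard2", "shard3")));
    }
}
```

The shard parameter is only appended when a collection is given, since shard names are meaningful only within a collection.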

            vzhovtiuk Vitaliy Zhovtyuk added a comment -

            Added a STATUS operation for collections. It can be called for all collections, for a specific collection (collection parameter), or for a specific collection and shards (comma-separated shard parameter). All properties are retrieved from cluster state without a request to the ZK host.
            The collection STATUS action is similar to the core admin STATUS call.

            I left the LISTCOLLECTIONS action as is because the user should have the option of whether to get collection status from cluster state or from the ZK host directly.

            noble.paul Noble Paul added a comment -

            Can I also have a simple API to fetch the list of nodes for a given shard?

            erickerickson Erick Erickson added a comment -

            Shalin:

            For some reason I got interested in this topic, can I help? I have about zero UI skills, but I can help shepherd it.... Let me know.

            Erick

            erickerickson Erick Erickson added a comment -

            Updated patch that resolves some merge conflicts against trunk.

            I noticed that, while the test succeeds, it always throws this exception, both from a terminal and in IntelliJ. Is there a way we can clean this up?

            25125 T111 oasc.SolrException.log ERROR There was a problem trying to register as the leader:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections
            at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
            at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
            at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
            at org.apache.solr.common.cloud.SolrZkClient$3.execute(SolrZkClient.java:206)
            at org.apache.solr.common.cloud.SolrZkClient$3.execute(SolrZkClient.java:203)
            at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
            at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:203)
            at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:414)
            at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:383)
            at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:370)
            at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:112)
            at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:273)
            at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:164)
            at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:108)
            at org.apache.solr.cloud.LeaderElector.access$000(LeaderElector.java:55)
            at org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:137)
            at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
            at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

            erickerickson Erick Erickson added a comment -

            OK, I'm officially confused. What are the guidelines for collection actions going through the Overseer? The LISTCOLLECTIONS action does, but the STATUS command doesn't; it's handled locally in CollectionsHandler. What's the right thing to do here?

            And if the answer is to go through the Overseer for all collection actions, can we prevent short-circuiting like this?

            Thanks


            shalin Shalin Shekhar Mangar added a comment -

            I haven't looked at the latest patch yet, Erick, but the Overseer Collection Processor should be involved. I don't see a role for the Overseer directly in these APIs. This issue is for the API part only; we can have a follow-up issue for the UI work.

            erickerickson Erick Erickson added a comment -

            Bah! Got confused which JIRA I was adding a comment to when I talked
            about the UI work, ignore that part.

            The UI stuff is, indeed, already a separate issue. I think the UI work
            will depend on this, I'll link it momentarily.

            OK, I can take a stab at moving the STATUS work over to the Overseer...

            Erick

            erickerickson Erick Erickson added a comment -

            Hack at moving the status action over to the overseer.

            Shalin:
            Is this what you had in mind?

            markrmiller@gmail.com Mark Miller added a comment -

            is there a way we can clean this up?

            It could probably be changed to an info event, and perhaps just print a message rather than the whole stack trace...it's expected on shutdown.
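A rough sketch of that idea, using `java.util.logging` only so the example is self-contained (Solr's own logging setup is not shown in this thread). The class name, method name, and the `shuttingDown` flag are hypothetical; how the real code would know it is shutting down is an open assumption.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: log an expected shutdown-time failure as a one-line INFO message.
final class ShutdownLoggingSketch {
    private static final Logger LOG = Logger.getLogger("ShutdownLoggingSketch");

    // Returns the level that was used, which makes the decision easy to inspect.
    static Level logLeaderRegistrationFailure(Exception e, boolean shuttingDown) {
        if (shuttingDown) {
            // Expected case: message only, no stack trace.
            LOG.log(Level.INFO, "Could not register as the leader during shutdown: {0}", e.toString());
            return Level.INFO;
        }
        // Unexpected case: keep the full stack trace at error level.
        LOG.log(Level.SEVERE, "There was a problem trying to register as the leader", e);
        return Level.SEVERE;
    }

    public static void main(String[] args) {
        logLeaderRegistrationFailure(new IllegalStateException("Session expired"), true);
        logLeaderRegistrationFailure(new IllegalStateException("Session expired"), false);
    }
}
```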


            shalin Shalin Shekhar Mangar added a comment -

            Shalin:
            Is this what you had in mind?

            Yes, thanks Erick! I'm still going through the patch but this looks good. I just noticed one thing: OverseerCollectionProcessor.getCollectionStatus uses Arrays.binarySearch, but the array isn't sorted, so it won't work.

            How about folding both STATUS and LISTCOLLECTIONS into a single status API? What do you think?
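The `Arrays.binarySearch` Javadoc states that the result is undefined when the array is not sorted. The sketch below shows two order-independent ways to test whether a shard name was requested. It is an illustration under assumed names (`ShardLookupSketch`, `isRequested`, `isRequestedSorted`) and is not the code from the patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: membership tests that do not depend on the input order.
final class ShardLookupSketch {

    // Option 1: a hash set lookup, which needs no ordering at all.
    static boolean isRequested(String[] requestedShards, String shard) {
        Set<String> requested = new HashSet<>(Arrays.asList(requestedShards));
        return requested.contains(shard);
    }

    // Option 2: binarySearch is only valid on a sorted array, so sort a copy first
    // and leave the caller's array untouched.
    static boolean isRequestedSorted(String[] requestedShards, String shard) {
        String[] sorted = requestedShards.clone();
        Arrays.sort(sorted);
        return Arrays.binarySearch(sorted, shard) >= 0;
    }

    public static void main(String[] args) {
        String[] shards = {"shard3", "shard1", "shard2"}; // deliberately unsorted
        System.out.println(isRequested(shards, "shard1"));
        System.out.println(isRequestedSorted(shards, "shard1"));
    }
}
```

For a handful of shard names either option is fine; the set-based lookup avoids any ordering assumption.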
            erickerickson Erick Erickson added a comment -

            Shalin:

            I really did no review of the code, just moved it over into the Overseer class and got it to compile against trunk.

            Offhand (and I really haven't thought about it much) I'd rather leave the two concepts separate. I'm thinking of a GUI tool to manipulate collections/shards, and it'd be more intuitive for anyone creating that UI to see "list".

            Looking again (briefly), "listcollections" should probably be "list" (it's already in the collections API, we don't "createcollection" for instance).

            And, I started to untangle the fact that we have all the strings in OverseerCollectionProcessor, but also have a nice CollectionAction enum. And the commands are intermingled with parameters, it all seems rather confusing. Does it make sense to use the enum rather than the strings? Or somehow associate the two? Probably something for another JIRA though...
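One way to associate the two, sketched below, is to derive the command string from the enum constant and to parse incoming strings back into the enum. This is a hypothetical illustration: the `Action` enum here is a made-up subset and is not Solr's actual `CollectionAction`, and the method names are assumptions.

```java
import java.util.Locale;

// Hypothetical sketch: one enum as the single source of truth for action names.
final class ActionNamesSketch {

    // Made-up subset of actions for illustration only.
    enum Action {
        CREATE, DELETE, LIST, STATUS;

        // Lower-case command string derived from the constant name.
        String toLower() {
            return name().toLowerCase(Locale.ROOT);
        }

        // Case-insensitive lookup; returns null for null or unknown input.
        static Action get(String s) {
            if (s == null) {
                return null;
            }
            try {
                return valueOf(s.toUpperCase(Locale.ROOT));
            } catch (IllegalArgumentException e) {
                return null;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(Action.LIST.toLower());
        System.out.println(Action.get("Status"));
        System.out.println(Action.get("nosuchaction"));
    }
}
```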

            varun Varun Thacker added a comment -

            I was thinking we could have something like what Shalin mentioned in a comment -

            For example:

            /admin/collections?action=STATUS
            

            The above should return all the status info from the cluster.

            /admin/collections?action=STATUS&collection=collection1
            

            The above returns info only about collection1 and its shards (including their properties)

            /admin/collections?action=STATUS&collection=collection1&shard=shard1,shard2,shard3...
            

            The above returns info only about collection1 and the specified shards (including their properties)

            along with something like

            /admin/collections?action=STATUS&fl=name
            

            This would list only collection names

            or

            /admin/collections?action=STATUS&fl=name,router,replicationFactor
            

            This would list only collection names and other details specified.

            Basically leverage the "fl" syntax ( we could call it something else also ) to ask for only specific information like -
            -name
            -slices
            -activeSlices
            -router
            -shards
            -maxShardsPerNode
            -replicationFactor

            etc.
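The `fl`-style filtering described above could be sketched as a projection over a map of collection properties. Everything here is an assumption for illustration: the class and method names are hypothetical, and the property values in `main` are invented sample data, not output from a real cluster.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "fl"-style filtering over one collection's properties.
final class FieldFilterSketch {

    // Returns a new map holding only the requested fields, in the requested order.
    // Fields that are not present are skipped; the input map is not modified.
    static Map<String, Object> filter(Map<String, Object> props, List<String> fl) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String field : fl) {
            if (props.containsKey(field)) {
                out.put(field, props.get(field));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("name", "collection1");      // invented sample values
        props.put("router", "compositeId");
        props.put("replicationFactor", 2);
        props.put("maxShardsPerNode", 1);
        System.out.println(filter(props, Arrays.asList("name", "router", "replicationFactor")));
    }
}
```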

            And, I started to untangle the fact that we have all the strings in OverseerCollectionProcessor, but also have a nice CollectionAction enum. And the commands are intermingled with parameters, it all seems rather confusing. Does it make sense to use the enum rather than the strings? Or somehow associate the two? Probably something for another JIRA though...

            Even I found this confusing. We should use either one right?

            erickerickson Erick Erickson added a comment -

            bq: Even I found this confusing. We should use either one right?

            That's what I think, but I also tried a quick hack at it (well, not so quick it took several hours) and tests immediately started failing so of course I messed something up. So I think untangling all that is best put in another JIRA so we don't hold up this functionality for what is essentially cleanup.

            About the STATUS command. I see what you mean, it just feels overly complex. I'd just go for the STATUS returning the clusterstate. Here's why:
            1> whatever specifics we put in there are going to require that we maintain it. Take this example:
            STATUS&fl=name,router,replicationFactor

            If I have several collections, I now have to define a syntax for what I return to associate the replicationFactor with Collection1, Collection2, etc. Would it be better just to give them the cluster state and let it go at that?

            I think my driving question is whether there's a need to do this that we're responding to or just doing it because we can. If the latter, I'm neutral to - on it...


            shalin Shalin Shekhar Mangar added a comment -

            Patch updated to trunk with some changes:

            1. Removed incorrect use of Arrays.binarySearch in OCP.getCollectionStatus
            2. Renamed 'status' to 'clusterstatus'. This is necessary because SOLR-5477 added a 'status' API for request status. Renamed all variables and enum fields appropriately.

            shalin Shalin Shekhar Mangar added a comment -

            1. Added _route_ parameter which can be used in place of shard. The given route key will be used to determine the shard info to be returned. This will be useful to know which shard a given route key resolves to.
            2. Fixed TestCollectionAPI which was incorrectly expecting collections to be returned in a certain order.

            I'm going to add shard address in the response. Most clients would like to know the base URL of the shard. It is not easy to derive that info from the node_name, core_name, etc. returned by the cluster state. I'll also add cluster properties, aliases and roles.
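The ordering problem fixed in the test can be avoided by comparing collection names as sets instead of lists. The sketch below is illustrative only: `UnorderedAssertSketch` and `sameCollections` are hypothetical names, and this is not the actual code in TestCollectionAPI. It relies on collection names being unique, since a set comparison ignores duplicates.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical sketch: compare returned collection names without relying on order.
final class UnorderedAssertSketch {

    // True when both lists contain the same names, in any order.
    // Duplicates are ignored, which is acceptable when names are unique.
    static boolean sameCollections(List<String> expected, List<String> actual) {
        return new HashSet<>(expected).equals(new HashSet<>(actual));
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("collection1", "collection2");
        List<String> returned = Arrays.asList("collection2", "collection1");
        System.out.println(sameCollections(expected, returned));
    }
}
```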
            shalin Shalin Shekhar Mangar added a comment - - edited

            And, I started to untangle the fact that we have all the strings in OverseerCollectionProcessor, but also have a nice CollectionAction enum. And the commands are intermingled with parameters, it all seems rather confusing. Does it make sense to use the enum rather than the strings? Or somehow associate the two? Probably something for another JIRA though

            Yeah, it is a mess. I'll open an issue to clean this up.

            varun - I like the fl syntax but I think it may be hard to maintain. At least this way all properties inside cluster state are automatically returned. We can always add the filtering feature later if we want.


            shalin Shalin Shekhar Mangar added a comment -

            Fixed a bug in OCP.getCollectionStatus which removed shards from the parent cluster state.

            Hi vzhovtiuk - Be very careful about the cluster state information. It is not protected and if you remove something from collection properties, it will not be visible anymore to other classes in Solr. Always make copies if you are modifying it.

            We should re-factor cluster state and associated classes to return immutable objects by default. I shall open an issue.
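The advice to copy before modifying can be illustrated with plain maps. This is a generic sketch and does not use Solr's cluster state classes; the names `DefensiveCopySketch` and `keepShards` are hypothetical, and the map stands in for whatever shared structure holds the shards.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: filter a shared map by working on a copy, never on the original.
final class DefensiveCopySketch {

    // Returns a filtered copy containing only the wanted shard names.
    // The copy is shallow: entries are removed from the copy only, values are shared.
    static Map<String, Object> keepShards(Map<String, Object> sharedShards, List<String> wanted) {
        Map<String, Object> copy = new LinkedHashMap<>(sharedShards);
        copy.keySet().retainAll(wanted);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> shared = new LinkedHashMap<>();
        shared.put("shard1", "state1"); // placeholder values
        shared.put("shard2", "state2");
        shared.put("shard3", "state3");
        System.out.println(keepShards(shared, Arrays.asList("shard1")).keySet());
        System.out.println(shared.keySet()); // the shared map still has all three shards
    }
}
```

Calling `retainAll` directly on the shared map's key set would remove the other shards for every reader of that map, which is the kind of bug described above.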

            shalin Shalin Shekhar Mangar added a comment -

            Adds aliases and collection properties. The last patch failed because custom objects like DocCollection and Slice cannot be serialized. This patch uses JsonWriter to convert ClusterState into a generic serializable object. I think this is good to go.

            jira-bot ASF subversion and git services added a comment -

            Commit 1582734 from shalin@apache.org in branch 'dev/trunk'
            [ https://svn.apache.org/r1582734 ]

            SOLR-5466: A new List collections and cluster status API which clients can use to read collection and shard information instead of reading data directly from ZooKeeper

            jira-bot ASF subversion and git services added a comment -

            Commit 1582736 from shalin@apache.org in branch 'dev/branches/branch_4x'
            [ https://svn.apache.org/r1582736 ]

            SOLR-5466: A new List collections and cluster status API which clients can use to read collection and shard information instead of reading data directly from ZooKeeper

            shalin Shalin Shekhar Mangar added a comment -

            This will be released with Solr 4.8.

            Thanks everyone!

            uschindler Uwe Schindler added a comment -

            Close issue after release of 4.8.0


            People

              shalin Shalin Shekhar Mangar
              daveseltzer Dave Seltzer
              Votes: 3
              Watchers: 13
