Thanks for the feedback, Shalin Shekhar Mangar.
The comments are w.r.t. a patch that's almost complete; I'll hopefully put it up in a few hours.
Can we use "collection" instead of "name" just like we use in splitshard?
Sure, changed. Actually, perhaps we could look at moving all APIs to use "collection" in 5.0.
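To illustrate the rename, a sketch of what the call would look like with the "collection" parameter (host, port, and names are placeholders, and the exact parameter set depends on the final patch):

```
# Before: shard deletion addressed the collection via "name"
http://localhost:8983/solr/admin/collections?action=DELETESHARD&name=collection1&shard=shard1

# After: "collection" is used, matching SPLITSHARD
http://localhost:8983/solr/admin/collections?action=DELETESHARD&collection=collection1&shard=shard1
```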
Fixed the exception check to use an OR (which short-circuits, so it can never end up throwing an NPE).
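As a minimal sketch of why the OR is safe (not the actual Solr code, just the language semantics): `||` evaluates left to right and stops as soon as an operand is true, so the null check guards the dereference that follows it.

```java
// ShortCircuitDemo.java — illustrative only; the names here are
// hypothetical and stand in for the slice/state check in the patch.
public class ShortCircuitDemo {
    // With ||, `slice == null` is tested first; if it is true the
    // right-hand operand is never evaluated, so no NPE is possible.
    public static boolean invalid(String slice) {
        return slice == null || slice.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(invalid(null));     // true, no NPE
        System.out.println(invalid(""));       // true
        System.out.println(invalid("shard1")); // false
    }
}
```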
Can we re-use the code between "deletecollection" and "deleteshard"? The collectionCmd code checks for "live" state as well.
We 'can', but I feel we shouldn't: shard deletion is specific to a single Slice in a collection, while collection deletion removes all Slices. I think it would be overkill.
DeleteSliceTest.waitAndConfirmSliceDeletion is wrong. It does not actually use the counter variable. Also, cloudClient.getZkStateReader().getClusterState() doesn't actually force a refresh of the cluster state.
The counter variable is now incremented. The force-refresh bit is fixed.
In DeleteSliceTest, after setSliceAsInactive(), we should poll the slice state until it becomes inactive or a timeout expires, instead of just waiting for 5000ms.
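A hedged sketch of that poll-until-inactive-or-timeout loop (the `Supplier<String>` stands in for re-reading the slice state from the cluster state; the method name and timing values are assumptions, not the patch's actual code):

```java
import java.util.function.Supplier;

// PollForState.java — illustrative replacement for a fixed Thread.sleep(5000).
public class PollForState {
    // Polls fetchState every intervalMs until it reports "inactive" or
    // timeoutMs elapses. Returns true on success, false on timeout so the
    // caller (e.g. a test) can fail with a clear message.
    public static boolean waitForInactive(Supplier<String> fetchState,
                                          long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if ("inactive".equals(fetchState.get())) {
                return true;          // slice reached the expected state
            }
            Thread.sleep(intervalMs); // back off before polling again
        }
        return false;                 // timed out
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated state source that flips to "inactive" on the third poll.
        final int[] calls = {0};
        Supplier<String> fake = () -> ++calls[0] < 3 ? "active" : "inactive";
        System.out.println(waitForInactive(fake, 2000, 10)); // true
    }
}
```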
We should fail with an appropriate error message if there were nodes that could not be unloaded. Perhaps a separate "deletecore" call is appropriate here?
Isn't this the opposite of what you say in #3?
Do we know what would happen if such a "zombie" node comes back up? We need to make sure it cleans up properly.
Not sure of that behaviour. Wouldn't that be as good as a random shard coming up with an invalid/non-existent state? Do we currently know what would happen in that case, leaving the delete API aside? Will test it out anyway.