Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.2, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      To enable SSL, we have instructions like the following:

      server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd put /clusterprops.json '{"urlScheme":"https"}'
      

      Overwriting the value won't work well when we have more properties to put in clusterprops. We should be able to change individual values or perhaps merge values.

      1. SOLR-7176.patch
        13 kB
        Per Steffensen
      2. SOLR-7176.patch
        13 kB
        Per Steffensen
      3. SOLR-7176.patch
        12 kB
        Hrishikesh Gadre
      4. SOLR-7176.patch
        12 kB
        Hrishikesh Gadre
      5. SOLR-7176.patch
        11 kB
        Hrishikesh Gadre

        Activity

        Hide
        Hoss Man added a comment -

        supporting arbitrary JSON editing could be dangerous (hello clusterstate.json !)

        perhaps it would be best to just have a simple "clusterprop" command?

        alternatively: if the only real concern here for normal users is the urlSchema clusterprop, couldn't we make bin/solr smart enough to set that property for you if it detects that SOLR_SSL_OPTS is set and the property isn't already set?

        Show
        Hoss Man added a comment - supporting arbitrary JSON editing could be dangerous (hello clusterstate.json !) perhaps it would be best to just have a simple "clusterprop" command? alternatively: if the only real concern here for normal users is the urlSchema clusterprop, couldn't we make bin/solr smart enough to set that property for you if it detects that SOLR_SSL_OPTS is set and the property isn't already set?
        Hide
        Ramkumar Aiyengar added a comment -

        perhaps it would be best to just have a simple "clusterprop" command?

        Solr seems a better place to add this logic than the independent zkcli – perhaps as a separate admin kind of API like collection API. zkcli should stay as agnostic of Solr as possible – and from that perspective it would be nice to do modifications like what's suggested here – but it should be what it looks like, a sharp dumb tool to deal with cases which need ZK needs to be modified, and I have modified cluster state on occasion when the overseer throws a fit. But if we are improving on it and making it look like a real targeted API for something in Solr, it should be in Solr.

        Show
        Ramkumar Aiyengar added a comment - perhaps it would be best to just have a simple "clusterprop" command? Solr seems a better place to add this logic than the independent zkcli – perhaps as a separate admin kind of API like collection API. zkcli should stay as agnostic of Solr as possible – and from that perspective it would be nice to do modifications like what's suggested here – but it should be what it looks like, a sharp dumb tool to deal with cases which need ZK needs to be modified, and I have modified cluster state on occasion when the overseer throws a fit. But if we are improving on it and making it look like a real targeted API for something in Solr, it should be in Solr.
        Hide
        Jan Høydahl added a comment -

        Solr seems a better place to add this logic than the independent zkcli

        I don't like the idea of requiring a running Solr cluster for basic cluster installation tasks.
        Perhaps if we deem SSL on/off to be a fundamentally important flag, perhaps it deserves a dedicated ZK node?

        Show
        Jan Høydahl added a comment - Solr seems a better place to add this logic than the independent zkcli I don't like the idea of requiring a running Solr cluster for basic cluster installation tasks. Perhaps if we deem SSL on/off to be a fundamentally important flag, perhaps it deserves a dedicated ZK node?
        Hide
        Mark Miller added a comment -

        Isn't this just old? As far as I remember there is a cluster properties rest api now.

        Show
        Mark Miller added a comment - Isn't this just old? As far as I remember there is a cluster properties rest api now.
        Hide
        Hrishikesh Gadre added a comment -

        [~markrmiller] Ramkumar Aiyengar There is a cluster management API to support this functionality.

        solr/admin/collections?action=CLUSTERPROP&name=urlScheme&val=https

        The problem with using this API is that it requires that Solr server need to be up and running. This could be problematic if Solr cluster is being managed by an external tool (e.g. Cloudera Manager). e.g. assume following workflow,

        1. User shuts down Solr cluster.
        2. Users configures Solr service to use SSL (via. Cloudera Manager)
        3. User restart the Solr cluster

        Here you can see that the Solr servers would be down during step (3). Instead if we expose this functionality through zkcli, then we would be able to bypass the requirement for Solr servers to be up and running. Obviously we don't have to add this option to zkcli. Any other option (which would not depend upon Solr server to be up) should be fine.

        Show
        Hrishikesh Gadre added a comment - [~markrmiller] Ramkumar Aiyengar There is a cluster management API to support this functionality. solr/admin/collections?action=CLUSTERPROP&name=urlScheme&val=https The problem with using this API is that it requires that Solr server need to be up and running. This could be problematic if Solr cluster is being managed by an external tool (e.g. Cloudera Manager). e.g. assume following workflow, 1. User shuts down Solr cluster. 2. Users configures Solr service to use SSL (via. Cloudera Manager) 3. User restart the Solr cluster Here you can see that the Solr servers would be down during step (3). Instead if we expose this functionality through zkcli, then we would be able to bypass the requirement for Solr servers to be up and running. Obviously we don't have to add this option to zkcli. Any other option (which would not depend upon Solr server to be up) should be fine.
        Hide
        Hrishikesh Gadre added a comment -
        Show
        Hrishikesh Gadre added a comment - + Mark Miller
        Hide
        Ramkumar Aiyengar added a comment -

        I understand the need now, this would be in support of the clusterprop command Hoss mentioned. The trouble still is in branding something specific such as this as part of zkCli. Changing cluster props might involve us changing something else in ZK or even doing something else apart from modifying ZK, this looks like it is making use of an implementation detail. May be this calls for a tool which shares implementation with Solr to provide limited 'offline' access to such functionality. I know there has been this long standing discussion of moving solr.xml to ZK, some such tool might also help us long term with that..

        Show
        Ramkumar Aiyengar added a comment - I understand the need now, this would be in support of the clusterprop command Hoss mentioned. The trouble still is in branding something specific such as this as part of zkCli. Changing cluster props might involve us changing something else in ZK or even doing something else apart from modifying ZK, this looks like it is making use of an implementation detail. May be this calls for a tool which shares implementation with Solr to provide limited 'offline' access to such functionality. I know there has been this long standing discussion of moving solr.xml to ZK, some such tool might also help us long term with that..
        Hide
        Yonik Seeley added a comment -

        this looks like it is making use of an implementation detail.

        Only if one considers the contents of clusterprops.json to be an implementation detail.
        It looks/feels like configuration to me, which is more interface than implementation detail.

        Show
        Yonik Seeley added a comment - this looks like it is making use of an implementation detail. Only if one considers the contents of clusterprops.json to be an implementation detail. It looks/feels like configuration to me, which is more interface than implementation detail.
        Hide
        Shalin Shekhar Mangar added a comment -

        Only if one considers the contents of clusterprops.json to be an implementation detail. It looks/feels like configuration to me, which is more interface than implementation detail.

        It is certainly configuration and there will likely be more of the same in future. But I agree with Hoss that there should be a clusterproperty (clusterprop!??) command in zkcli and not something which can add/remove values from arbitrary Json objects.

        Show
        Shalin Shekhar Mangar added a comment - Only if one considers the contents of clusterprops.json to be an implementation detail. It looks/feels like configuration to me, which is more interface than implementation detail. It is certainly configuration and there will likely be more of the same in future. But I agree with Hoss that there should be a clusterproperty (clusterprop!??) command in zkcli and not something which can add/remove values from arbitrary Json objects.
        Hide
        Yonik Seeley added a comment -

        I guess I see those as orthogonal questions. Even if there is some sort of clusterprop command for zkcli, the ability to change an arbitrary file seems to fit right into the feature set of being able to upload arbitrary files (which zkcli can do today).

        Show
        Yonik Seeley added a comment - I guess I see those as orthogonal questions. Even if there is some sort of clusterprop command for zkcli, the ability to change an arbitrary file seems to fit right into the feature set of being able to upload arbitrary files (which zkcli can do today).
        Hide
        Ramkumar Aiyengar added a comment - - edited

        Fair enough, if we do see and intend to use clusterprops.json as an interface, I withdraw my objection.

        I guess I see those as orthogonal questions. Even if there is some sort of clusterprop command for zkcli, the ability to change an arbitrary file seems to fit right into the feature set of being able to upload arbitrary files (which zkcli can do today).

        +1, if we are adding a clusterprop command as a safety mechanism, it seems better to add it to a "solr config tool" even if it just wraps zkcli. The feature Yonik currently is proposing is a suitable feature to add to what zkcli does today, being a sharp tool with the ability to modify/damage ZK arbitrarily as it stands today. Having people moving away from it in favour of a nicer wrapper tool which does specific things seems like a separate, good feature to have..

        Show
        Ramkumar Aiyengar added a comment - - edited Fair enough, if we do see and intend to use clusterprops.json as an interface, I withdraw my objection. I guess I see those as orthogonal questions. Even if there is some sort of clusterprop command for zkcli, the ability to change an arbitrary file seems to fit right into the feature set of being able to upload arbitrary files (which zkcli can do today). +1, if we are adding a clusterprop command as a safety mechanism, it seems better to add it to a "solr config tool" even if it just wraps zkcli. The feature Yonik currently is proposing is a suitable feature to add to what zkcli does today, being a sharp tool with the ability to modify/damage ZK arbitrarily as it stands today. Having people moving away from it in favour of a nicer wrapper tool which does specific things seems like a separate, good feature to have..
        Hide
        Per Steffensen added a comment - - edited

        I think we should not make ZkCLI a Swiss Army Knife. ZkCLI should deal with ZK operations, like uploading and downloading. Changing then content of a json-file is not a ZK operation. Why not see it as different things. So what you need to do if you want to make arbitrary changes to json-content of a znode in ZK is

        zkcli getfile thefile.json
        <use another tool for modifying thefile.json>
        zkcli putfile thefile.json
        

        It would be nice to make different tools that can make operations on the different types of jsons we have in a safe way - ensuring e.g. consistency, integrity, validity etc. But I think those tools should use the classes already handling those jsons, and not have anything to do with ZkCLI. E.g. a tool for manipulating clusterstate.json should use classes from org.apache.solr.common.cloud in solrj project, like ZkStateReader, ClusterState, ClusterStateUpdater etc. Then it is always those classes that are used to operate on a specific type of json, and we only have to build consistency, integrity, validity etc into those classes (separation of concern). The new thing should be that those classes can be used via an external tool also when no Solr nodes are running.

        At least make sure to use those existing classes for actually reading/writing/verifying the jsons, and make separate tool-classes. Changing ZkCLI to be able to trigger operations in those tool-classes is less important, but personally I do not like - actually, if it has to be, I would rather see e.g. an ClusterPropCLI-tool support a "do-in-zk"-flag making it use ZkCLI tool to download first, do its manipulations and then use ZkCLI to upload again. That is, new tools that use ZkCLI (if you are too lazy doing the download/upload using ZkCLI yourself), instead of the other way around, where ZkCLI uses other tools or even just does json-manipulation itself. Please do not try to implement rules about what can and cannot be in a specific type of json, what operations you can do on it etc two places in the code-base - use what we already have.

        Would like to see clean definitions of which classes own responsibilities. E.g.

        • ClusterState.java own the responsibility of a clusterstate.json always being consistent, valid etc. You never do changes to clusterstate.json with using ClusterState.java
        • ClusterStateUpdater own the set of allowed operations on clusterstate.json. You never manipulate a clusterstate.json without going through ClusterStateUpdater. The new off-line ClusterStateCLI-tool should use ClusterStateUpdater to do its operations - just change ClusterStateUpdater, so that it support receiving jobs not coming from other sources than overseer queue.

        Yes, I know you primarily talk about ClusterProps, but the principle will be the same. It is just that I know more about which classes handle cluster-state than which handle cluster-props.

        That said, maybe you can live with using e.g. http://trentm.com/json/ (or one of the million other command-line json tools available) in <use another tool for modifying thefile.json> above?

        Show
        Per Steffensen added a comment - - edited I think we should not make ZkCLI a Swiss Army Knife. ZkCLI should deal with ZK operations, like uploading and downloading. Changing then content of a json-file is not a ZK operation. Why not see it as different things. So what you need to do if you want to make arbitrary changes to json-content of a znode in ZK is zkcli getfile thefile.json <use another tool for modifying thefile.json> zkcli putfile thefile.json It would be nice to make different tools that can make operations on the different types of jsons we have in a safe way - ensuring e.g. consistency, integrity, validity etc. But I think those tools should use the classes already handling those jsons, and not have anything to do with ZkCLI. E.g. a tool for manipulating clusterstate.json should use classes from org.apache.solr.common.cloud in solrj project, like ZkStateReader, ClusterState, ClusterStateUpdater etc. Then it is always those classes that are used to operate on a specific type of json, and we only have to build consistency, integrity, validity etc into those classes (separation of concern). The new thing should be that those classes can be used via an external tool also when no Solr nodes are running. At least make sure to use those existing classes for actually reading/writing/verifying the jsons, and make separate tool-classes. Changing ZkCLI to be able to trigger operations in those tool-classes is less important, but personally I do not like - actually, if it has to be, I would rather see e.g. an ClusterPropCLI-tool support a "do-in-zk"-flag making it use ZkCLI tool to download first, do its manipulations and then use ZkCLI to upload again. That is, new tools that use ZkCLI (if you are too lazy doing the download/upload using ZkCLI yourself), instead of the other way around, where ZkCLI uses other tools or even just does json-manipulation itself. Please do not try to implement rules about what can and cannot be in a specific type of json, what operations you can do on it etc two places in the code-base - use what we already have. Would like to see clean definitions of which classes own responsibilities. E.g. ClusterState.java own the responsibility of a clusterstate.json always being consistent, valid etc. You never do changes to clusterstate.json with using ClusterState.java ClusterStateUpdater own the set of allowed operations on clusterstate.json. You never manipulate a clusterstate.json without going through ClusterStateUpdater. The new off-line ClusterStateCLI-tool should use ClusterStateUpdater to do its operations - just change ClusterStateUpdater, so that it support receiving jobs not coming from other sources than overseer queue. Yes, I know you primarily talk about ClusterProps, but the principle will be the same. It is just that I know more about which classes handle cluster-state than which handle cluster-props. That said, maybe you can live with using e.g. http://trentm.com/json/ (or one of the million other command-line json tools available) in <use another tool for modifying thefile.json> above?
        Hide
        Noble Paul added a comment -

        There is already a clusterprop command why do we need a zkCli command

        I'm against zkCli command to arbitrarily edit content in ZK

        Show
        Noble Paul added a comment - There is already a clusterprop command why do we need a zkCli command I'm against zkCli command to arbitrarily edit content in ZK
        Hide
        Mark Miller added a comment -

        There is already a clusterprop command why do we need a zkCli command

        There is already a response to this question above ?

        Show
        Mark Miller added a comment - There is already a clusterprop command why do we need a zkCli command There is already a response to this question above ?
        Hide
        Noble Paul added a comment -

        I missed that.
        However , I agree with the recommendation given by Per Steffensen

        zkCli can update the whole file after doing an offline edit

        Show
        Noble Paul added a comment - I missed that. However , I agree with the recommendation given by Per Steffensen zkCli can update the whole file after doing an offline edit
        Hide
        Noble Paul added a comment -

        I propose a new command in zkcli

        zkcli.sh -zkhost 127.0.0.1:9983 -cmd clusterprops clusterprops.json
        

        It is not a generic command to upload json. But , it should be doing all the regular validations done by the CLUSTERPROP API before uploading the file

        Show
        Noble Paul added a comment - I propose a new command in zkcli zkcli.sh -zkhost 127.0.0.1:9983 -cmd clusterprops clusterprops.json It is not a generic command to upload json. But , it should be doing all the regular validations done by the CLUSTERPROP API before uploading the file
        Hide
        Hrishikesh Gadre added a comment -

        Ramkumar Aiyengar [~nobel.paul]Yonik SeeleyPer Steffensen

        I think the problem with the latest proposal (from Nobel) is that the user is expected to download the clusterprops.json file and update it correctly. Shouldn't this logic be part of Solr itself (to ensure backward compatibility w.r.t. schema changes etc.) ?

        How about just adding a separate class in Solr which would provide this functionality viz.

        • Download the ZK config
        • Update it correctly
        • Upload the ZK config

        We can potentially refactor the current collections API implementation to use this class (so as to avoid code duplication). We can optionally provide a separate script to invoke this class (or leave it for consumers to implement). This will address the concerns with respect to adding this functionality to zkcli.sh.

        Thoughts?

        Show
        Hrishikesh Gadre added a comment - Ramkumar Aiyengar [~nobel.paul] Yonik Seeley Per Steffensen I think the problem with the latest proposal (from Nobel) is that the user is expected to download the clusterprops.json file and update it correctly. Shouldn't this logic be part of Solr itself (to ensure backward compatibility w.r.t. schema changes etc.) ? How about just adding a separate class in Solr which would provide this functionality viz. Download the ZK config Update it correctly Upload the ZK config We can potentially refactor the current collections API implementation to use this class (so as to avoid code duplication). We can optionally provide a separate script to invoke this class (or leave it for consumers to implement). This will address the concerns with respect to adding this functionality to zkcli.sh. Thoughts?
        Hide
        Noble Paul added a comment -

        I would like to make another proposal

        zkcli.sh -zkhost 127.0.0.1:9983 -collection-action CLUSTERPROP -name urlScheme -val https 
        

        This should behave exactly like the collections API . All the params and behavior will be same as CLUSTERPROP API but will work directly on the command line

        The advantage is that the user does not need to learn new param names and their semantics, Moreover we can extend the same pattern to all our other collection APIs as required

        Show
        Noble Paul added a comment - I would like to make another proposal zkcli.sh -zkhost 127.0.0.1:9983 -collection-action CLUSTERPROP -name urlScheme -val https This should behave exactly like the collections API . All the params and behavior will be same as CLUSTERPROP API but will work directly on the command line The advantage is that the user does not need to learn new param names and their semantics, Moreover we can extend the same pattern to all our other collection APIs as required
        Hide
        Hrishikesh Gadre added a comment -

        >>The advantage is that the user does not need to learn new param names and their semantics, Moreover we can extend the same pattern to all our other collection APIs as required

        Sure. I like this idea. But can we define it as a separate script (in line with the earlier reasoning for not adding in the zkcli.sh) ?

        Show
        Hrishikesh Gadre added a comment - >>The advantage is that the user does not need to learn new param names and their semantics, Moreover we can extend the same pattern to all our other collection APIs as required Sure. I like this idea. But can we define it as a separate script (in line with the earlier reasoning for not adding in the zkcli.sh) ?
        Hide
        Jan Høydahl added a comment -

        Yea, I even wonder if we should have a Cluster API /admin/cluster/ and move commands like CLUSTERPROP, ADDROLE, REMOVEROLE, OVERSEERSTATUS, CLUSTERSTATUS away from collections API? Then we could have a cluster.sh which aids in calling these from cmdline. Of course some cmds may require Solr to be running while others can work with ZK only?

        Show
        Jan Høydahl added a comment - Yea, I even wonder if we should have a Cluster API /admin/cluster/ and move commands like CLUSTERPROP , ADDROLE , REMOVEROLE , OVERSEERSTATUS , CLUSTERSTATUS away from collections API? Then we could have a cluster.sh which aids in calling these from cmdline. Of course some cmds may require Solr to be running while others can work with ZK only?
        Hide
        Noble Paul added a comment -

        A lot of commands will need a running cluster. What good is an OVERSEERSTATUS without overseer.

        The only relevant one I see now is CLUSTERPROP . Andd there is an immediate need for that as well.

        Show
        Noble Paul added a comment - A lot of commands will need a running cluster. What good is an OVERSEERSTATUS without overseer. The only relevant one I see now is CLUSTERPROP . Andd there is an immediate need for that as well.
        Hide
        Per Steffensen added a comment -
        zkcli.sh -zkhost 127.0.0.1:9983 -collection-action CLUSTERPROP -name urlScheme -val https
        

        I agree, except that it should not be the zkcli.sh tool that is extended. Since it is the collections API you make a CLI for, so to speak, make a collectionscli.sh script

        collectionscli.sh -zkhost 127.0.0.1:9983 -action CLUSTERPROP -name urlScheme -val https
        

        And later maybe

        collectionscli.sh -zkhost 127.0.0.1:9983 -action ADDROLE -role overseer -val ...
        

        etc

        It think also, that it needs to be considered how and if this is an extension/modification to the SolrCLI-tool (used from solr/bin/solr and solr/bin/solr.cmd)

        solr.sh CLUSTERPROP -zkhost 127.0.0.1:9983 -name urlScheme -val https
        

        Just saying, even though I do not like the current state of it, because of the enormous amounts of redundant code. But we do not want to end up with a million different cli-tools either.
        BTW, I think solr/bin/solr should be renamed to solr.sh, so I pretended above

        Show
        Per Steffensen added a comment - zkcli.sh -zkhost 127.0.0.1:9983 -collection-action CLUSTERPROP -name urlScheme -val https I agree, except that it should not be the zkcli.sh tool that is extended. Since it is the collections API you make a CLI for, so to speak, make a collectionscli.sh script collectionscli.sh -zkhost 127.0.0.1:9983 -action CLUSTERPROP -name urlScheme -val https And later maybe collectionscli.sh -zkhost 127.0.0.1:9983 -action ADDROLE -role overseer -val ... etc It think also, that it needs to be considered how and if this is an extension/modification to the SolrCLI-tool (used from solr/bin/solr and solr/bin/solr.cmd) solr.sh CLUSTERPROP -zkhost 127.0.0.1:9983 -name urlScheme -val https Just saying, even though I do not like the current state of it, because of the enormous amounts of redundant code. But we do not want to end up with a million different cli-tools either. BTW, I think solr/bin/solr should be renamed to solr.sh, so I pretended above
        Hide
        Shawn Heisey added a comment -

        BTW, I think solr/bin/solr should be renamed to solr.sh, so I pretended above

        Renaming the script would be a bad idea, IMHO. With the current setup, you can use "bin/solr" at the commandline on *NIX and "bin\solr" on Windows, the only difference is the path separator, which will not be a surprise to most admins.

        If we rename solr to solr.sh, then the command will be different on *NIX and unified documentation becomes more difficult.

        If there is going to be any renaming, I believe it should be to remove .sh from the other scripts, so zkCli.sh becomes zkCli ... and it should be handled in a separate issue.

        Show
        Shawn Heisey added a comment - BTW, I think solr/bin/solr should be renamed to solr.sh, so I pretended above Renaming the script would be a bad idea, IMHO. With the current setup, you can use "bin/solr" at the commandline on *NIX and "bin\solr" on Windows, the only difference is the path separator, which will not be a surprise to most admins. If we rename solr to solr.sh, then the command will be different on *NIX and unified documentation becomes more difficult. If there is going to be any renaming, I believe it should be to remove .sh from the other scripts, so zkCli.sh becomes zkCli ... and it should be handled in a separate issue.
        Hide
        Noble Paul added a comment -

        I don't expect a lot of commands to be exposed with this. This will be used when you can't use the command because the solr cliuster is not up and running . It will be an expert thing Having a dedicated script for this seems overklll.

        I would still prefer to overload the zkCli command

        Show
        Noble Paul added a comment - I don't expect a lot of commands to be exposed with this. This will be used when you can't use the command because the solr cliuster is not up and running . It will be an expert thing Having a dedicated script for this seems overklll. I would still prefer to overload the zkCli command
        Hide
        Mark Miller added a comment -

        I would like to make another proposal, zkcli.sh -zkhost 127.0.0.1:9983 -collection-action CLUSTERPROP -name urlScheme -val https

        Something along these lines seems like the best current proposal to me. I don't think it really calls for anything more expansive.

        If we rename solr to solr.sh, then the command will be different on *NIX and unified documentation becomes more difficult.

        FWIW, I think dropping the accepted and normal usage of file extensions to aid in unified doc is a terrible idea.

        Show
        Mark Miller added a comment - I would like to make another proposal, zkcli.sh -zkhost 127.0.0.1:9983 -collection-action CLUSTERPROP -name urlScheme -val https Something along these lines seems like the best current proposal to me. I don't think it really calls for anything more expansive. If we rename solr to solr.sh, then the command will be different on *NIX and unified documentation becomes more difficult. FWIW, I think dropping the accepted and normal usage of file extensions to aid in unified doc is a terrible idea.
        Hide
        Per Steffensen added a comment - - edited

        I agree, but from time to time I want to add a (async) command to the overseer while the cluster is not running, expecting the overseer to pick it up and execute it when I start my cluster. If you would enable this tool to do this kind of stuff, then suddenly most of the cluster-commands become relevant for this tool - if it is able to both execute the command directly (if supported - e.g. the CLUSTERPROP command) or to leave the command for execution by the overseer.
        And, if you have numerous machines that might or might not currently run a Solr-node, maybe you actually want to be able to run the OVERSEERSTATUS command as a command-line just to get a "not running" response.

        Show
        Per Steffensen added a comment - - edited I agree, but from time to time I want to add a (async) command to the overseer while the cluster is not running, expecting the overseer to pick it up and execute it when I start my cluster. If you would enable this tool to do this kind of stuff, then suddenly most of the cluster-commands become relevant for this tool - if it is able to both execute the command directly (if supported - e.g. the CLUSTERPROP command) or to leave the command for execution by the overseer. And, if you have numerous machines that might or might not currently run a Solr-node, maybe you actually want to be able to run the OVERSEERSTATUS command as a command-line just to get a "not running" response.
        Hide
        Per Steffensen added a comment -

        With the current setup, you can use "bin/solr" at the commandline on *NIX and "bin\solr" on Windows, the only difference is the path separator, which will not be a surprise to most admins

        Well I think it might come as a surprise to most *NIX admins that the script is just called "solr" - and not e.g. solr.sh. But never mind, this JIRA is not about that. I just had a hard time writing solr CLUSTERPROP ..., because I would have to think twice to understand it myself

        and it should be handled in a separate issue

        Yes, definitely, no one talked about doing the renaming in this issue

        Show
        Per Steffensen added a comment - With the current setup, you can use "bin/solr" at the commandline on *NIX and "bin\solr" on Windows, the only difference is the path separator, which will not be a surprise to most admins Well I think it might come as a surprise to most *NIX admins that the script is just called "solr" - and not e.g. solr.sh. But never mind, this JIRA is not about that. I just had a hard time writing solr CLUSTERPROP ... , because I would have to think twice to understand it myself and it should be handled in a separate issue Yes, definitely, no one talked about doing the renaming in this issue
        Hide
        Hrishikesh Gadre added a comment -

        Mark Miller [~nobel.paul] Thanks a lot for clarification. I will submit a patch shortly.

        Show
        Hrishikesh Gadre added a comment - Mark Miller [~nobel.paul] Thanks a lot for clarification. I will submit a patch shortly.
        Hide
        Shawn Heisey added a comment -

        FWIW, I think dropping the accepted and normal usage of file extensions to aid in unified doc is a terrible idea.

        Fair enough, and you carry a lot more weight around here than I do.

        Show
        Shawn Heisey added a comment - FWIW, I think dropping the accepted and normal usage of file extensions to aid in unified doc is a terrible idea. Fair enough, and you carry a lot more weight around here than I do.
        Hide
        Hrishikesh Gadre added a comment -

        Here is a patch as per our earlier discussion. The only difference is I have added 'CLUSTERPROP' as a regular command instead of 'collection-action' suggested since the commons options library does not accept option name embedded with '' (in between collection and action). I thought of other alternatives, but couldn't come up with a concise name. I think it should be OK to use 'cmd' instead. Let me know if you have other suggestions.

        Show
        Hrishikesh Gadre added a comment - Here is a patch as per our earlier discussion. The only difference is I have added 'CLUSTERPROP' as a regular command instead of ' collection-action' suggested since the commons options library does not accept option name embedded with ' ' (in between collection and action). I thought of other alternatives, but couldn't come up with a concise name. I think it should be OK to use 'cmd' instead. Let me know if you have other suggestions.
        Hide
        Noble Paul added a comment -

        Hrishikesh Gadre Looks good. Why can't we just eliminate the overseer from the picture completely? As you are doing a compare and set , there is no need to send it to Overseer. This can be completely handled at CollectionsHandler

        Show
        Noble Paul added a comment - Hrishikesh Gadre Looks good. Why can't we just eliminate the overseer from the picture completely? As you are doing a compare and set , there is no need to send it to Overseer. This can be completely handled at CollectionsHandler
        Hide
        Hrishikesh Gadre added a comment -

        Sure it can be done. I didn't want to alter the existing design (just refactor the common code). Let me make that change and resubmit the patch.

        Show
        Hrishikesh Gadre added a comment - Sure it can be done. I didn't want to alter the existing design (just refactor the common code). Let me make that change and resubmit the patch.
        Hide
        Hrishikesh Gadre added a comment -

        Updated patch which removes the Overseer interaction for CLUSTERPROP API

        Show
        Hrishikesh Gadre added a comment - Updated patch which removes the Overseer interaction for CLUSTERPROP API
        Hide
        Noble Paul added a comment -

        the patch does not seem to be in sync with the trunk

        Show
        Noble Paul added a comment - the patch does not seem to be in sync with the trunk
        Hide
        Per Steffensen added a comment -

        Why can't we just eliminate the overseer from the picture completely?

        Not that it is very important in this case, but there is a problem in general with having several threads doing fetch -> update-locally -> store on state concurrently without locking (pessimistically or optimistically). Example, two threads running concurrently

        • Thread#1 wants to do the task of setting "urlScheme" to "http":
          • fetches {"urlScheme":"https", "autoAddReplicas": "true"}
          • changes it to {"urlScheme":"http", "autoAddReplicas": "true"}

            and stores it

        • Thread#2 wants to do the task of setting "autoAddReplicas" to "false":
          • fetches {"urlScheme":"https", "autoAddReplicas": "true"}
          • changes it to {"urlScheme":"https", "autoAddReplicas": "false"}

            and stores it

        Without locking they can run concurrently and you will end up with a wrong state

        • {"urlScheme":"http", "autoAddReplicas": "true"}
        • or {"urlScheme":"https", "autoAddReplicas": "false"}

        But you actually expected

        {"urlScheme":"http", "autoAddReplicas": "false"}

        I do not know what the initial thought was about the Overseer, but I think of it as a simple way to get around this locking - making sure that there is never more than one thread updating state.

        When that is said, if the above was the intention with the Overseer, it does not work today, because CollectionsHandler.handleProp is doing the fetch and update, and only leaves the storing to Overseer. I would like to see the entire job handed over to the Overseer, so that it does both fetch, update and store - so that you can avoid the concurrency scenario above. In general Overseer should execute entire admin-jobs and not only parts of them.

        Anyway, it is a reason not to do this kind of updates without taking locks, and Overseer is a primitive way of "taking lock", and maybe therefore "do not eliminate the Overseer". I am not sure it is especially important here.

        Show
        Per Steffensen added a comment - Why can't we just eliminate the overseer from the picture completely? Not that it is very important in this case, but there is a problem in general with having several threads doing fetch -> update-locally -> store on state concurrently without locking (pessimistically or optimistically). Example, two threads running concurrently Thread#1 wants to do the task of setting "urlScheme" to "http": fetches {"urlScheme":"https", "autoAddReplicas": "true"} changes it to {"urlScheme":"http", "autoAddReplicas": "true"} and stores it Thread#2 wants to do the task of setting "autoAddReplicas" to "false": fetches {"urlScheme":"https", "autoAddReplicas": "true"} changes it to {"urlScheme":"https", "autoAddReplicas": "false"} and stores it Without locking they can run concurrently and you will end up with a wrong state {"urlScheme":"http", "autoAddReplicas": "true"} or {"urlScheme":"https", "autoAddReplicas": "false"} But you actually expected {"urlScheme":"http", "autoAddReplicas": "false"} I do not know what the initial thought was about the Overseer, but I think of it as a simple way to get around this locking - making sure that there is never more than one thread updating state. When that is said, if the above was the intention with the Overseer, it does not work today, because CollectionsHandler.handleProp is doing the fetch and update, and only leaves the storing to Overseer. I would like to see the entire job handed over to the Overseer, so that it does both fetch, update and store - so that you can avoid the concurrency scenario above. In general Overseer should execute entire admin-jobs and not only parts of them. Anyway, it is a reason not to do this kind of updates without taking locks, and Overseer is a primitive way of "taking lock", and maybe therefore "do not eliminate the Overseer". I am not sure it is especially important here.
        Hide
        Noble Paul added a comment -

        Without locking they can run concurrently and you will end up with a wrong state

        No

        That is called compare and set in Zookeeper

        Show
        Noble Paul added a comment - Without locking they can run concurrently and you will end up with a wrong state No That is called compare and set in Zookeeper
        Hide
        Hrishikesh Gadre added a comment -

        Please find the attached patch working with latest trunk code.

        Show
        Hrishikesh Gadre added a comment - Please find the attached patch working with latest trunk code.
        Hide
        Per Steffensen added a comment - - edited

        Believe it is referred to as compare-and-swap, but anyway Overseer.handleProp does not use that feature (it always uses version=-1 when calling zk.setData). But actually I just realized that it is not true that CollectionsHandler.handleProp does the fetch-update and only leaves update to Overseer (as I claimed above) - it actually does forward the entire operation to Overseer so that the Overseer does fetch, update and store (just at I wanted it to). So you are right, that the problem I sketched above is not a problem in todays code, but it is not due to usage of compare-and-swap - it is because two threads will never try to do updates at the same time - they will ask the Overseer to do the updates (single-threaded = primitive pessimistic lock).

        So we cannot "just eliminate the overseer from the picture completely", if we still want to avoid the (more or less "theoretical") problem I sketched above.

        So the bullet-safe command-line solution should take this into consideration - e.g.

        • 1) Take the overseer-lock before performing the operation
        • 2) Use compare-and-swap (and make sure Overseer also does)

        I believe I would prefer 1) because it is the most generally useable solution to the problem. Compare-and-swap (even combined with ZK multi-op feature) will not always be sufficient for operations that want to update several znodes atomically - and who knows, maybe some day we also want to that kind of stuff using command-line. Taking a pessimistic lock (like the Overseer-lock) always will be sufficient.

        Show
        Per Steffensen added a comment - - edited Believe it is referred to as compare-and-swap, but anyway Overseer.handleProp does not use that feature (it always uses version=-1 when calling zk.setData). But actually I just realized that it is not true that CollectionsHandler.handleProp does the fetch-update and only leaves update to Overseer (as I claimed above) - it actually does forward the entire operation to Overseer so that the Overseer does fetch, update and store (just at I wanted it to). So you are right, that the problem I sketched above is not a problem in todays code, but it is not due to usage of compare-and-swap - it is because two threads will never try to do updates at the same time - they will ask the Overseer to do the updates (single-threaded = primitive pessimistic lock). So we cannot "just eliminate the overseer from the picture completely", if we still want to avoid the (more or less "theoretical") problem I sketched above. So the bullet-safe command-line solution should take this into consideration - e.g. 1) Take the overseer-lock before performing the operation 2) Use compare-and-swap (and make sure Overseer also does) I believe I would prefer 1) because it is the most generally useable solution to the problem. Compare-and-swap (even combined with ZK multi-op feature) will not always be sufficient for operations that want to update several znodes atomically - and who knows, maybe some day we also want to that kind of stuff using command-line. Taking a pessimistic lock (like the Overseer-lock) always will be sufficient.
        Hide
        Hrishikesh Gadre added a comment -

        >>I believe I would prefer 1) because it is the most generally useable solution to the problem. Compare-and-swap (even combined with ZK multi-op feature) will not always be sufficient for operations that want to update several znodes atomically - and who knows, maybe some day we also want to that kind of stuff using command-line. Taking a pessimistic lock (like the Overseer-lock) always will be sufficient.

        The original use-case for this feature is to have an ability to update the cluster properties even when Solr cluster is offline. Hence the fix for this use-case can not really depend upon the overseer lock. Also as others mentioned in the JIRA above, we are trying to address a very specific problem (i.e. ability to update contents of /clusterprops.json ZNODE). Typically these updates should be very infrequent (e.g. why would user flip between SSL/non-SSL mode frequently ?). So I believe using optimistic locking should be fine.

        Thoughts?

        Show
        Hrishikesh Gadre added a comment - >>I believe I would prefer 1) because it is the most generally useable solution to the problem. Compare-and-swap (even combined with ZK multi-op feature) will not always be sufficient for operations that want to update several znodes atomically - and who knows, maybe some day we also want to that kind of stuff using command-line. Taking a pessimistic lock (like the Overseer-lock) always will be sufficient. The original use-case for this feature is to have an ability to update the cluster properties even when Solr cluster is offline. Hence the fix for this use-case can not really depend upon the overseer lock. Also as others mentioned in the JIRA above, we are trying to address a very specific problem (i.e. ability to update contents of /clusterprops.json ZNODE). Typically these updates should be very infrequent (e.g. why would user flip between SSL/non-SSL mode frequently ?). So I believe using optimistic locking should be fine. Thoughts?
        Hide
        Noble Paul added a comment -

        optimistic locking is fine.
        But you can do a retry loop if an attempt to write fails

        Show
        Noble Paul added a comment - optimistic locking is fine. But you can do a retry loop if an attempt to write fails
        Hide
        Hrishikesh Gadre added a comment -

        Are you suggesting that CLI should run a retry loop? I would prefer not to retry automatically since it could overwrite some other directive of some other CLI invocation. In my current patch, I have added a check to see if the value is set correctly (after receiving an error). If it is, then the invocation is successful. If not, we fail the invocation. This also works in scenarios where two invocations attempt to set the same value.

        Adding retries would also raise questions like - how many retries are sufficient? Is there an exponential backoff policy? What is the time interval between two retries etc. All of this can be implemented outside Solr very easily if required (e.g. a script can run the zkcli command in a loop until it succeeds).

        Show
        Hrishikesh Gadre added a comment - Are you suggesting that CLI should run a retry loop? I would prefer not to retry automatically since it could overwrite some other directive of some other CLI invocation. In my current patch, I have added a check to see if the value is set correctly (after receiving an error). If it is, then the invocation is successful. If not, we fail the invocation. This also works in scenarios where two invocations attempt to set the same value. Adding retries would also raise questions like - how many retries are sufficient? Is there an exponential backoff policy? What is the time interval between two retries etc. All of this can be implemented outside Solr very easily if required (e.g. a script can run the zkcli command in a loop until it succeeds).
        Hide
        Per Steffensen added a comment -

        Hence the fix for this use-case can not really depend upon the overseer lock

        Sure you can. You can say that you have to own the Overseer-lock, in order to be able to perform this kind of admin-tasks. The CLI job can try to take the lock and then perform the operation. If it cannot get the lock (maybe retry for a period of time), ask the Overseer to do it (either by doing the corresponding HTTP request or by leaving it directly on the Overseer-queue) and wait synchronously for the Overseer to handle it (it must be running since it the lock is taken).
        But optimistic locking (compare-and-swap) is probably the best in this case. Only thing I fear is that if that approach is established as "the way it is done" it will be repeated in upcoming cases where it might not be sufficient. Sometime it is worth the effort to establish the best platform to build upon from the beginning.

        Show
        Per Steffensen added a comment - Hence the fix for this use-case can not really depend upon the overseer lock Sure you can. You can say that you have to own the Overseer-lock, in order to be able to perform this kind of admin-tasks. The CLI job can try to take the lock and then perform the operation. If it cannot get the lock (maybe retry for a period of time), ask the Overseer to do it (either by doing the corresponding HTTP request or by leaving it directly on the Overseer-queue) and wait synchronously for the Overseer to handle it (it must be running since it the lock is taken). But optimistic locking (compare-and-swap) is probably the best in this case. Only thing I fear is that if that approach is established as "the way it is done" it will be repeated in upcoming cases where it might not be sufficient. Sometime it is worth the effort to establish the best platform to build upon from the beginning.
        Hide
        Per Steffensen added a comment -

        New patch

        I would prefer not to retry automatically

        Adding "Try again" to error-msg from ZkCLI if KeeperException.BadVersionException (to hint the user that it will probably go well if he just tries again).

        Besides that

        • Also "// Don't update ZK unless absolutely necessary" in the remove-case
        • Sharing constants instead of introducing new similar ones. Because IMHO we shouldnt keep adding the same constants again and again, but more important: If HTTP is /admin/collections?action=CLUSTERPROP&name=<prop-name>&val=<prop-value>, then the command-line should be zkcli.sh … -cmd CLUSTERPROP -name <prop-name> -val <prop-value>, AND if the HTTP ever changes (e.g. using “clusterprop.set” instead of “CLUSTERPROP”) so should the command-line
        • Adding test of remove
        • Closing ZkStateReader in CollectionsHandler, ZkCLI and ZkCLITest
        Show
        Per Steffensen added a comment - New patch I would prefer not to retry automatically Adding "Try again" to error-msg from ZkCLI if KeeperException.BadVersionException (to hint the user that it will probably go well if he just tries again). Besides that Also "// Don't update ZK unless absolutely necessary" in the remove-case Sharing constants instead of introducing new similar ones. Because IMHO we shouldnt keep adding the same constants again and again, but more important: If HTTP is /admin/collections?action=CLUSTERPROP&name=<prop-name>&val=<prop-value> , then the command-line should be zkcli.sh … -cmd CLUSTERPROP -name <prop-name> -val <prop-value> , AND if the HTTP ever changes (e.g. using “clusterprop.set” instead of “CLUSTERPROP”) so should the command-line Adding test of remove Closing ZkStateReader in CollectionsHandler, ZkCLI and ZkCLITest
        Hide
        Hrishikesh Gadre added a comment -

        OK I see your point now. Yes this make sense.

        Show
        Hrishikesh Gadre added a comment - OK I see your point now. Yes this make sense.
        Hide
        Hrishikesh Gadre added a comment -

        >>Adding "Try again" to error-msg from ZkCLI if KeeperException.BadVersionException (to hint the user that it will probably go well if he just tries again).

        This would also be valid for NodeExistsException (would happen first time when the ZNODE is not yet created and two clients attempt to create it).

        >>Closing ZkStateReader in CollectionsHandler, ZkCLI and ZkCLITest

        I also thought about it. But it seems like when ZkStateReader does not own the ZK client (e.g. in CollectionsHandler), it shouldn't close it? Any thoughts?

        Show
        Hrishikesh Gadre added a comment - >>Adding "Try again" to error-msg from ZkCLI if KeeperException.BadVersionException (to hint the user that it will probably go well if he just tries again). This would also be valid for NodeExistsException (would happen first time when the ZNODE is not yet created and two clients attempt to create it). >>Closing ZkStateReader in CollectionsHandler, ZkCLI and ZkCLITest I also thought about it. But it seems like when ZkStateReader does not own the ZK client (e.g. in CollectionsHandler), it shouldn't close it? Any thoughts?
        Hide
        Per Steffensen added a comment -

        This would also be valid for NodeExistsException

        Hmmm, believe you are right

        I also thought about it. But it seems like when ZkStateReader does not own the ZK client (e.g. in CollectionsHandler), it shouldn't close it? Any thoughts?

        ZkStateReader does not own the ZK client if you use the ZkStateReader(SolrZkClient zkClient) constructor - and we do in all the mentioned caces. And it does not close the ZK client on close, when it does not own it. Hence in the mentioned cases calling close really does not do much, except setting closed=true. I just thought it was the right thing to do - call close when you create and finished using a closable component.

        Show
        Per Steffensen added a comment - This would also be valid for NodeExistsException Hmmm, believe you are right I also thought about it. But it seems like when ZkStateReader does not own the ZK client (e.g. in CollectionsHandler), it shouldn't close it? Any thoughts? ZkStateReader does not own the ZK client if you use the ZkStateReader(SolrZkClient zkClient) constructor - and we do in all the mentioned caces. And it does not close the ZK client on close, when it does not own it. Hence in the mentioned cases calling close really does not do much, except setting closed=true. I just thought it was the right thing to do - call close when you create and finished using a closable component.
        Hide
        Per Steffensen added a comment -

        Added "Try again" in error-msg, also on NodeExistsException

        Show
        Per Steffensen added a comment - Added "Try again" in error-msg, also on NodeExistsException
        Hide
        ASF subversion and git services added a comment -

        Commit 1675331 from Noble Paul in branch 'dev/trunk'
        [ https://svn.apache.org/r1675331 ]

        SOLR-7176: zkcli script can perfrom the CLUSTERPROP command without a running Solr cluster

        Show
        ASF subversion and git services added a comment - Commit 1675331 from Noble Paul in branch 'dev/trunk' [ https://svn.apache.org/r1675331 ] SOLR-7176 : zkcli script can perfrom the CLUSTERPROP command without a running Solr cluster
        Hide
        ASF subversion and git services added a comment -

        Commit 1675335 from Noble Paul in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1675335 ]

        SOLR-7176: zkcli script can perfrom the CLUSTERPROP command without a running Solr cluster

        Show
        ASF subversion and git services added a comment - Commit 1675335 from Noble Paul in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1675335 ] SOLR-7176 : zkcli script can perfrom the CLUSTERPROP command without a running Solr cluster
        Hide
        Anshum Gupta added a comment -

        Bulk close for 5.2.0.

        Show
        Anshum Gupta added a comment - Bulk close for 5.2.0.

          People

          • Assignee:
            Noble Paul
            Reporter:
            Yonik Seeley
          • Votes:
            0 Vote for this issue
            Watchers:
            12 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development