Whirr
  1. Whirr
  2. WHIRR-325

Reduce cloud provider-specific code in scripts

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels:
      None

      Description

      Many of the bash functions have a cloud provider option (-c) that they use to determine how to find the local public/private IP address, or the local data directory (/mnt vs /data, for example).

      We should pass what is needed (e.g. IP addresses, directories), or at least put the code in a common function.

      1. WHIRR-325.patch
        66 kB
        Andrei Savu
      2. WHIRR-325.patch
        65 kB
        Andrei Savu
      3. WHIRR-325.patch
        21 kB
        Tom White
      4. WHIRR-325.patch
        20 kB
        Tom White
      5. WHIRR-325.patch
        13 kB
        Tom White
      6. WHIRR-325-rewrite-run-script.patch
        5 kB
        Andrei Savu

        Issue Links

          Activity

          Gavin made changes -
          Link This issue depends upon WHIRR-354 [ WHIRR-354 ]
          Gavin made changes -
          Link This issue depends on WHIRR-354 [ WHIRR-354 ]
          Gavin made changes -
          Link This issue depends upon WHIRR-327 [ WHIRR-327 ]
          Gavin made changes -
          Link This issue depends on WHIRR-327 [ WHIRR-327 ]
          Hide
          Tom White added a comment -

          Thanks for everyone's help on getting this in.

          Show
          Tom White added a comment - Thanks for everyone's help on getting this in.
          Andrei Savu made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 0.7.0 [ 12317571 ]
          Resolution Fixed [ 1 ]
          Hide
          Andrei Savu added a comment -

          Committed. Thanks Tom! Thanks David for reviewing.

          Show
          Andrei Savu added a comment - Committed. Thanks Tom! Thanks David for reviewing.
          Hide
          Andrei Savu added a comment -

          I'm going to commit this and we can fix any remaining issues in new JIRAs.

          Show
          Andrei Savu added a comment - I'm going to commit this and we can fix any remaining issues in new JIRAs.
          Hide
          David Alves added a comment -

          +1, the patch applies cleanly and my own tests are still passing.

          Show
          David Alves added a comment - +1, the patch applies cleanly and my own tests are still passing.
          Andrei Savu made changes -
          Attachment WHIRR-325.patch [ 12498598 ]
          Hide
          Andrei Savu added a comment -

          Updated patch for current trunk and added parallel configure script execution on all machines. I have executed all integration tests on cloudservers-us and hbase on ec2.

          @Adrian How is ComputeService.runScriptOnNode exposing ssh authentication issues? Is it a good idea to retry execution on IllegalStateException?

          Show
          Andrei Savu added a comment - Updated patch for current trunk and added parallel configure script execution on all machines. I have executed all integration tests on cloudservers-us and hbase on ec2. @Adrian How is ComputeService.runScriptOnNode exposing ssh authentication issues? Is it a good idea to retry execution on IllegalStateException?
          Hide
          Andrei Savu added a comment -

          I have discussed with Adrian about a fix in jclouds. I'm preparing a pull request today so maybe we can use that as we upgrade.

          Show
          Andrei Savu added a comment - I have discussed with Adrian about a fix in jclouds. I'm preparing a pull request today so maybe we can use that as we upgrade.
          Hide
          Tom White added a comment -

          +1 looks good.

          I agree with Paul that sleep times should be tunable in general, but in this case it's a local process (i.e. not calling out to other instances) and it only serves to suppress an error message as far as I can tell. So it's worth adding until it's fixed in jclouds, since the message is misleading for users, but I'm not sure if it needs to be tunable in this case.

          Show
          Tom White added a comment - +1 looks good. I agree with Paul that sleep times should be tunable in general, but in this case it's a local process (i.e. not calling out to other instances) and it only serves to suppress an error message as far as I can tell. So it's worth adding until it's fixed in jclouds, since the message is misleading for users, but I'm not sure if it needs to be tunable in this case.
          Andrei Savu made changes -
          Attachment WHIRR-325.patch [ 12498089 ]
          Hide
          Andrei Savu added a comment -

          In this patch I have updated all service to use the env variables. All integration tests pass on rackspace.

          Next: test on aws, look into reliability issues as Paul suggested (thanks!).

          Feedback is highly appreciated.

          Show
          Andrei Savu added a comment - In this patch I have updated all service to use the env variables. All integration tests pass on rackspace. Next: test on aws, look into reliability issues as Paul suggested (thanks!). Feedback is highly appreciated.
          Hide
          Paul Baclace added a comment -

          Any time sleep is used for consistency the amount of time to pre-wait should be a tunable parameter because it is a heuristic that depends on system load. For instance, on EC2, different regions at different times might require different values to have 99% probability of being ready.

          Moreover, if no retries are implemented yet for a particular operation, then it is even more important to be tunable.

          The tunable property should be suggested in the error msg that is seen when the operation fails.

          Show
          Paul Baclace added a comment - Any time sleep is used for consistency the amount of time to pre-wait should be a tunable parameter because it is a heuristic that depends on system load. For instance, on EC2, different regions at different times might require different values to have 99% probability of being ready. Moreover, if no retries are implemented yet for a particular operation, then it is even more important to be tunable. The tunable property should be suggested in the error msg that is seen when the operation fails.
          Hide
          Andrei Savu added a comment -

          I've decided that I'm going to add a 'sleep 4' statement as a workaround. Unfortunately the zookeeper integration tests are failing for me on ec2 with this change. Will try again tomorrow morning.

          Show
          Andrei Savu added a comment - I've decided that I'm going to add a 'sleep 4' statement as a workaround. Unfortunately the zookeeper integration tests are failing for me on ec2 with this change. Will try again tomorrow morning.
          Hide
          Andrei Savu added a comment -

          The problem occurred with scripts with a very short execution time.

          Do you think we should open an issue in jclouds for this? Adrian? I will try to add a test in jclouds to replicate the failure.

          Show
          Andrei Savu added a comment - The problem occurred with scripts with a very short execution time. Do you think we should open an issue in jclouds for this? Adrian? I will try to add a test in jclouds to replicate the failure.
          Hide
          Karel Vervaeke added a comment -

          I have seen the 'did not start' error before while working on run-script for BYON.
          The problem occurred with scripts with a very short execution time. Worked around it by adding 'sleep 2' to those scripts, or setting wrapInInitScript=false.

          Show
          Karel Vervaeke added a comment - I have seen the 'did not start' error before while working on run-script for BYON. The problem occurred with scripts with a very short execution time. Worked around it by adding 'sleep 2' to those scripts, or setting wrapInInitScript=false.
          Andrei Savu made changes -
          Assignee Andrei Savu [ savu.andrei ]
          Tom White made changes -
          Attachment WHIRR-325.patch [ 12497670 ]
          Hide
          Tom White added a comment -

          I updated the patch to the latest trunk. The ZK integration test passes on EC2 and Rackspace. However, the following message gets printed to the console, even though there is no problem starting the service:

          Error running script:
          aborting: jclouds-script-1317748237929 did not start
          

          I'm not sure why this is happening - it seems to be in the jclouds "forget" bash function.

          Andrei - it would be great if you could take a look/takeover the patch. Thanks!

          Show
          Tom White added a comment - I updated the patch to the latest trunk. The ZK integration test passes on EC2 and Rackspace. However, the following message gets printed to the console, even though there is no problem starting the service: Error running script: aborting: jclouds-script-1317748237929 did not start I'm not sure why this is happening - it seems to be in the jclouds "forget" bash function. Andrei - it would be great if you could take a look/takeover the patch. Thanks!
          Hide
          Andrei Savu added a comment -

          Tom are you still working on this? I can takeover if you want. It would be great if we could ship this in 0.7.0.

          Show
          Andrei Savu added a comment - Tom are you still working on this? I can takeover if you want. It would be great if we could ship this in 0.7.0.
          Tom White made changes -
          Attachment WHIRR-325.patch [ 12492858 ]
          Hide
          Tom White added a comment -

          Updated version of the patch with an update to ZooKeeper, which removes some code. The patch is not ready for commit since it's not working reliably for me yet.

          Show
          Tom White added a comment - Updated version of the patch with an update to ZooKeeper, which removes some code. The patch is not ready for commit since it's not working reliably for me yet.
          Hide
          Andrei Savu added a comment -

          Looks good to me.

          Show
          Andrei Savu added a comment - Looks good to me.
          Tom White made changes -
          Attachment WHIRR-325.patch [ 12490468 ]
          Hide
          Tom White added a comment -

          This patch show the changes needed to StatementBuilder to set environment variables with the instance data (such as IP address). It still needs work to update the service scripts to use the variables.

          Show
          Tom White added a comment - This patch show the changes needed to StatementBuilder to set environment variables with the instance data (such as IP address). It still needs work to update the service scripts to use the variables.
          Tom White made changes -
          Link This issue depends on WHIRR-354 [ WHIRR-354 ]
          Hide
          Andrei Savu added a comment -

          We can resume work on this one as soon as we upgrade to jclouds 1.1.0.

          Show
          Andrei Savu added a comment - We can resume work on this one as soon as we upgrade to jclouds 1.1.0.
          Hide
          Tom White added a comment -

          > We could rewrite "whirr run-script" in terms of runScriptOnNode

          Yes, this would work.

          Rather than adding methods to ClusterController, I think we need a RunScriptClusterAction which encapsulates this logic. This would also address WHIRR-324.

          Show
          Tom White added a comment - > We could rewrite "whirr run-script" in terms of runScriptOnNode Yes, this would work. Rather than adding methods to ClusterController, I think we need a RunScriptClusterAction which encapsulates this logic. This would also address WHIRR-324 .
          Hide
          Andrei Savu added a comment -

          I have create an issue for solving this: http://code.google.com/p/jclouds/issues/detail?id=597

          Show
          Andrei Savu added a comment - I have create an issue for solving this: http://code.google.com/p/jclouds/issues/detail?id=597
          Andrei Savu made changes -
          Attachment WHIRR-325-rewrite-run-script.patch [ 12481703 ]
          Hide
          Andrei Savu added a comment -

          I've done some work on this one (trying to rewrite "whirr run-script") and I believe I've discovered a bug in jclouds.

          It seems like runScriptOnNode ignores the new credentials provided as RunScriptOptions.

          I believe the error should be fixed in BaseComputeService.runScriptOnNode and the code should be similar to BaseComputeService.TransformNodesIntoInitializedScriptRunners.apply

          Show
          Andrei Savu added a comment - I've done some work on this one (trying to rewrite "whirr run-script") and I believe I've discovered a bug in jclouds. It seems like runScriptOnNode ignores the new credentials provided as RunScriptOptions. I believe the error should be fixed in BaseComputeService.runScriptOnNode and the code should be similar to BaseComputeService.TransformNodesIntoInitializedScriptRunners.apply
          Hide
          Andrei Savu added a comment -

          > We could use this along with runScriptOnNode() to pass IP addresses, etc. Is this what you had in mind?

          Exactly.

          > One possible downside to this approach is that using "whirr run-script" wouldn't be able to set these environment variables

          We could rewrite "whirr run-script" in terms of runScriptOnNode.

          Actually I think it would be useful to have a runScriptOnNode method on the ClusterController that would set all the required environment variables and we could call that by using an executor service.

          Show
          Andrei Savu added a comment - > We could use this along with runScriptOnNode() to pass IP addresses, etc. Is this what you had in mind? Exactly. > One possible downside to this approach is that using "whirr run-script" wouldn't be able to set these environment variables We could rewrite "whirr run-script" in terms of runScriptOnNode. Actually I think it would be useful to have a runScriptOnNode method on the ClusterController that would set all the required environment variables and we could call that by using an executor service.
          Hide
          Tom White added a comment -

          > I would use ScriptBuilder.addEnvironmentVariableScope for this.

          I hadn't seen this before. We could use this along with runScriptOnNode() to pass IP addresses, etc. Is this what you had in mind?

          One possible downside to this approach is that using "whirr run-script" wouldn't be able to set these environment variables, so any scripts that take advantage of them would could not be run like this.

          Show
          Tom White added a comment - > I would use ScriptBuilder.addEnvironmentVariableScope for this. I hadn't seen this before. We could use this along with runScriptOnNode() to pass IP addresses, etc. Is this what you had in mind? One possible downside to this approach is that using "whirr run-script" wouldn't be able to set these environment variables, so any scripts that take advantage of them would could not be run like this.
          Hide
          Andrei Savu added a comment -

          I would use ScriptBuilder.addEnvironmentVariableScope for this.

          Show
          Andrei Savu added a comment - I would use ScriptBuilder.addEnvironmentVariableScope for this.
          Tom White made changes -
          Field Original Value New Value
          Link This issue depends on WHIRR-327 [ WHIRR-327 ]
          Hide
          Tom White added a comment -

          Perfect. I think this is the way to go - using an executor service to run things concurrently.

          Show
          Tom White added a comment - Perfect. I think this is the way to go - using an executor service to run things concurrently.
          Hide
          Adrian Cole added a comment -

          when we move to jclouds 1.0.0, we can use the single-node feature avoiding the overhead you mention.

          ExecResponse runScriptOnNode(String id, String runScript);

          Cheers!

          Show
          Adrian Cole added a comment - when we move to jclouds 1.0.0, we can use the single-node feature avoiding the overhead you mention. ExecResponse runScriptOnNode(String id, String runScript); Cheers!
          Hide
          Tom White added a comment -

          Passing node-specific scripts (which is what is needed to pass each node its IP addresses) would need a modification to jclouds, since ComputeService#runScriptOnNodesMatching runs the same script on all nodes. Calling this API for each node in turn with a different script is inefficient since jclouds queries for all the running nodes and filters each time.

          Show
          Tom White added a comment - Passing node-specific scripts (which is what is needed to pass each node its IP addresses) would need a modification to jclouds, since ComputeService#runScriptOnNodesMatching runs the same script on all nodes. Calling this API for each node in turn with a different script is inefficient since jclouds queries for all the running nodes and filters each time.
          Tom White created issue -

            People

            • Assignee:
              Andrei Savu
              Reporter:
              Tom White
            • Votes:
              2 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development