Details

      Description

      If the installation tarball URL is a file:// URL, then Whirr could upload it to a blobstore and automatically download it onto the instances before running the install script. This change would make it very easy to test development versions of Hadoop or other services.
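The flow described above starts with a scheme check on the configured tarball URL. A minimal sketch of that decision, using only the standard library (class and path names here are illustrative, not from the Whirr patch):

```java
import java.net.URI;
import java.nio.file.Paths;

// Sketch of the decision described above: a file:// tarball URL is a
// candidate for upload to a blobstore; any other scheme is used as-is.
public class TarballUrlCheck {

    /** Returns true if the tarball must first be uploaded to a blobstore. */
    static boolean needsUpload(String tarballUrl) {
        return "file".equals(URI.create(tarballUrl).getScheme());
    }

    /** Derives a blob name from the local path (just the file name). */
    static String blobName(String tarballUrl) {
        return Paths.get(URI.create(tarballUrl).getPath()).getFileName().toString();
    }

    public static void main(String[] args) {
        String local = "file:///tmp/hadoop-dev.tar.gz";   // hypothetical local build
        String remote = "http://archive.apache.org/dist/hadoop/core/x.tar.gz";
        System.out.println(needsUpload(local));   // true -> upload, then rewrite URL
        System.out.println(needsUpload(remote));  // false -> use as-is
        System.out.println(blobName(local));      // hadoop-dev.tar.gz
    }
}
```

After the upload, the service's tarball URL would be rewritten to point at the blobstore copy before the install script runs.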

      1. BlobCopyOperation.java
        3 kB
        Tom White
      2. WHIRR-220.patch
        22 kB
        Andrei Savu
      3. WHIRR-220.patch
        3 kB
        Andrei Savu
      4. WHIRR-220.patch
        87 kB
        Andrei Savu
      5. WHIRR-220.patch
        67 kB
        Andrei Savu
      6. WHIRR-220.patch
        33 kB
        Andrei Savu
      7. WHIRR-220.patch
        28 kB
        Andrei Savu
      8. WHIRR-220.patch
        14 kB
        Andrei Savu
      9. WHIRR-220.patch
        12 kB
        Andrei Savu
      10. WHIRR-220.patch
        15 kB
        Andrei Savu
      11. WHIRR-220.patch
        10 kB
        Andrei Savu
      12. WHIRR-220-for-trunk.patch
        32 kB
        Andrei Savu
      13. WHIRR-220-trunk.patch
        51 kB
        Andrei Savu

          Activity

          Andrei Savu added a comment -

          I've just committed this. I have also removed all star and unused imports. Thanks guys for all the help!

          Tom White added a comment -

          +1

          There are a few stray star imports that should be removed before committing.

          Andrei Savu added a comment -

          Updated patch for all services and tested it on AWS. I've also updated the code to use the new FirewallManager. I believe it's ready.

          Adrian Cole added a comment -

          +1 thanks for clearing this up!

          Tom White added a comment -

          +1 This looks good to me.

          Andrei Savu added a comment -

          I've created a minimal patch with a new approach. Let me know what you think (only works for zookeeper: whirr.zookeeper.tarball.url).

          I will fix the remaining issues and resubmit the patch based on your feedback (I don't want to go through the process of re-implementing this feature again).

          Andrei Savu added a comment -

          Thanks Tom for reviewing. I will rebuild the patch by splitting the work into multiple JIRAs so that it's easier to review and commit (e.g. refactoring work, removing deprecated code, allowing users to specify the tarball URL for ZooKeeper, etc.).

          Tom White added a comment -

          Thanks for the new patch Andrei. I'm not sure about adding the HttpRequest to the Configuration object. I think we should avoid adding complex objects to configuration (even though it's only used internally), since configurations should be easily serialized. How about having the cluster action handlers upload the tarball, and add the retrieve statement to the statement list? The blobstore container name can be added to the configuration (as a string), and deleted on teardown.

          A few other comments:

          • TemporaryBlobStore#uploadLocalFiles will upload anything whose value begins with "file://". We should really only upload known artifacts, governed by key. This would not be a problem with the approach described above.
          • Unless there's a strong reason to do it here, it might be simpler to move InstanceTemplate to a top-level class in a separate JIRA.
          • Add the new blobstore files in ClusterSpec to its equals, hashCode and toString methods.
          Andrei Savu added a comment -

          Updated patch to use signed requests to download uploaded files. Unfortunately the HBase integration tests are failing and I don't understand why (yet).

          It would be great if you could review the patch and give feedback on code structure.

          Andrei Savu added a comment -

          Now I understand the motivation. I will update the patch tomorrow to use signed URLs.

          Adrian Cole added a comment - - edited

          guessing urls is a losing battle, right? in order to make something public in rackspace you have to use their cdn system, which makes oddly named urls.

          The biggest problems with public urls are:
          1. you incur a billing event in rackspace by using the CDN even w/in their cloud.
          2. not all public clouds support public urls
          3. this probably won't work at all in private cloud
          4. accidentally leaving your files open to the public will run up your bill

          This isn't to say that request signing is w/o warts, just that it might be better to attack the private blob problem, either via request signing, or installing a local file manager.

          Andrei Savu added a comment -

          Tom, I've fixed all previous issues and I've done some refactoring for the ClusterSpec class.

          I'm still making the uploaded assets public. Do we need to fix this now (it's almost impossible to guess the container url)?

          Tom White added a comment -

          > Yes. I've updated the patch for WHIRR-222. +1 for committing that patch first.

          Done.

          > What if we keep the current behaviour as default and make the new configuration parameters optional?

          That would be OK, but long term it would be better to have jclouds handle the mapping.

          > I was going to move the code to a new class. Any ideas for a good name?

          BlobStoreManager? It can be package private since it's only used by Service.

          > Let's not worry about this now. We could handle this while solving the cluster state persistence problem.

          I agree.

          Andrei Savu added a comment -

          > [...] would it be easier to commit WHIRR-222 then do this one?

          Yes. I've updated the patch for WHIRR-222. +1 for committing that patch first.

          > It would be better to add a blobstore provider property, rather than inferring it from the provider.

          What if we keep the current behaviour as default and make the new configuration parameters optional?

          > ClusterSpec should really be little more than a configuration holder, with little to no logic.

          I agree. I was going to move the code to a new class. Any ideas for a good name?

          > Should uploaded files be removed at cluster destroy time?

          Let's not worry about this now. We could handle this while solving the cluster state persistence problem.

          > You can copy the remote resource to the local filesystem on each node using a SaveHttpResponseTo statement.

          Sounds like a good approach.

          Tom White added a comment -

          Overall, this is looking good. A few comments:

          • Does this subsume WHIRR-222? If so, this patch could incorporate that code - e.g. not checking for a null tarball, since whirr.<service>.tarball.url is always set. Or would it be easier to commit WHIRR-222 then do this one?
          • It would be better to add a blobstore provider property, rather than inferring it from the provider. One may want to run on one provider and use a different provider's blobstore. It should be possible to specify different credentials for compute and blobstore.
          • ClusterSpec should really be little more than a configuration holder, with little to no logic. The logic for uploading the files to the blobstore is complex enough to warrant its own class (and unit tests).
          • Should uploaded files be removed at cluster destroy time? This might be a better approach for expanding a cluster, since the tarball wouldn't need uploading again. However, there's a question of how to persist the tarball URL, so perhaps we shouldn't worry about this yet.

          > it's not trivial to push custom headers to bash install scripts

          You can copy the remote resource to the local filesystem on each node using a SaveHttpResponseTo statement. Then set the URL to a file:/// URL pointing to the local file. The advantage to doing this would be that you can use a signed request, which is portable across providers.

          I've attached a class that I knocked up to do this which might be useful. (Note that SaveHttpResponseTo is in jclouds beta 9, so a copy is not needed.)
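The URL rewrite Tom describes — save the blob to a known path on the node, then point the service at that copy — reduces to converting a node-local path into a file:/// URL. A standard-library sketch (the destination path is hypothetical):

```java
import java.nio.file.Paths;

// After a signed request has saved the tarball to a known path on the
// node (e.g. via SaveHttpResponseTo), the service's tarball URL can be
// rewritten to point at that local copy.
public class LocalUrlRewrite {

    static String toLocalUrl(String nodeLocalPath) {
        return Paths.get(nodeLocalPath).toUri().toString();
    }

    public static void main(String[] args) {
        // e.g. the destination the retrieve statement wrote to
        System.out.println(toLocalUrl("/tmp/whirr/hadoop.tar.gz"));
        // -> file:///tmp/whirr/hadoop.tar.gz
    }
}
```

Because the signed download happens before the install script runs, the script itself never needs provider-specific headers — which is the portability advantage mentioned above.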

          Andrei Savu added a comment -

          In this patch I've added an integration test and checked that it works on cloudservers.

          From a functional point of view the patch is almost complete and it only needs cleanup.

          Adrian, when are you guys going to release jclouds 1.0-beta-10?

          Adrian Cole added a comment -

          fyi public blobstore functionality is now in jclouds, headed for beta-10 http://code.google.com/p/jclouds/issues/detail?id=80

          Andrei Savu added a comment -

          Updated patch:

          • fixed some bugs: elasticsearch directory change in the configure function, tarball URL override for HBase
          • whirr.cassandra.version.major is now optional with 0.7 as a default value
          • all services accept an option called whirr.[service].tarball.url
          • upgraded zookeeper to 3.3.3

          I've been able to start a HBase cluster using only local tarballs.

          Remaining issues:

          • integration tests
          • test on cloudservers
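As a concrete illustration of the `whirr.[service].tarball.url` option described above, a recipe fragment might look like this (property values are illustrative, not from the patch):

```properties
# whirr recipe fragment -- paths and hosts are illustrative
# A file:// URL triggers the blobstore upload; an http:// URL is used as-is.
whirr.zookeeper.tarball.url=file:///tmp/zookeeper-dev.tar.gz
whirr.hbase.tarball.url=http://example.com/hbase-dev.tar.gz
```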
          Andrei Savu added a comment -

          Updated patch. Tested on AWS for cassandra and elasticsearch. Remaining issues:

          • it should be possible to set the tarball url for hadoop and zookeeper
          • add integration tests
          • test on cloudservers

          Adrian, I'm still setting the ACLs on uploaded files to public-read because it's not trivial to push custom headers to bash install scripts.

          Andrei Savu added a comment -

          Working patch (only tested on AWS). I have been able to start an elasticsearch instance using a local archive.

          Adrian Cole added a comment -

          signing often modifies both the headers and the request line. Have a look at a couple features in jclouds that take this into consideration:
          ComputeServiceUtils.execHttpResponse(HttpRequest request)
          Statement extractTargzIntoDirectory(HttpRequest targz, String directory)
          Statement extractZipIntoDirectory(HttpRequest zip, String directory)

          Andrei Savu added a comment -

          A signed URL should be good enough. I have tried to use BlobSigning like this:

          HttpRequest request = context.getSigner().signGetBlob(bucketName, forUpload.getName());
          LOG.info("Uploaded to {}", request.getEndpoint().toString());
          

          I'm getting an unsigned URL. Am I missing something? I will investigate more today.
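For context on what `signGetBlob` should produce: under S3's legacy query-string (signature v2) scheme, a pre-signed GET URL is just the object URL plus an expiry and an HMAC-SHA1 over a canonical string. A standard-library sketch of that shape — credentials and expiry below are fake, and in real code the jclouds signer does this for you:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Rough sketch of a pre-signed S3 GET URL under the legacy query-string
// (signature v2) scheme -- the kind of URL a blob signer returns.
public class SignedUrlSketch {

    static String sign(String accessKey, String secretKey,
                       String bucket, String key, long expiresEpochSecs) throws Exception {
        // Canonical string: verb, (empty) Content-MD5 and Content-Type,
        // expiry, and the canonicalized resource.
        String stringToSign = "GET\n\n\n" + expiresEpochSecs + "\n/" + bucket + "/" + key;
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
        String sig = Base64.getEncoder()
                .encodeToString(mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8)));
        return "https://" + bucket + ".s3.amazonaws.com/" + key
                + "?AWSAccessKeyId=" + accessKey
                + "&Expires=" + expiresEpochSecs
                + "&Signature=" + URLEncoder.encode(sig, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // Fake credentials and a fixed expiry, for illustration only.
        System.out.println(sign("AKFAKE", "secret", "whirr-tmp", "hadoop.tar.gz", 1300000000L));
    }
}
```

An endpoint that comes back without the `Signature` and `Expires` query parameters is effectively an unsigned URL, which matches the symptom described above.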

          Adrian Cole added a comment -

          Just curious... Is there a requirement to make blobs public, if the contents are removed after setup? Maybe blob signing would work just as well. BlobSigning is already supported in jclouds and can be used with curl, wget, etc.

          Andrei Savu added a comment -

          In this patch I've added code (provider specific) that will set public-read ACLs on the newly created blobs. Adrian, I'm planning to remove this code once jclouds supports ACLs. I hope to make this patch work tomorrow and fix the test suite.

          Some of the remaining issues:

          • generate pseudo-random bucket name for temporary files
          • remove uploaded files after bootstrap
          • set blob store default zone (if possible)
          • all services should accept whirr.[service].tarball.url as a parameter
          Andrei Savu added a comment -

          That would be great! I will take a look at the jclouds integration tests for blob stores.

          Adrian Cole added a comment -

          @andrei I can help work with you on http://code.google.com/p/jclouds/issues/detail?id=80 which is what you want wrt public acls. That's going to be better tested in the long run. wrt bucket name generation, we have this in our test classes. generally, you create some fake name with maybe the current user as the name plus some random thing, or consistent naming. This code is in jclouds BaseBlobStoreIntegrationTest

          Andrei Savu added a comment -

          First try (not yet working). Adrian, I've got a few questions:

          • does jclouds provide a way of generating buckets with random names or do I need to do this in my code?
          • where can I find some code samples showing how to make blobs public (for reading) on AWS and Rackspace?

          I'm open to any suggestions.

          Tom White added a comment -

          In standalone mode (WHIRR-239) it should do nothing, since the tarball is available on the local filesystem.


            People

             • Assignee:
               Andrei Savu
             • Reporter:
               Tom White