Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.6.0
    • Component/s: core
    • Labels:
      None

      Description

      This patch improves the automatic OS image selection. The idea is to develop against stable, vanilla OS installs. Now it should automatically select Ubuntu 10.04 LTS packaged by Canonical on Amazon EC2.

      1. WHIRR-341.patch
        5 kB
        Andrei Savu
      2. WHIRR-341.patch
        0.7 kB
        Adrian Cole

          Activity

          Andrei Savu added a comment -

          Hard coded image IDs as discussed and documented this change. Only tested for ZooKeeper with both OS families on EC2.

          Adrian Cole added a comment -

          I don't agree with this change, but I do think we should document the template parameters that passed tests (and led to a release of whirr).

          We should be optimizing for predictability across images, not honing for only a single version with a set of patches frozen in time. Users will certainly want to use whirr on more platforms as time progresses. I agree that locking in image ids will make things easy for us in the short term, but not the long term as users will encounter problems we didn't catch. Besides, devs can already set whirr test properties, so if one of us wants to only use a favorite image, we can already do that.

          Whirr needs to work as new images are released. Amazon, Canonical, etc. regularly release new versions of AMIs for a specific os/version id. There are good reasons for them to update these, for example security patches. That said, sometimes image producers change an image without an id update (bad practice imho), so imageId isn't always going to get the stability you desire. Finally, the maintenance of image id per provider/region/os mix is a pretty big job and requires constant attention. This isn't a legacy I'd recommend us entering.

          If image updates become troublesome, I'd instead recommend fortification. For example, automated forensics gathering, or hardening configuration scripts to reveal dependencies needed or incompatible with a specific service role.

          Regardless of what we do to be more bullet-proof wrt image updates, I do believe we should document the configuration tested during a release. This should include all facets of the configuration including the hardware and location id; basically the ids in the template, not just image ids.

          Andrei Savu added a comment -

          > we should document the template parameters that passed tests (and led to a release of whirr)

          +1

          > We should be optimizing for predictability across images, not honing for only a single version with a set of patches frozen in time.

          I agree but I'm not sure this task is doable without slowing down the development even more. This is why I am proposing this change.

          > Besides, devs can already set whirr test properties, so if one of us wants to only use a favorite image, we can already do that.

          I don't understand how this simplifies things if we still want to support so many random AMIs?

          > Finally, the maintenance of image id per provider/region/os mix is a pretty big job and requires constant attention. This isn't a legacy I'd recommend us entering.

          IMO this is better than having tests that randomly fail as new images are added or updated.

          > If image updates become troublesome, I'd instead recommend fortification. For example, automated forensics gathering, or hardening configuration scripts to reveal dependencies needed or incompatible with a specific service role.

          +1 is anyone available to work on this?

          Adrian Cole added a comment -

          I think we should use this issue to document the "random changing id" problem before establishing a new legacy of maintenance. If the problem of a changed image breaking everything occurs as often, randomly, and severely as you suggest, we'll get evidence to support this move quickly.

          When that happens, I'd suggest recording the image ids that are attached in the patch. We can use these to compare against the random id that comes back and breaks the test.

          P.s. note that very few clouds outside Amazon have this problem as most base images are cloud provider maintained.

          Adrian Cole added a comment -

          Here's a thought. A summary of my objections:
          1. introduces maintenance and manual discovery of updated image ids
          2. not a well documented problem (ex. did this happen during a jclouds upgrade or really randomly)
          3. doesn't cover the inputs that can produce problems (ex. location+image+hardware+options such as spot pricing are better)

          I think we can get more data about change when it happens, and also have an additional benefit of documentation of what was last tested. I think we can accomplish the goal of understanding change without preventing it or making change very manual.

          Apologies for not offering a solution before; that was lazy of me. How's this?

          We can create a code helper to check template values against last tested file before running a test. When the values change, warn and overwrite the file. On some flag, forcibly use old values.

          In this case, we should get our documentation automatically, only implying a check-in. We also don't need to discover new ids as they will come in automatically. Finally, on error, testing is easy as you just run with the flag that uses last tested.

          (note I don't care json vs yaml)

          use basedir to establish lasttested directory (ex. services/cassandra)

          serialize inputs to templateBuilder to a string and lookup its corresponding json file. (ex. gogrid/default.json or aws-ec2/ubuntu-10.04.json)

          build the template and make a map of ids, check this vs what's in that file

          ex.
          map.put("locationId", currentTemplate.getLocation().getId());
          map.put("hardwareId", currentTemplate.getHardware().getId());
          map.put("imageId", currentTemplate.getImage().getId());

          if (!toJson(map).equals(lastTested.toString()))
            // warn me that we last tested something different

          if (useOnlyLastTestedFlag) {
            map = fromJson(lastTested);
            template = templateBuilder.imageId(map.get("imageId")...
          }

          if (!toJson(map).equals(lastTested.toString()))
            // serialize to disk

          Note that in clouds such as vCloud, or any private cloud, image and location ids are different per-user, so we'll probably have to think about this more. However, this should just "work" with gogrid, rackspace, aws-ec2, elastichosts or any other cloud with public scoped image ids. If we find a "thrashing id" problem on public clouds, it is a sign we should revise our templateBuilder expression. FWIW: I'm happy to also implement this on the jclouds side so that when whirr goes to the next version, you can inspect the last ids jclouds tested against.
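
For illustration only, the "last tested" check proposed above could be sketched roughly as follows. This is not code from Whirr or jclouds: the class name `LastTestedCheck`, its methods, and the choice of a `.properties` file instead of json/yaml (so that only the JDK is needed) are all assumptions, and the ids in `main` are placeholders rather than real cloud ids.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical helper; not part of Whirr or jclouds. */
final class LastTestedCheck {

  /** Reads the recorded ids, or returns an empty map if nothing was recorded yet. */
  static Map<String, String> load(Path file) throws IOException {
    Map<String, String> ids = new TreeMap<>();
    if (Files.exists(file)) {
      Properties props = new Properties();
      try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
        props.load(in);
      }
      for (String key : props.stringPropertyNames()) {
        ids.put(key, props.getProperty(key));
      }
    }
    return ids;
  }

  /** Overwrites the last-tested file with the given ids. */
  static void save(Path file, Map<String, String> ids) throws IOException {
    Properties props = new Properties();
    props.putAll(ids);
    if (file.getParent() != null) {
      Files.createDirectories(file.getParent());
    }
    try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
      props.store(out, "ids used by the last test run");
    }
  }

  /**
   * Returns the ids a test should use. Warns when the current ids differ from
   * the recorded ones; with the flag set, the recorded ids win and the file is
   * left alone, otherwise the file is overwritten with the current ids.
   */
  static Map<String, String> resolve(Path file, Map<String, String> current,
      boolean useOnlyLastTested) throws IOException {
    Map<String, String> lastTested = load(file);
    boolean changed = !lastTested.equals(current);
    if (changed && !lastTested.isEmpty()) {
      System.err.println("WARN: last tested " + lastTested + ", now resolved " + current);
    }
    if (useOnlyLastTested && !lastTested.isEmpty()) {
      return lastTested;
    }
    if (changed) {
      save(file, current);
    }
    return new TreeMap<>(current);
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempDirectory("lasttested")
        .resolve("aws-ec2/ubuntu-10.04.properties");
    Map<String, String> ids = new TreeMap<>();
    ids.put("locationId", "location-A"); // placeholder values, not real ids
    ids.put("hardwareId", "hardware-A");
    ids.put("imageId", "image-A");
    System.out.println(resolve(file, ids, false)); // first run records the ids
    ids.put("imageId", "image-B");                 // simulate a changed image
    System.out.println(resolve(file, ids, true));  // flag set: recorded ids are reused
  }
}
```

In a real test the `current` map would be filled from the built template (location, hardware and image ids), and the returned `imageId` would be fed back into the template builder when the flag is set.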

          Andrei Savu added a comment -

          Adrian, let me try to clarify this misunderstanding. The main reasons I'm trying to propose this change are:

          • we should develop against a vanilla OS install so that install / configure scripts do not depend on packages that are not available by default (the Rightscale EBS AMI selected by default is far from this). This should make the code portable enough to work across a wide range of arbitrary AMIs.
          • EBS images add extra costs (Amazon specific)
          • we should have a flag that we can use to change between testing against apt-get / yum based systems
          • we should be able to publish for each release the AMIs we used for testing

          As you can see we want almost the same things! I agree that probably my change is not the best way to do this and I'm open to alternatives.

          > Note that in clouds such as vCloud, or any private cloud, image and location ids are different per-user, so we'll probably have to think about this more

          If you look at the code you will see that if the cloud provider is unknown, it will use the jclouds template selection mechanism.

          Adrian Cole added a comment -

          hey andrei, I totally agree with you. the problem is that we cannot supply precise enough qualifications as-is. I really appreciate your elaborating on this, and can 100% agree with this plan to use image ids until we can qualify what we want. I'd suggest we note in comments or something the reasons why, such as you did in this comment. That way, it is more transparent what we are looking for.

          Can you do me a favor and add your commentary above to one or all of the threads below? Last time I tried to rally support for these more fine-grained qualifications, we didn't get quite enough buy-in to implement:

          http://groups.google.com/group/jclouds-dev/browse_thread/thread/71917fe9a015b3a8/b870c2a7ad870b3d?lnk=gst&q=ownerid#b870c2a7ad870b3d <- better selections
          http://groups.google.com/group/jclouds-dev/browse_thread/thread/b83becdd57ca1415 <- test against apt-get/yum based as core + include relevant criteria per-os
          http://code.google.com/p/jclouds/issues/detail?id=539 <- prefer s3 backed image

          Adrian Cole added a comment -

          also, not to make our lives difficult, but s3 backed image won't work on the t1.micro, so probably doesn't save us any money on services that will run on t1.micro. Also, the cluster compute images are tricky as you cannot use normal amis, and more troublesome still as cluster compute is only in 1 region. In all, we have some extremes where hardware profiles will suggest a different image, ex. t1.micro and cluster compute have more specific choices, whereas all the midrange sizes could safely use the same image.

          You might want to look at the enhanced image query syntax for aws-ec2 scheduled for jclouds 1.1.0
          http://code.google.com/p/jclouds/issues/detail?id=613
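
As a rough sketch of the hardware-dependent constraint described in this comment (not code from Whirr or jclouds), the choice of image kind could be keyed on the hardware id. The class `RootDeviceChoice`, the enum, and the `cc1.` prefix test for cluster compute sizes are assumptions made for illustration.

```java
/** Hypothetical helper; not part of Whirr or jclouds. */
final class RootDeviceChoice {

  enum ImageKind { INSTANCE_STORE, EBS, CLUSTER_HVM }

  /** Picks the kind of image a given EC2 hardware id can boot. */
  static ImageKind imageKindFor(String hardwareId) {
    if ("t1.micro".equals(hardwareId)) {
      return ImageKind.EBS;            // t1.micro cannot boot s3 backed images
    }
    if (hardwareId.startsWith("cc1.")) {
      return ImageKind.CLUSTER_HVM;    // assumption: cluster compute ids start with "cc1."
    }
    return ImageKind.INSTANCE_STORE;   // midrange sizes can share one s3 backed image
  }

  public static void main(String[] args) {
    for (String id : new String[] {"t1.micro", "m1.large", "cc1.4xlarge"}) {
      System.out.println(id + " -> " + imageKindFor(id));
    }
  }
}
```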

          Andrei Savu added a comment -

          > not to make our lives difficult, but s3 backed image won't work on the t1.micro, so probably doesn't save us any money on services that will run on t1.micro

          I know and I'm still thinking about how we should handle that case.

          > You might want to look at the enhanced image query syntax for aws-ec2 scheduled for jclouds 1.1.0

          Looks really nice!

          Adrian Cole added a comment -

          patch that makes sure we use canonical amis in aws-ec2, but not the testing or daily ones
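
The attached patch itself is not reproduced here. Purely as an illustration of the idea, a filter that keeps Canonical-owned images while skipping testing and daily builds might look like the sketch below. The `Image` type, the class name, and the rule that such builds carry "testing" or "daily" in their name are assumptions; 099720109477 is Canonical's AWS owner id, and the image names in `main` are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical filter; not the code from the attached patch. */
final class CanonicalImageFilter {

  static final String CANONICAL_OWNER_ID = "099720109477";

  /** Minimal stand-in for a cloud image: owner id plus image name. */
  static final class Image {
    final String ownerId;
    final String name;

    Image(String ownerId, String name) {
      this.ownerId = ownerId;
      this.name = name;
    }
  }

  /** True for Canonical-owned images whose name does not mark a testing or daily build. */
  static boolean isReleasedCanonicalImage(Image image) {
    String name = image.name.toLowerCase(Locale.ROOT);
    return CANONICAL_OWNER_ID.equals(image.ownerId)
        && !name.contains("testing")   // assumption: naming convention for test builds
        && !name.contains("daily");    // assumption: naming convention for daily builds
  }

  static List<Image> filter(List<Image> images) {
    List<Image> kept = new ArrayList<>();
    for (Image image : images) {
      if (isReleasedCanonicalImage(image)) {
        kept.add(image);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Image> images = Arrays.asList(
        new Image(CANONICAL_OWNER_ID, "ubuntu-images/example-release"),        // made-up name
        new Image(CANONICAL_OWNER_ID, "ubuntu-images-testing/example-daily"),  // made-up name
        new Image("000000000000", "someone-else/example-release"));            // made-up owner
    for (Image image : filter(images)) {
      System.out.println(image.name);
    }
  }
}
```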

          Andrei Savu added a comment -

          I think we should commit this in 0.6.0 (the patch created by Adrian). It makes tests more predictable.

          Adrian Cole added a comment -

          +1 except the jclouds version in this patch needs to change once we get 1.1.0 out

          Andrei Savu added a comment -

          Updated title and description to match the patch.

          Andrei Savu added a comment -

          I've just committed this. Thanks Adrian!


            People

            • Assignee:
              Adrian Cole
              Reporter:
              Andrei Savu
            • Votes:
              0
              Watchers:
              0