Whirr
  1. Whirr
  2. WHIRR-415

Let users specify the CDH release (cdh3u1, cdh3u2)

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: service/cdh
    • Labels:
      None

      Description

      Allow explicit selection of the CDH release you want to use. The current behaviour is to use the latest one.

      1. WHIRR-415-whirr.env.patch
        13 kB
        Alex Heneveld
      2. WHIRR-415-env.patch
        12 kB
        Alex Heneveld
      3. WHIRR-415.patch
        2 kB
        Tom White
      4. WHIRR-415.patch
        14 kB
        Andrei Savu

        Activity

        Hide
        Swathi V added a comment -

        Will the same thing work for cdh4 for specifying different releases?

        Show
        Swathi V added a comment - Will the same thing work for cdh4 for specifying different releases?
        Hide
        Andrei Savu added a comment -

        Committed. Thanks Alex!

        Show
        Andrei Savu added a comment - Committed. Thanks Alex!
        Hide
        Alex Heneveld added a comment -

        +1

        Had the chance to review patch, looks good, and integration tests all passing for me on AWS. Hurrah!

        Show
        Alex Heneveld added a comment - +1 Had the chance to review patch, looks good, and integration tests all passing for me on AWS. Hurrah!
        Hide
        Alex Heneveld added a comment -

        Patch looks sensible, might not have the chance to test it for a few days.

        Combining version test with regular test is simpler, although my thinking was that it was better to have integration test which tests a specific version which is testably distinct from what you get when you request the latest.

        Although that said, I was (with all the WHIRR-427 errors) wondering whether we might hit version errors at the pom level – locally we are locked to one version for the project, so could be problematic testing multiple versions on the other end if they aren't compatible with the local one.

        So fine either way.

        Show
        Alex Heneveld added a comment - Patch looks sensible, might not have the chance to test it for a few days. Combining version test with regular test is simpler, although my thinking was that it was better to have integration test which tests a specific version which is testably distinct from what you get when you request the latest. Although that said, I was (with all the WHIRR-427 errors) wondering whether we might hit version errors at the pom level – locally we are locked to one version for the project, so could be problematic testing multiple versions on the other end if they aren't compatible with the local one. So fine either way.
        Hide
        Andrei Savu added a comment -

        Also working fine with cloudservers. +1 from me. We can address any comments from Tom regarding adding the ability to set arbitrary environment variables as part of 0.8.0.

        Show
        Andrei Savu added a comment - Also working fine with cloudservers. +1 from me. We can address any comments from Tom regarding adding the ability to set arbitrary environment variables as part of 0.8.0.
        Hide
        Andrei Savu added a comment -

        PS: integration tests working as expected on aws-ec2 now testing on cloudservers.

        Show
        Andrei Savu added a comment - PS: integration tests working as expected on aws-ec2 now testing on cloudservers.
        Hide
        Andrei Savu added a comment -

        Alex I have made the following changes to your patch:

        • moved the version test to CdhHadoopServiceTest to avoid starting a new cluster from scratch - it takes too long
        • locked all cdh tests to chd3u2
        • change hadoop-jobtracker+hadoop-namenode to hadoop-namenode+hadoop-jobtracker to make the integration test work

        Let me know what you think.

        Show
        Andrei Savu added a comment - Alex I have made the following changes to your patch: moved the version test to CdhHadoopServiceTest to avoid starting a new cluster from scratch - it takes too long locked all cdh tests to chd3u2 change hadoop-jobtracker+hadoop-namenode to hadoop-namenode+hadoop-jobtracker to make the integration test work Let me know what you think.
        Hide
        Andrei Savu added a comment -

        Now working on this. I'm trying also to find the root cause for WHIRR-427.

        Show
        Andrei Savu added a comment - Now working on this. I'm trying also to find the root cause for WHIRR-427 .
        Hide
        Andrei Savu added a comment -

        Looks good. Why do we need a new test class CdhHadoopServiceVersionTest? I think we can add just a new method to CdhHadoopServiceTest and make setUp @BeforeClass and tearDown @AfterClass. What do you think?

        Tom you've raised some concerns about adding the ability to setup arbitrary environment variables. Can you elaborate?

        Show
        Andrei Savu added a comment - Looks good. Why do we need a new test class CdhHadoopServiceVersionTest? I think we can add just a new method to CdhHadoopServiceTest and make setUp @BeforeClass and tearDown @AfterClass. What do you think? Tom you've raised some concerns about adding the ability to setup arbitrary environment variables. Can you elaborate?
        Hide
        Alex Heneveld added a comment -

        revised patch uses whirr.env.XXX syntax, and tests an explicit version separately with explicit CDH versions in pom ... though integration test ultimately still fails due to WHIRR-427

        Show
        Alex Heneveld added a comment - revised patch uses whirr. env.XXX syntax, and tests an explicit version separately with explicit CDH versions in pom ... though integration test ultimately still fails due to WHIRR-427
        Hide
        Andrei Savu added a comment -

        I think it makes sense to have the ability to add arbitrary environment variables in the .properties file (e.g. whirr.env.REPO=chd3u1) but I would add this in another JIRA. For this one I would go for something like whirr.cdh.repo=cdh3u1.

        Guys what do you think if we move this to 0.8.0 (next release) and only implement env vars because that feature can be used as a workaround so that Paul can do his work as needed? Tom why do you worry about setting arbitrary env variables?

        Show
        Andrei Savu added a comment - I think it makes sense to have the ability to add arbitrary environment variables in the .properties file (e.g. whirr.env.REPO=chd3u1) but I would add this in another JIRA. For this one I would go for something like whirr.cdh.repo=cdh3u1. Guys what do you think if we move this to 0.8.0 (next release) and only implement env vars because that feature can be used as a workaround so that Paul can do his work as needed? Tom why do you worry about setting arbitrary env variables?
        Hide
        Alex Heneveld added a comment -

        A whirr property would be nicer, I quite agree – something in me just objects to three new roles+ClusterActionHandler classes for CDH flavours of hadoop/hbase/zookeep, just to have the property called something nicer. WDYT?

        Show
        Alex Heneveld added a comment - A whirr property would be nicer, I quite agree – something in me just objects to three new roles+ClusterActionHandler classes for CDH flavours of hadoop/hbase/zookeep, just to have the property called something nicer. WDYT?
        Hide
        Tom White added a comment -

        The tarball.url overload was a hack to get ideas on how to do this better! Alex, your patch is much better.

        However, the decision to use environment variables vs. properties is an implementation detail, which I'm not sure is something the user cares about. (I also worry about the ability to set arbitrary environment variables.) E.g. in this case the setting should be "whirr.hadoop.repo=cdh3u1" or similar. Perhaps we can just have another property for that?

        Show
        Tom White added a comment - The tarball.url overload was a hack to get ideas on how to do this better! Alex, your patch is much better. However, the decision to use environment variables vs. properties is an implementation detail, which I'm not sure is something the user cares about. (I also worry about the ability to set arbitrary environment variables.) E.g. in this case the setting should be "whirr.hadoop.repo=cdh3u1" or similar. Perhaps we can just have another property for that?
        Hide
        Alex Heneveld added a comment -

        See WHIRR-427 for failing CDH integration tests.

        Show
        Alex Heneveld added a comment - See WHIRR-427 for failing CDH integration tests.
        Hide
        Alex Heneveld added a comment - - edited

        This attachment whirr-415-env implements the env.someVar syntax above, with unit test, and with a corresponding integraton test for cdh3u1 (although I am having trouble getting any of the CDH integration tests to pass so not sure if this integration test works, although I have confirmed it is installing the correct 3u1 version).

        Show
        Alex Heneveld added a comment - - edited This attachment whirr-415-env implements the env.someVar syntax above, with unit test, and with a corresponding integraton test for cdh3u1 (although I am having trouble getting any of the CDH integration tests to pass so not sure if this integration test works, although I have confirmed it is installing the correct 3u1 version).
        Hide
        Alex Heneveld added a comment -

        Personally I don't like overloading tarball.url in this way.

        What if we had a more general mechanism for allowing environment variables to be specified in the config file?

        I am thinking something like:

        env.someVar=value
        

        Then SOME_VAR=value is exported and available to the scripts, so we could say env.repo=cdh3u1 to solve this problem.

        Interested to know people's thoughts, if this is a good idea, if env.SOME_VAR syntax is more intuitive (and if this should be a separate jira).

        Show
        Alex Heneveld added a comment - Personally I don't like overloading tarball.url in this way. What if we had a more general mechanism for allowing environment variables to be specified in the config file? I am thinking something like: env.someVar=value Then SOME_VAR=value is exported and available to the scripts, so we could say env.repo=cdh3u1 to solve this problem. Interested to know people's thoughts, if this is a good idea, if env.SOME_VAR syntax is more intuitive (and if this should be a separate jira).
        Hide
        Tom White added a comment -

        The REPO variable is the right way to do it, but unfortunately it's not being passed through. The current workaround would be to copy install_cdh_hadoop.sh and edit it (see http://whirr.apache.org/faq.html#How_can_I_modify_the_instance_installation_and_configuration_scripts).

        Here's an untested patch that should allow you to specify the repo via the "tarball" property:

        whirr.hadoop.tarball.url=cdh3u1
        

        However this overloads the "tarball" property, so we might want to come up with a better way.

        Show
        Tom White added a comment - The REPO variable is the right way to do it, but unfortunately it's not being passed through. The current workaround would be to copy install_cdh_hadoop.sh and edit it (see http://whirr.apache.org/faq.html#How_can_I_modify_the_instance_installation_and_configuration_scripts ). Here's an untested patch that should allow you to specify the repo via the "tarball" property: whirr.hadoop.tarball.url=cdh3u1 However this overloads the "tarball" property, so we might want to come up with a better way.
        Hide
        Paul Baclace added a comment - - edited

        setting:

        export REPO=cdh3u1

        and then running whirr 0.6.0 was insufficient, at least for Debian flavor. This must be on the right track, however.

        Show
        Paul Baclace added a comment - - edited setting: export REPO=cdh3u1 and then running whirr 0.6.0 was insufficient, at least for Debian flavor. This must be on the right track, however.
        Hide
        Alex Heneveld added a comment - - edited

        From the code it looks like the functions/*_cdh_*.sh pay attention to the environment variable REPO which defaults to cdh3 (version 3, symlinked to 3u2; other versions listed e.g. at http://archive.cloudera.com/redhat/cdh/).

        Does it work to, say force 3u1, if you export REPO=cdh3u1 before whirring?

        Show
        Alex Heneveld added a comment - - edited From the code it looks like the functions/*_cdh_*.sh pay attention to the environment variable REPO which defaults to cdh3 (version 3 , symlinked to 3u2 ; other versions listed e.g. at http://archive.cloudera.com/redhat/cdh/ ). Does it work to, say force 3u1, if you export REPO=cdh3u1 before whirring?
        Hide
        Paul Baclace added a comment -

        This is essential for automation reasons. Whirr cannot be used reproducibly unless the launched bits are known from prior testing.

        What is needed to get this fixed?

        Show
        Paul Baclace added a comment - This is essential for automation reasons. Whirr cannot be used reproducibly unless the launched bits are known from prior testing. What is needed to get this fixed?
        Hide
        Andrei Savu added a comment -

        Tom can you point me to some relevant documentation pages?

        Show
        Andrei Savu added a comment - Tom can you point me to some relevant documentation pages?
        Hide
        Andrei Savu added a comment -

        It shouldn't be that hard to address this as part of 0.7.0.

        Show
        Andrei Savu added a comment - It shouldn't be that hard to address this as part of 0.7.0.
        Hide
        Adrian Cole added a comment -

        marking this as critical as it is very important to paul

        Show
        Adrian Cole added a comment - marking this as critical as it is very important to paul
        Hide
        Paul Baclace added a comment -

        This issue should be Priority==Critical.

        Really.

        Show
        Paul Baclace added a comment - This issue should be Priority==Critical. Really.

          People

          • Assignee:
            Alex Heneveld
            Reporter:
            Andrei Savu
          • Votes:
            3 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development