Allow explicit selection of the CDH release you want to use. The current behaviour is to use the latest one.
Will the same thing work for cdh4 for specifying different releases?
Committed. Thanks Alex!
Had the chance to review patch, looks good, and integration tests all passing for me on AWS. Hurrah!
Patch looks sensible, might not have the chance to test it for a few days.
Combining version test with regular test is simpler, although my thinking was that it was better to have integration test which tests a specific version which is testably distinct from what you get when you request the latest.
Although that said, I was (with all the WHIRR-427 errors) wondering whether we might hit version errors at the pom level – locally we are locked to one version for the project, so could be problematic testing multiple versions on the other end if they aren't compatible with the local one.
So fine either way.
Also working fine with cloudservers. +1 from me. We can address any comments from Tom regarding adding the ability to set arbitrary environment variables as part of 0.8.0.
PS: integration tests working as expected on aws-ec2 now testing on cloudservers.
Alex I have made the following changes to your patch:
Let me know what you think.
Now working on this. I'm trying also to find the root cause for WHIRR-427.
Looks good. Why do we need a new test class CdhHadoopServiceVersionTest? I think we can add just a new method to CdhHadoopServiceTest and make setUp @BeforeClass and tearDown @AfterClass. What do you think?
Tom you've raised some concerns about adding the ability to setup arbitrary environment variables. Can you elaborate?
revised patch uses whirr.env.XXX syntax, and tests an explicit version separately with explicit CDH versions in pom ... though integration test ultimately still fails due to WHIRR-427
I think it makes sense to have the ability to add arbitrary environment variables in the .properties file (e.g. whirr.env.REPO=chd3u1) but I would add this in another JIRA. For this one I would go for something like whirr.cdh.repo=cdh3u1.
Guys what do you think if we move this to 0.8.0 (next release) and only implement env vars because that feature can be used as a workaround so that Paul can do his work as needed? Tom why do you worry about setting arbitrary env variables?
A whirr property would be nicer, I quite agree – something in me just objects to three new roles+ClusterActionHandler classes for CDH flavours of hadoop/hbase/zookeep, just to have the property called something nicer. WDYT?
The tarball.url overload was a hack to get ideas on how to do this better! Alex, your patch is much better.
However, the decision to use environment variables vs. properties is an implementation detail, which I'm not sure is something the user cares about. (I also worry about the ability to set arbitrary environment variables.) E.g. in this case the setting should be "whirr.hadoop.repo=cdh3u1" or similar. Perhaps we can just have another property for that?
See WHIRR-427 for failing CDH integration tests.
This attachment whirr-415-env implements the env.someVar syntax above, with unit test, and with a corresponding integraton test for cdh3u1 (although I am having trouble getting any of the CDH integration tests to pass so not sure if this integration test works, although I have confirmed it is installing the correct 3u1 version).
Personally I don't like overloading tarball.url in this way.
What if we had a more general mechanism for allowing environment variables to be specified in the config file?
I am thinking something like:
Then SOME_VAR=value is exported and available to the scripts, so we could say env.repo=cdh3u1 to solve this problem.
Interested to know people's thoughts, if this is a good idea, if env.SOME_VAR syntax is more intuitive (and if this should be a separate jira).
The REPO variable is the right way to do it, but unfortunately it's not being passed through. The current workaround would be to copy install_cdh_hadoop.sh and edit it (see http://whirr.apache.org/faq.html#How_can_I_modify_the_instance_installation_and_configuration_scripts).
Here's an untested patch that should allow you to specify the repo via the "tarball" property:
However this overloads the "tarball" property, so we might want to come up with a better way.
and then running whirr 0.6.0 was insufficient, at least for Debian flavor. This must be on the right track, however.
From the code it looks like the functions/*_cdh_*.sh pay attention to the environment variable REPO which defaults to cdh3 (version 3, symlinked to 3u2; other versions listed e.g. at http://archive.cloudera.com/redhat/cdh/).
Does it work to, say force 3u1, if you export REPO=cdh3u1 before whirring?
This is essential for automation reasons. Whirr cannot be used reproducibly unless the launched bits are known from prior testing.
What is needed to get this fixed?
Tom can you point me to some relevant documentation pages?
It shouldn't be that hard to address this as part of 0.7.0.
marking this as critical as it is very important to paul
This issue should be Priority==Critical.