Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels: None

    Description

      The CDH services predate Bigtop, but now that Bigtop is available we should change them to use Bigtop artifacts; vendor distributions can then extend these as needed (and those extensions should be hosted outside Apache).

      Attachments

        Activity

          stevel@apache.org Steve Loughran added a comment -

          Here are some things I'd like to see from a new service, one which I could subclass in Java to add some specific features:

          1. the ability for a service to specify its preferred OS family for installation (deb, yum), and maybe some other params.
          2. a specific preflight script that does pre-deployment checks of the system (network state &c).
          3. an explicit register_repositories script.
          4. explicit checks for the NN in the HDFS layer and the JT in the MR layer, rather than just stack traces on the lookups.
          5. the ability to select a different set of default properties from "whirr-hadoop-default.properties" (see the sketch after this list).
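
          To make (1)-(3) and (5) concrete, such hooks could be surfaced as recipe properties. This is purely a sketch: none of the keys below exist today, and every name is hypothetical.

              # Hypothetical recipe keys: nothing below exists in Whirr today
              whirr.hadoop.os-family=yum                                # preferred OS family: deb or yum
              whirr.hadoop.preflight-function=preflight_check_network   # pre-deployment system checks
              whirr.hadoop.register-repositories-function=register_repositories
              whirr.hadoop.defaults=whirr-vendor-hadoop.properties      # alternative defaults file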

          What I'd really, really like is to be able to chain together two or more in-JAR .properties files, so that the default hadoop one is read in first, then the installation-specific one. This would let the vendor one provide a different set of default actions (repos, scripts, default config options) that doesn't need to be set up correctly by everyone in their own local .properties file in order to use that installation.
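
          As a rough sketch of that chaining with plain java.util.Properties (the overlay file name is made up, and the lookup key is just for demonstration):

              import java.io.IOException;
              import java.io.InputStream;
              import java.util.Properties;

              public class LayeredDefaults {
                // Load a .properties resource from the classpath (i.e. from inside a JAR),
                // falling back to the given defaults for any key it does not set.
                static Properties load(String resource, Properties defaults) throws IOException {
                  Properties props = new Properties(defaults);
                  try (InputStream in = LayeredDefaults.class.getResourceAsStream(resource)) {
                    if (in == null) {
                      throw new IOException("resource not found: " + resource);
                    }
                    props.load(in);
                  }
                  return props;
                }

                public static void main(String[] args) throws IOException {
                  // Base layer: the stock defaults shipped in the Whirr JAR.
                  Properties hadoop = load("/whirr-hadoop-default.properties", null);
                  // Overlay: an installation-specific file (hypothetical name) that only
                  // overrides the repos, scripts and config options it cares about.
                  Properties vendor = load("/whirr-vendor-hadoop.properties", hadoop);
                  // Lookups fall through to the base layer when the overlay is silent.
                  System.out.println(vendor.getProperty("whirr.instance-templates"));
                }
              }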

          The current two-level installer forces everyone to override the scripts in their target properties file and to keep them in sync with any other cluster options.

          stevel@apache.org Steve Loughran added a comment -

          oh, and postflight checking that codecs went in: HADOOP-9044
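
          A postflight codec check can be as small as round-tripping a few bytes through the codec. This is only a sketch against the stock Hadoop compression API; it fails fast when the native snappy library never made it onto the node:

              import java.io.ByteArrayOutputStream;
              import org.apache.hadoop.conf.Configuration;
              import org.apache.hadoop.io.compress.CompressionCodec;
              import org.apache.hadoop.io.compress.CompressionOutputStream;
              import org.apache.hadoop.util.ReflectionUtils;

              public class CodecPostflight {
                public static void main(String[] args) throws Exception {
                  Configuration conf = new Configuration();
                  // Instantiate the codec the same way the framework would.
                  CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(
                      conf.getClassByName("org.apache.hadoop.io.compress.SnappyCodec"), conf);
                  // Compress a few bytes; this throws if the native library is absent.
                  ByteArrayOutputStream sink = new ByteArrayOutputStream();
                  CompressionOutputStream out = codec.createOutputStream(sink);
                  out.write("postflight".getBytes("UTF-8"));
                  out.close();
                  System.out.println("snappy codec OK: " + sink.size() + " bytes");
                }
              }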


          rvs Roman Shaposhnik added a comment -

          Steve, I've been thinking about exactly the same thing – thanks for filing a JIRA!

          In an ideal world, I'd love to combine the configuration-management functionality that Bigtop's puppet code provides within each individual node with the orchestration capabilities (some of which you've mentioned) of Whirr. I filed WHIRR-681 to address this.

          Once WHIRR-681 gets fixed, the question then becomes whether Whirr's puppet service, working off of Bigtop's puppet code, would be the best way to deploy Bigtop in such a way that vendors can then subclass it and arrive at [TheirDistro]Service.

          What do you think?

          stevel@apache.org Steve Loughran added a comment -

          -1 to any requirement on puppet. Too brittle in my experience, and hard to install itself.

          tomwhite Thomas White added a comment -

          Steve, I did a prototype using Puppet to install Whirr services and it worked very well (WHIRR-516). Puppet is neither brittle nor hard to install. Can you please give a technical justification for your -1?

          stevel@apache.org Steve Loughran added a comment -

          I'm happy for there to be a puppet-based option, but I'm against it replacing any RPM-driven installer.

          Puppet contains a lot of assumptions about the network that don't hold in many VM environments, and it gets confused easily. RPM installs don't have that issue.


          rvs Roman Shaposhnik added a comment -

          Steve, you seem to be confusing something here – Puppet still uses RPM/DEB packages to do the installation. You also seem to be confusing the classical puppet server model with the masterless puppet style that Whirr actually uses.

          Now, with that – I'm not even sure how to interpret your "I'm against it replacing any RPM-driven installer". What RPM-driven installer are you talking about?

          stevel@apache.org Steve Loughran added a comment -

          OK, I'll clarify:

          • bad experiences with distributed puppet in a VM world, as many of its assumptions about the network are often wrong.
          • some of the things I've tried to install are puppet-related, and its own set of dependencies is pretty tricky to get right: you need to get the relevant EPEL repos up, otherwise the wrong versions of things get in.

          The existing "issue yum -y install commands" strategy is fairly simplistic, and you have to do some extra actions, mainly creating some users and symlinks:

          https://github.com/steveloughran/whirr/blob/hdp1/services/hdp/src/main/resources/functions/install_hdp_hadoop.sh
          https://github.com/steveloughran/whirr/blob/hdp1/services/hdp/src/main/resources/functions/configure_hdp_hadoop.sh

          That code would benefit more from detection of non-zero exit codes than from puppet.
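
          For example, a hedged sketch of that fail-fast behaviour on the Java side (the package name and user are placeholders):

              import java.io.IOException;

              public class FailFastInstall {
                // Run a command and fail on a non-zero exit code, instead of
                // letting a half-finished install script carry on silently.
                static void run(String... command) throws IOException, InterruptedException {
                  Process p = new ProcessBuilder(command).inheritIO().start();
                  int exit = p.waitFor();
                  if (exit != 0) {
                    throw new IOException("exit code " + exit + " from: "
                        + String.join(" ", command));
                  }
                }

                public static void main(String[] args) throws Exception {
                  run("yum", "-y", "install", "hadoop");  // package name is a placeholder
                  run("useradd", "-r", "hdfs");           // extra action: create a user
                }
              }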

          bmahe Bruno Mahé added a comment -

          I have been watching this thread and I am still wondering how these issues are blockers to the usage of puppet.
          Whether one uses puppet or not, most of these issues would still be there (e.g. a misconfigured hostname, creating hdfs users). And using puppet does not prevent any solution to these issues from being implemented (I would argue it is simpler and more maintainable to solve them with puppet than with bash, but that's outside the scope of this ticket).

          That said, I have deployed Apache Bigtop clusters many times over several cloud providers with Apache Bigtop's puppet recipes and never had any complaint on the puppet side. It just works.


          cos Konstantin I Boudnik added a comment -

          The existing "issue yum -y install commands" strategy is fairly simplistic, and you have to do some extra actions, mainly creating some users and symlinks:

          I would argue that missing symlinks say something about the quality of the installed package. The final state of a package should be "READY TO USE", not "DON'T FORGET TO CREATE THESE SYMLINKS".

          stevel@apache.org Steve Loughran added a comment -

          "I would argue that missing symlinks suggest about the quality of the installed package"

          Good point. I think the only symlink I set up is putting snappy into lib/native/Linux-AMD64-whatever. The purest way would be to have a hadoop-snappy-symlink RPM that contained nothing but the link; that would make the install fully declarative.
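
          A sketch of what such a symlink-only spec could look like (the paths and the snappy library location are assumptions, not the actual layout):

              Name:           hadoop-snappy-symlink
              Version:        1.0
              Release:        1
              Summary:        Symlink libsnappy into the Hadoop native library directory
              License:        ASL 2.0
              BuildArch:      noarch
              Requires:       hadoop, snappy

              %description
              Nothing but the symlink, so the install stays fully declarative.

              %install
              mkdir -p %{buildroot}/usr/lib/hadoop/lib/native/Linux-amd64-64
              ln -s /usr/lib64/libsnappy.so \
                    %{buildroot}/usr/lib/hadoop/lib/native/Linux-amd64-64/libsnappy.so

              %files
              /usr/lib/hadoop/lib/native/Linux-amd64-64/libsnappy.so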

          stevel@apache.org Steve Loughran added a comment - "I would argue that missing symlinks suggest about the quality of the installed package" Good point. I think the only symlink I set up is putting snappy into lib/native/Linux-AMD64-whatever. The purest way would be to have a hadoop-snappy-symlink RPM that contained nothing but the link; that would make the install fully declarative.

          People

            Assignee: Unassigned
            Reporter: Thomas White

            Dates

              Created:
              Updated:
