Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels: None

    Description

      The CDH services predate Bigtop, but now that Bigtop is available we should change them to use Bigtop artifacts; vendor distributions can then extend these as needed (and those extensions should be hosted outside Apache).

      Attachments

        Activity

          stevel@apache.org Steve Loughran added a comment -

          Here are some things I'd like to see from a new service, one which I could subclass in Java to add some specific features:

          1. the ability for a service to specify its preferred OS family for installation (deb, yum), and maybe some other params.
          2. a specific preflight script that does pre-deployment checks of the system (network state &c).
          3. an explicit register_repositories script.
          4. explicit checks for the NN in the HDFS layer and the JT in the MR layer, rather than just stack traces on the lookups.
          5. the ability to select a different set of default properties from "whirr-hadoop-default.properties" (see the sketch after this list).
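
          To make (1)-(3) and (5) concrete, such hooks could be surfaced as recipe properties. This is purely a sketch: none of the keys below exist today, and every name is hypothetical.

              # Hypothetical recipe keys: nothing below exists in Whirr today
              whirr.hadoop.os-family=yum                                # preferred OS family: deb or yum
              whirr.hadoop.preflight-function=preflight_check_network   # pre-deployment system checks
              whirr.hadoop.register-repositories-function=register_repositories
              whirr.hadoop.defaults=whirr-vendor-hadoop.properties      # alternative defaults file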

          What I'd really, really like is to be able to chain together two or more in-JAR .properties files, so that the default hadoop one is read in first, then the installation-specific one. This would let the vendor one provide a different set of default actions (repos, scripts, default config options) that doesn't need to be set up correctly by everyone in their own local .properties file in order to use that installation.
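
          As a rough sketch of that chaining with plain java.util.Properties (the overlay file name is made up, and the lookup key is just for demonstration):

              import java.io.IOException;
              import java.io.InputStream;
              import java.util.Properties;

              public class LayeredDefaults {
                // Load a .properties resource from the classpath (i.e. from inside a JAR),
                // falling back to the given defaults for any key it does not set.
                static Properties load(String resource, Properties defaults) throws IOException {
                  Properties props = new Properties(defaults);
                  try (InputStream in = LayeredDefaults.class.getResourceAsStream(resource)) {
                    if (in == null) {
                      throw new IOException("resource not found: " + resource);
                    }
                    props.load(in);
                  }
                  return props;
                }

                public static void main(String[] args) throws IOException {
                  // Base layer: the stock defaults shipped in the Whirr JAR.
                  Properties hadoop = load("/whirr-hadoop-default.properties", null);
                  // Overlay: an installation-specific file (hypothetical name) that only
                  // overrides the repos, scripts and config options it cares about.
                  Properties vendor = load("/whirr-vendor-hadoop.properties", hadoop);
                  // Lookups fall through to the base layer when the overlay is silent.
                  System.out.println(vendor.getProperty("whirr.instance-templates"));
                }
              }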

          The current two-level installer forces everyone to override the scripts in their target properties file and to keep them in sync with any other cluster options.

          stevel@apache.org Steve Loughran added a comment -

          oh, and postflight checking that codecs went in: HADOOP-9044
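
          A postflight codec check can be as small as round-tripping a few bytes through the codec. This is only a sketch against the stock Hadoop compression API; it fails fast when the native snappy library never made it onto the node:

              import java.io.ByteArrayOutputStream;
              import org.apache.hadoop.conf.Configuration;
              import org.apache.hadoop.io.compress.CompressionCodec;
              import org.apache.hadoop.io.compress.CompressionOutputStream;
              import org.apache.hadoop.util.ReflectionUtils;

              public class CodecPostflight {
                public static void main(String[] args) throws Exception {
                  Configuration conf = new Configuration();
                  // Instantiate the codec the same way the framework would.
                  CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(
                      conf.getClassByName("org.apache.hadoop.io.compress.SnappyCodec"), conf);
                  // Compress a few bytes; this throws if the native library is absent.
                  ByteArrayOutputStream sink = new ByteArrayOutputStream();
                  CompressionOutputStream out = codec.createOutputStream(sink);
                  out.write("postflight".getBytes("UTF-8"));
                  out.close();
                  System.out.println("snappy codec OK: " + sink.size() + " bytes");
                }
              }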


          rvs Roman Shaposhnik added a comment -

          Steve, I've been thinking about exactly the same thing – thanks for filing a JIRA!

          In an ideal world, I'd love to combine the configuration-management functionality that Bigtop's puppet code provides within each individual node with the orchestration capabilities (some of which you've mentioned) of Whirr. I filed WHIRR-681 to address this.

          Once WHIRR-681 gets fixed, the question then becomes whether Whirr's puppet service, working off of Bigtop's puppet code, would be the best way to deploy Bigtop in such a way that vendors can then subclass it and arrive at [TheirDistro]Service.

          What do you think?

          stevel@apache.org Steve Loughran added a comment -

          -1 to any requirement on puppet. Too brittle in my experience, and hard to install itself.

          tomwhite Thomas White added a comment -

          Steve, I did a prototype using Puppet to install Whirr services and it worked very well (WHIRR-516). Puppet is neither brittle nor hard to install. Can you please give a technical justification for your -1?

          stevel@apache.org Steve Loughran added a comment -

          I'm happy for there to be a puppet-based option, but I'm against it replacing any RPM-driven installer.

          Puppet contains a lot of assumptions about the network that don't hold in many VM environments, and it gets confused easily. RPM installs don't have that issue.


          rvs Roman Shaposhnik added a comment -

          Steve, you seem to be confusing something here – Puppet still uses RPM/DEB packages to do the installation. You also seem to be confusing the classical puppet server model with the masterless puppet style that Whirr actually uses.

          Now, with that – I'm not even sure how to interpret your "I'm against it replacing any RPM-driven installer". What RPM-driven installer are you talking about?

          stevel@apache.org Steve Loughran added a comment -

          OK, I'll clarify:

          • bad experiences with distributed puppet in a VM world, as many of its assumptions about the network are often wrong.
          • some of the things I've tried to install are puppet-related, and its own set of dependencies is pretty tricky to get right: you need to get the relevant EPEL repos up, otherwise the wrong versions of things get in.

          The existing "issue yum -y install commands" strategy is fairly simplistic, and you have to do some extra actions, mainly creating some users and symlinks:

          https://github.com/steveloughran/whirr/blob/hdp1/services/hdp/src/main/resources/functions/install_hdp_hadoop.sh
          https://github.com/steveloughran/whirr/blob/hdp1/services/hdp/src/main/resources/functions/configure_hdp_hadoop.sh

          That code would benefit more from detection of non-zero exit codes than from puppet.
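
          For example, a hedged sketch of that fail-fast behaviour on the Java side (the package name and user are placeholders):

              import java.io.IOException;

              public class FailFastInstall {
                // Run a command and fail on a non-zero exit code, instead of
                // letting a half-finished install script carry on silently.
                static void run(String... command) throws IOException, InterruptedException {
                  Process p = new ProcessBuilder(command).inheritIO().start();
                  int exit = p.waitFor();
                  if (exit != 0) {
                    throw new IOException("exit code " + exit + " from: "
                        + String.join(" ", command));
                  }
                }

                public static void main(String[] args) throws Exception {
                  run("yum", "-y", "install", "hadoop");  // package name is a placeholder
                  run("useradd", "-r", "hdfs");           // extra action: create a user
                }
              }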

          bmahe Bruno Mahé added a comment -

          I have been watching this thread and I am still wondering how these issues are blockers to the usage of puppet.
          Whether one uses puppet or not, most of these issues would still be there (e.g. a misconfigured hostname, creating hdfs users). And using puppet does not prevent any solution to these issues from being implemented (I would argue it is simpler and more maintainable to solve them with puppet than with bash, but that's outside the scope of this ticket).

          That said, I have deployed Apache Bigtop clusters many times over several cloud providers with Apache Bigtop's puppet recipes and never had any complaint on the puppet side. It just works.


          cos Konstantin I Boudnik added a comment -

          The existing "issue yum -y install commands" strategy is fairly simplistic, and you have to do some extra actions, mainly creating some users and symlinks:

          I would argue that missing symlinks say something about the quality of the installed package. The final state of a package should be "READY TO USE", not "DON'T FORGET TO CREATE THESE SYMLINKS".

          stevel@apache.org Steve Loughran added a comment -

          "I would argue that missing symlinks suggest about the quality of the installed package"

          Good point. I think the only symlink I set up is putting snappy into lib/native/Linux-AMD64-whatever. The purest way would be to have a hadoop-snappy-symlink RPM that contained nothing but the link; that would make the install fully declarative.
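
          A sketch of what such a symlink-only spec could look like (the paths and the snappy library location are assumptions, not the actual layout):

              Name:           hadoop-snappy-symlink
              Version:        1.0
              Release:        1
              Summary:        Symlink libsnappy into the Hadoop native library directory
              License:        ASL 2.0
              BuildArch:      noarch
              Requires:       hadoop, snappy

              %description
              Nothing but the symlink, so the install stays fully declarative.

              %install
              mkdir -p %{buildroot}/usr/lib/hadoop/lib/native/Linux-amd64-64
              ln -s /usr/lib64/libsnappy.so \
                    %{buildroot}/usr/lib/hadoop/lib/native/Linux-amd64-64/libsnappy.so

              %files
              /usr/lib/hadoop/lib/native/Linux-amd64-64/libsnappy.so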

          stevel@apache.org Steve Loughran added a comment - "I would argue that missing symlinks suggest about the quality of the installed package" Good point. I think the only symlink I set up is putting snappy into lib/native/Linux-AMD64-whatever. The purest way would be to have a hadoop-snappy-symlink RPM that contained nothing but the link; that would make the install fully declarative.

          People

            Assignee: Unassigned
            Reporter: Thomas White

            Dates

              Created:
              Updated:
