Whirr / WHIRR-667

Add whirr support for HDP-1 installation

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.9.0
    • Fix Version/s: None
    • Component/s: new service
    • Labels:
      None
    • Environment:

      RHEL and CentOS

      Description

      Add an extension of service/hadoop that installs HDP-1.

        Activity

        Tom White added a comment -

        > Is the plan to keep out all the downstream vendor stuff and replace w/ bigtop

        Yes. I opened WHIRR-676 for this.

        Steve Loughran added a comment -

        Where are we with this? Is the plan to keep out all the downstream vendor stuff and replace it w/ bigtop, or should I see what I can do w.r.t. adding some more tests here so that it can be pulled in?

        -steve

        Steve Loughran added a comment -

        That's an interesting point -and as one of the people in that QFS discussion, I clearly have opinions.

        In favour of pulling out:

        • lets people have a faster release cycle, push things out earlier.
        • decouples whirr-core from the various services, that can come in their own JARs
        • puts the homework of keeping the scripts up to date onto the owners of the source

        In favour of retention:

        • build and test process can run against all the various services in one go
        • forces whirr core to stay stable at both the java level and the installed functions level, including for services that may be subclassed (e.g. hadoop itself)
        • pulling services out makes it harder to synchronize fixes across the different components. E.g. if a bug is found in the hdp functions, it probably exists in the cdh package too.
        • no artifacts are actually being redistributed, just the code to get them out and then tell them to start themselves.

        My (current) view is that those services that directly subclass other services (such as the specific hadoop installers, hbase, etc.) ought to go into whirr so they can stay in sync with the parent classes.

        At the same time, you'd have to look hard at other services and say "you are more loosely coupled, why don't you stay with your project?".

        As an example of this, I also have on github the code to deploy the Ambari management tooling onto a whirr-created cluster; it can then take on the work of provisioning and managing the worker nodes [ https://github.com/steveloughran/whirr/tree/ambari ]. This is currently in the same service as the hdp1 patch above, but could trivially be moved to its own service tree, where it could be built and tested by that team. It has no dependencies other than on whirr-core, and I do think it would be better off staying outside the whirr codebase, as it is arguably more tightly coupled to the ambari project than to whirr.

        w.r.t. Bigtop, yes, its RPMs & DEBs should go in. What could be done (here comes another JIRA, I feel) is to tweak the Hadoop component setup with a few more override points to make this easier. Examples: allow for a "setup the RPM repo" script that is independent of what you are installing; add some tests that are designed to work against different hadoop installations.

        Maybe if we can get a bigtop-rpm/deb installer that works well, the CDH & HDP installers would be extensions of that, and it would become easier to keep them out of the core.
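
        A minimal sketch of the "override point" idea above, not actual Whirr code: every class, method, and script-function name here (PackagedHadoopInstaller, repoSetupFunction, setup_bigtop_repo, and so on) is hypothetical and chosen only for illustration. It assumes the repo setup step can be separated from package installation, so that a vendor installer overrides just that one step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical base installer; NOT part of the real Whirr API. */
abstract class PackagedHadoopInstaller {
    /** Override point: name of the script function that sets up the package repo. */
    protected abstract String repoSetupFunction();

    /** Override point: packages to install (illustrative default). */
    protected List<String> packages() {
        return Arrays.asList("hadoop");
    }

    /** Template method: repo setup always runs before any package install. */
    public final List<String> installSteps() {
        List<String> steps = new ArrayList<>();
        steps.add(repoSetupFunction());
        for (String pkg : packages()) {
            steps.add("install_package " + pkg);
        }
        return steps;
    }
}

/** Hypothetical Bigtop installer: supplies only the repo setup step. */
class BigtopInstaller extends PackagedHadoopInstaller {
    @Override
    protected String repoSetupFunction() {
        return "setup_bigtop_repo"; // assumed script function name
    }
}

/** Hypothetical vendor installer: extends Bigtop and swaps only the repo. */
class HdpInstaller extends BigtopInstaller {
    @Override
    protected String repoSetupFunction() {
        return "setup_hdp_repo"; // assumed script function name
    }
}
```

        Under these assumptions a vendor installer is a one-method subclass, which is what would make it cheap to host outside the core.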

        Tom White added a comment -

        I've been thinking that we should remove vendor-specific services (i.e. CDH) since these services should be tested and packaged with the external distro. Now that we have Bigtop I think we should change the CDH service to be a Bigtop service. (I started this in WHIRR-443, but that effort also included using the Bigtop Puppet scripts - that is a bigger piece of work, and here I'm just talking about using the Bigtop RPMs/debs.)

        This is similar to the discussion that's going on in Hadoop at the moment about hosting Hadoop filesystem implementations for external filesystems like KFS/QFS with the external codebase.

        Cheers,
        Tom

        Steve Loughran added a comment -

        The source to bring up hdfs and mapreduce lives on github:
        https://github.com/steveloughran/whirr/tree/hdp1

        It's been verified on BYON and EC2.

        That specific branch already incorporates WHIRR-661 and WHIRR-665, which, while not explicit dependencies, would simplify the merge.


          People

          • Assignee: Steve Loughran
          • Reporter: Steve Loughran
          • Votes: 0
          • Watchers: 3
