Hadoop Common / HADOOP-7278

Automatic Hadoop cluster deployment for build validation

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.22.0, 0.23.0
    • Fix Version/s: None
    • Component/s: build
    • Labels:
    • Environment:

      Apache Jenkins

      Description

      It'd be great to have a way of automatically deploying a Hadoop cluster to a set of machines once all components are successfully built (in the form of tar or whatever). The deployed cluster can then be used to run a set of validation jobs to make sure that the produced artifacts are viable.

      1. HADOOP-7278.patch
        36 kB
        Konstantin Boudnik

        Activity

        Adrian Cole added a comment -

        seems like the ideal case for whirr, right? http://incubator.apache.org/whirr/

        Konstantin Boudnik added a comment -

        Here's the idea (and the workflow Roman and I had in mind):

        • a Jenkins job creates a set of tar-balls for all components (similar to what is done here)
        • a primitive .deb package (all build machines are Ubuntu based) is created, which contains all bits and pieces with required permissions, etc., and is uploaded to a shared location where all build machines can see it
        • an event is triggered once the package is ready; puppet is then executed on a designated set of machines to install the package into /opt/hadoop, generate configs, and start required services
        • a validation workload is started by Hudson

        This JIRA is going to have a couple of patches:

        • .deb creation script
        • puppet recipes
        Konstantin Boudnik added a comment -

        Whirr might or might not be a better solution for this task, but it will certainly involve an additional learning curve.

        Konstantin Boudnik added a comment -

        Here are some more thoughts and background information about the JIRA:

        • there are a number of machines which aren't used by the build system right now, and they can be used for on-cluster testing of Hadoop nightly builds or releases
        • .deb packaging seems like a very nice idea because it provides certain consistency, versioning, etc. However, it might be a bit of overkill for the simple task at hand (thanks, Eli, for pointing this out)
        • we'd rather skip .deb creation and install Hadoop right out of the tar-balls into a shared NFS location. Standard Hadoop scripts will be used for start/stop control of the daemons.
        • for simplicity's sake, the list of slaves and the configuration files will be checked into SVN (we might want to use common or have a separate top-level directory designated for cluster management related issues)
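
The tar-ball approach above could be sketched roughly as follows. This is only an illustration, not the attached patch: the function names `install_tarball` and `read_slaves` are hypothetical, and the sketch assumes the tar-ball has a single top-level directory and that the slaves file lists one host per line, with `#` starting a comment.

```python
import tarfile
from pathlib import Path


def read_slaves(slaves_file):
    """Return host names from a slaves file, skipping blank lines and # comments."""
    hosts = []
    for line in Path(slaves_file).read_text().splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts


def install_tarball(tarball, shared_dir):
    """Unpack a build tar-ball under shared_dir (e.g. an NFS mount).

    Returns the path of the extracted top-level directory.
    """
    shared_dir = Path(shared_dir)
    shared_dir.mkdir(parents=True, exist_ok=True)
    root = shared_dir.resolve()
    with tarfile.open(tarball) as tar:
        members = tar.getmembers()
        if not members:
            raise ValueError("empty tar-ball: %s" % tarball)
        # Refuse entries that would be written outside the shared directory.
        for member in members:
            target = (root / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError("unsafe path in tar-ball: %s" % member.name)
        tar.extractall(shared_dir)
        top_level = Path(members[0].name).parts[0]
    return shared_dir / top_level
```

A driver job would call `install_tarball` once per build and then use the hosts returned by `read_slaves` when starting daemons with the standard Hadoop scripts.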
        Nigel Daley added a comment -

        To be clear, a single Jenkins build slave will be running the coordination of all this, but the cluster that is stood up will be running on non-slave machines, right?

        Konstantin Boudnik added a comment -

        Yup. I think we'll be using an existing slave to drive this deployment, not adding a new one.
        Also, it seems to be the right thing to do to chain a number of builds so that they trigger each other, e.g. the release build (i.e. the one creating tar balls) triggers deployment, which triggers testing.
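
The chaining described above (build, then deployment, then testing) can be modelled as an ordered list of stages where a failure stops everything downstream. The sketch below is an assumption-laden illustration in plain Python, not Jenkins configuration; `run_chain` and the stage names are hypothetical.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a zero-argument action that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_chain(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, action in stages:
        if not action():
            break  # downstream stages are not triggered
        completed.append(name)
    return completed
```

With stages such as `[("build", ...), ("deploy", ...), ("test", ...)]`, a failed deployment means the test stage never runs, which mirrors upstream/downstream job triggering.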

        Konstantin Boudnik added a comment -

        The patch adds a set of simple configs sufficient to run a 0.22 cluster on a few nodes.
        Another part of the patch is a deployment script to be driven by a Hudson job.
        This setup works here.
        Once a cluster is deployed a set of workloads will be executed to check the viability of the cluster.

        Strictly speaking, this is not a part of the common source tree. Would it be better to have a separate build_and_deploy repository at the same level where the Hadoop components are checked into SVN? Thoughts?
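
The viability check mentioned above (running a set of workloads against the deployed cluster) might look something like this sketch. It is not taken from the patch: `run_workloads` and `cluster_is_viable` are hypothetical names, the workloads are whatever commands the caller supplies, and a workload is assumed to pass when its command exits with status 0.

```python
import subprocess


def run_workloads(workloads, timeout=600):
    """Run each named command and record whether it exited with status 0.

    workloads maps a workload name to an argument list for subprocess.run.
    """
    results = {}
    for name, command in workloads.items():
        try:
            proc = subprocess.run(command, capture_output=True, timeout=timeout)
            results[name] = proc.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            # A missing executable or a hung job counts as a failed workload.
            results[name] = False
    return results


def cluster_is_viable(results):
    """Treat the cluster as viable only if at least one workload ran and all passed."""
    return bool(results) and all(results.values())
```

A downstream job could fail its build whenever `cluster_is_viable` returns False.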

        Konstantin Boudnik added a comment -

        Patch is ready for review.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12479155/HADOOP-7278.patch
        against trunk revision 1103971.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/hudson/job/PreCommit-HADOOP-Build/462//testReport/
        Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HADOOP-Build/462//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/hudson/job/PreCommit-HADOOP-Build/462//console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12479155/HADOOP-7278.patch
        against trunk revision 1133125.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/599//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/599//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/599//console

        This message is automatically generated.

        Konstantin Boudnik added a comment -

        So, the deployment has been working for at least one month now. Here one can see it in action:
        https://builds.apache.org//view/G-L/view/Hadoop/job/Hadoop-22-cluster-deploy/
        It also triggers on-cluster tests (a downstream job from the above).

        Does anyone want to comment?

        Robert Joseph Evans added a comment -

        Canceling the patch as it is over 8 months old now and no longer applies to trunk or 0.23. This seems a lot like what Bigtop has been doing too, but perhaps not on a nightly basis.

        Konstantin Boudnik added a comment -

        Seemingly not of interest to anyone. Closing.


          People

          • Assignee: Konstantin Boudnik
          • Reporter: Konstantin Boudnik
          • Votes: 0
          • Watchers: 5
