Bigtop / BIGTOP-1368

Have Jenkins use Mesos-Jenkins plugin to create containerized build slaves

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.7.0
    • Fix Version/s: backlog
    • Component/s: build
    • Labels: None

      Description

      This ticket is about changing our Jenkins setup to do builds inside of containers, so that we can auto-configure slaves, clean up our environment, and publish matching containers that developers can use to build Bigtop.

      I recommend using the Jenkins Mesos plugin (https://github.com/jenkinsci/mesos-plugin) to start Jenkins slaves, orchestrate and run jobs in containers, and handle resource allocation across the slave containers that start up. Apache Mesos handles launching containers and limiting their resources so that slaves won't starve each other out, while the Jenkins Mesos plugin sits on top of Mesos and starts slave.jar inside the container; that slave then builds whatever Bigtop package you want, or runs whatever tests you need, in the isolated environment. Integrating it would not require much change to our current Jenkins setup, and build steps for most projects would be unchanged. You would just install mesos-master somewhere, install mesos-slave on all of the servers that currently run job builds, and Mesos would handle using them all and spinning up the Jenkins slaves.
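      To make that concrete, here is a minimal sketch of the host-side setup. It assumes the Mesosphere packages on Ubuntu and a hypothetical master hostname; exact package names and config paths vary by distro and Mesos version:

          # On the master host:
          sudo apt-get install -y mesos zookeeper
          sudo service mesos-master start

          # On each build host that currently runs Jenkins jobs:
          sudo apt-get install -y mesos docker.io
          echo 'zk://mesos-master.example.com:2181/mesos' | sudo tee /etc/mesos/zk
          sudo service mesos-slave start

      The Jenkins Mesos plugin is then pointed at the master from the Jenkins configuration UI, and it registers as a Mesos framework that launches slave containers on demand.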

      While the above lets us do Bigtop builds in containers using Jenkins, we should also do a few more things to make life easier. We can use the Jenkins Mesos plugin to run Docker in Docker (dind), so that inside our Jenkins slave containers we can launch a nested Docker container to build Docker images for deployment. For example, a Jenkins job kicked off inside a container could build a base Docker image that a later job then uses to build a specific package. This would let our Jenkins setup build and test our Puppet toolchain code and produce images that other jobs use for builds. This approach is documented in detail here: http://www.ebaytechblog.com/2014/04/04/delivering-ebays-ci-solution-with-apache-mesos-part-i/#.U7R4t3VdVhF
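      A rough sketch of the dind idea, with hypothetical image names (the linked eBay post follows the same pattern). The key detail is that the outer slave container must run privileged so it can start its own nested Docker daemon:

          # Outer Jenkins slave container, privileged so a nested dockerd
          # can run inside it:
          docker run --privileged -d bigtop/jenkins-dind

          # A job inside that container can then run builds against the
          # nested daemon, e.g.:
          #   docker build -t bigtop/build-base /workspace/jenkins-docker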

      As for matching builds on people's computers, that gets pretty easy with Docker. You can take the Jenkins slave Docker image and run jobs against it directly, without starting slave.jar. So instead of docker run supervisord, which is how you would get the Jenkins slave running, you can do docker run make hive-deb or whatever package you want. This makes people's environments match our Jenkins environment exactly.
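      For illustration, with hypothetical image and target names, the two modes differ only in the command passed to docker run:

          # CI mode: start the Jenkins slave under supervisord
          docker run bigtop/jenkins-slave supervisord

          # Developer mode: same image, run a build target directly
          docker run -v $(pwd):/bigtop -w /bigtop bigtop/jenkins-slave make hive-deb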

      I have an experimental git repo with some Dockerfiles that are a pretty good prototype of this future setup at https://github.com/jeid64/bigtop-dockerfiles. jenkins-docker/ has an example Ubuntu 12.04 Dockerfile that, once built, uses the Puppet manifests in bigtop_toolchain to set up the build environment. When run by Jenkins, Jenkins starts a slave inside the container, and that slave handles all the build steps for the job. The same image can also be used on your desktop without needing a Jenkins master or anything else from Jenkins.
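      Trying the prototype locally looks something like this (the image tag is arbitrary):

          git clone https://github.com/jeid64/bigtop-dockerfiles
          cd bigtop-dockerfiles/jenkins-docker
          docker build -t bigtop/ubuntu-12.04-build .

          # Use it straight from your desktop, no Jenkins involved:
          docker run -i -t bigtop/ubuntu-12.04-build /bin/bash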

      Also in that repo is jenkins-dind/, which is based on the multi-docker setup at https://github.com/ahunnargikar/jenkins-dind. This will be the image used for dind image builds in Jenkins build environments.

      I would love to head up this ticket and work on getting everything set up.


          Activity

          Roman Shaposhnik added a comment -

          In general, this sounds like super-useful functionality to have for private-datacenter Bigtop-based build management. I would love to review your work.

          My question, though, is: how does it relate to our EC2-based builds? My initial idea with BIGTOP-1323 was to rely on EC2 for the actual management of the compute fabric and have as many fungible, zero-configuration slaves (CoreOS) as we'd want. The actual environment is then captured by the Docker containers, which can be instantiated on base CoreOS at will.

          Can you please elaborate on how the Mesos part would help us on EC2?

          Julien Eid added a comment -

          Roman,

          It is super-useful functionality both for a private datacenter and for public builds of Bigtop. I intend to replicate a lot of what I've already done when setting up the new Bigtop Jenkins I'm proposing here.

          Thanks for asking this question; I had a feeling I had skipped talking about that. I have much the same idea as you, just with a different implementation. We would still use EC2 to spin instances up and down and to host the slaves and masters. Just like in your idea, you would have as many disposable, almost-zero-configuration slaves as you want.

          The way it would work: you spin up EC2 instances to be slave hosts with a very minimal install of CentOS or Ubuntu or whatever; the distro for the slaves doesn't matter. Mesos-slave is installed on each EC2 instance as it is spun up, and it connects to the Mesos-master. Once build jobs are queued, Jenkins asks Mesos to spin up a container on one of the slaves, and the mesos-slave process uses Docker to start the Jenkins slave process inside a new container, which starts the build. Mesos spins up the containers on slaves with available resources and efficiently distributes jobs across all of our available instances. Once the build is done, the Jenkins slave exits and reports build status to Jenkins, and Mesos cleans up the container, either archiving it or deleting it as you prefer. The Mesos/EC2 interaction is just something that spins up more EC2 instances when the Jenkins build queue gets too deep, so that you can host more slave containers. This can be done with a script or by hand pretty easily (a rough sketch of such a script follows).
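          As a sketch of what that scaling script could look like (the Jenkins URL, AMI ID, and instance type are all placeholders):

              #!/bin/bash
              # Launch another mesos-slave host when the Jenkins build queue is deep.
              THRESHOLD=5
              QUEUE_DEPTH=$(curl -s http://jenkins.example.com/queue/api/json |
                python -c 'import json,sys; print(len(json.load(sys.stdin)["items"]))')
              if [ "$QUEUE_DEPTH" -gt "$THRESHOLD" ]; then
                # The AMI is assumed to have mesos-slave and Docker baked in,
                # with user-data pointing it at the Mesos master.
                aws ec2 run-instances --image-id ami-xxxxxxxx --count 1 \
                  --instance-type m3.large --user-data file://join-mesos-cluster.sh
              fi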

          It is very much the same concept you described, with a few key differences that can save us a lot of work and give us a lot more stability.
          1. There is already Jenkins/Mesos integration written to have Jenkins ask Mesos to spin up containers and do builds in them. The Jenkins Mesos plugin has also been used by other people to do Docker image builds inside those containers, so Jenkins can manage your Docker build images. This saves us a ton of time and work over writing our own plugin, since one already exists and is used in production by plenty of people. From what I've seen, there is not much work done on a Jenkins plugin that starts build slaves on CoreOS or does builds inside of CoreOS, and I've had trouble finding anything about doing dind on CoreOS. We would have to write our own plugin to have Jenkins talk to CoreOS, whereas with Mesos we can be lazy and do the least amount of work to reach our goals. In my implementation Mesos replaces CoreOS for launching containers and, together with the Jenkins plugin, should do everything we need (a slave-side configuration sketch follows this list).

          2. CoreOS is, in my opinion, not production ready: many of its components are very bleeding edge at the moment, and CoreOS itself is in beta right now. http://coreos.com/blog/coreos-beta-release/ has the details about the beta. Etcd (https://github.com/coreos/etcd/blob/master/Documentation/production-ready.md), one of the core components of CoreOS, is considered bleeding edge. Mesos and the Jenkins Mesos plugin have been around for a while, and containers in Mesos have been supported for a long time and were recently improved further in the latest stable release.
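          To make point 1 concrete, here is the slave-side configuration sketch referenced above. The file path assumes the Mesosphere packaging, and the Docker containerizer requires a recent Mesos (0.20+):

              # On each mesos-slave host, enable the Docker containerizer so
              # Mesos can launch Jenkins slaves as Docker containers:
              echo 'docker,mesos' | sudo tee /etc/mesos-slave/containerizers
              sudo service mesos-slave restart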

          I would love to set up and show an example cluster and Jenkins setup. I can spin up a prototype Jenkins master with a replica of Bigtop's current Jenkins jobs on a few Docker images for different platforms, to show how well it performs, how it works, and whether it suits Bigtop's needs. I just need someone to export the build-project configs from the current Jenkins master so I can recreate an accurate Jenkins setup. I think someone at Cloudera could give me the configs, and I could work on setting up the prototype for Bigtop if there is community interest in this approach.

          Feel free to ask more questions, I'm happy to give a ton of information about this!

          Roman Shaposhnik added a comment -

          Given that we're talking about the relative merits of Jenkins plugins, I'm sure Andrew Bayer will help set us straight. With that, here are a few comments:

          CoreOS is in my opinion not production

          The only reason I'm using it is because it is convenient – anything that gets me Docker 1.0 would do. IOW, you can replace CoreOS with whatever distro you want – Jenkins EC2 plugin will spin it up just the same.

          The way it would work is you would spin up EC2 instances to be slave hosts with a very minimal install of CentOS or Ubuntu or something, the distro for the slaves doesn't matter. Mesos-slave would be installed on that EC2 instance that was spun up and it would connect to the Mesos-master.

          This is the bit I'm not following – once Jenkins spins up EC2 slaves we're done. We can simply run $ docker ... builds on them. Why an extra layer of Mesos?

          From what I've seen, there is not much work done to have a Jenkins plugin to start build slaves on CoreOS

          We're currently using Jenkins EC2 plugin which is very much maintained and hasn't failed us much.

          Mesos-slave would be installed on that EC2 instance that was spun up and it would connect to the Mesos-master.

          And that is the bit that worries me: who will be maintaining Mesos-master and making sure it all works? This is an extra bit of work that I'm trying to avoid by relying on the EC2 plugin.


            People

            • Assignee: Julien Eid
            • Reporter: Julien Eid
            • Votes: 0
            • Watchers: 4
