Bigtop / BIGTOP-1072

Vagrant scripts for spinning up and "hydrating" bigtop vms

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.8.0
    • Component/s: deployment
    • Labels:
      None

      Description

      Vagrant is a tool that spins up VMs for you and destroys them. Its only real requirement is that a "base box" has been created beforehand.

      Once a base box exists, you can run the VM on different providers (KVM, VirtualBox, etc.).

      The goal of Vagrant is to unify developers' VM environments with the production environment. This is very similar to what Bigtop aims to provide. Vagrant adds host/guest shared directories, static IPs, and all the other goodies that one otherwise has to configure manually to VM provisioning, in a vendor-neutral fashion: essentially, it gives a declarative API for VM creation.

      I would like to suggest that Bigtop provide and maintain Vagrant startup scripts that layer Hadoop tools on top of a "base box" VM. This is slightly different from the current strategy, which creates a full-blown VM with Hadoop on it. The Vagrant approach allows more developer customization of the VM artifacts being used without adding any real overhead (other than having Vagrant installed and understanding the very simple Vagrant recipe for creating a VM).

      Probably in the beginning this could be complementary to the Boxgrinder-created VMs, and over time people might migrate to the Vagrant-provisioned VMs as they become more popular and Vagrant use becomes more common in the community.
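
      To make the "declarative API" idea concrete, here is a minimal Vagrantfile sketch (Vagrant v2 syntax, which is Ruby). It is an illustration under stated assumptions, not the attached patch: the box name vagrant-fedora19B is the one that appears in a test log on this issue, while the hostname, the IP address and the provision.sh script name are hypothetical. The base box stays generic; Hadoop is layered on by the provisioner.

```ruby
# Minimal Vagrantfile sketch (Vagrant v2 config format).
# Hypothetical values: the hostname, the IP address and provision.sh.
Vagrant.configure("2") do |config|
  config.vm.define "bigtop1" do |bigtop1|
    # Generic base box with no Hadoop baked in
    bigtop1.vm.box = "vagrant-fedora19B"

    # Settings one would otherwise configure by hand on each VM
    bigtop1.vm.hostname = "bigtop1.vagrant"
    bigtop1.vm.network "private_network", ip: "192.168.33.10"

    # The directory holding this Vagrantfile is shared into the guest
    # at /vagrant by default, so no synced_folder line is needed for it.

    # Hadoop tools are layered on top of the base box at provision time
    bigtop1.vm.provision "shell", path: "provision.sh"
  end
end
```

      With such a file in place, "vagrant destroy --force && vagrant up" throws away any existing VM and rebuilds it from the base box, which is the workflow used later in this thread.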

      Attachments

      1. BIGTOP-1072.2.patch (4 kB, jay vyas)
      2. BIGTOP-1072.1.patch (4 kB, jay vyas)

        Activity

        Sean Mackrory added a comment -

        The pi calculation job worked well for me this morning, although earlier attempts had failed even with small parameters. In one of my tests I had also tried using Bigtop 0.7.0 (your provisioner currently installs 0.6.0); perhaps the failures were related to that. In any case, +1 and committed! Great job, thanks!

        Peter Linnell added a comment -

        Provisionally +1. I'm traveling, so I cannot test it at the moment. Thanks for this tho!

        jay vyas added a comment -

        ^^ patch is now attached (BIGTOP-1072.2.patch)

        jay vyas added a comment -

        Hi bigtop! Okay. I've got an update from Sean's review.

        • The config directory is now removed. It's really nice to know, however, that Vagrant handles shared folders for you, so maybe in a later patch we can add a mkdir on the host and guest. For now, it's removed.
        • Vagrant v1 used "host_name"; v2 uses "hostname" as the parameter, so good catch: http://docs.vagrantup.com/v2/vagrantfile/machine_settings.html. I've removed "host_name" and now it only uses "hostname".
        • Commented out the cachier stuff and added a note about how to enable cachier.
        • Regarding OpenJDK and calculate pi: I made the calculate pi job a lot smaller, and I guess we will stick with OpenJDK if that's okay.

        Also, I added Sean's idea to mkdir /user/vagrant so that the default SSH user has a home dir by default.

        Submitting patch shortly.

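        The host_name/hostname point is only a change of setting name between Vagrant's two configuration formats. A sketch with a hypothetical hostname value; the v1 form is shown commented out because a v2 Vagrantfile rejects it, which is consistent with the error reported in the review.

```ruby
# Vagrant v1 format (Vagrant 1.0.x): the setting was spelled host_name.
# Vagrant::Config.run do |config|
#   config.vm.host_name = "bigtop1.vagrant"
# end

# Vagrant v2 format: the same setting is spelled hostname.
Vagrant.configure("2") do |config|
  config.vm.hostname = "bigtop1.vagrant" # hypothetical value
end
```
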
        Sean Mackrory added a comment -

        A couple of other thoughts:

        • We should also `hadoop fs -mkdir /user/vagrant && hadoop fs -chown vagrant:vagrant /user/vagrant` to give the default SSH user a home directory to run jobs, etc.
        • It looks like the config directory is not actually used. If that's the case, might I suggest just removing that line altogether, or renaming it to something generic to show that it's just a shared directory?
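
        The home-directory suggestion could be wired in as an inline shell provisioner. A sketch under two assumptions: the HDFS superuser on a Bigtop install is the hdfs account, and the Hadoop 2.x "-p" flag is available, so re-provisioning does not fail when the directory already exists. Vagrant's shell provisioner runs as root by default, which is what allows the "sudo -u hdfs" calls.

```ruby
Vagrant.configure("2") do |config|
  config.vm.define "bigtop1" do |bigtop1|
    bigtop1.vm.box = "vagrant-fedora19B"
    # Give the default SSH user (vagrant) an HDFS home directory.
    # The script runs as root; HDFS commands are issued as the hdfs
    # superuser (assumed account name).
    bigtop1.vm.provision "shell", inline: <<-SHELL
      sudo -u hdfs hadoop fs -mkdir -p /user/vagrant
      sudo -u hdfs hadoop fs -chown vagrant:vagrant /user/vagrant
    SHELL
  end
end
```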

        The Hadoop PI job keeps timing out on me. I'll have to investigate why. Given that this is entirely new and will certainly undergo more development I don't think we need to wait until everything's perfect to commit it though.

        Peter Linnell added a comment -

        Following up, to clarify: no, do not hold up submitting a patch until you have all distros covered. OpenJDK should be fine with Hadoop 2.x. I saw some patches going into Hadoop itself to remove the need for the Oracle-specific JDK. The issue there, IIRC, is security classes that were not in the IBM JDK or OpenJDK.

        As for running a job, it might be better to add a launcher script which a user can invoke to test basic functionality.

        jay vyas added a comment -

        Thanks! This is AWESOME feedback. I will look into these three issues and resubmit a patch.

        Sean Mackrory added a comment -

        Thanks for putting this patch together, Jay. I'm sorry it's taken a while to get you some reviews. Here are my thoughts:

        • I agree with Peter that we shouldn't limit ourselves to Fedora, but I don't think we should wait to commit this until we can support the full complement of distros (not what Peter was saying anyway, I don't think). Boxgrinder only supported RedHat / CentOS well, but I would imagine a great next step here is to add a bit of distro-aware logic to your provisioner and comment out the other supported options for the base box in the Vagrantfile.
        • Since the config directory is required, I would suggest creating it and adding a .gitignore file or something to it so that git pays attention to it and your patch creates it for users.
        • My system errored out because the "host_name" property doesn't exist, and I couldn't find documentation for it. Mistake?
        • Since the cachier plugin doesn't necessarily come with Vagrant, I have a slight preference toward disabling that by default. Maybe have it commented out with an explanation of the benefits of installing that plugin and enabling it?
        • I see you're installing OpenJDK. That's often what I use and I've never personally run into problems, but just be aware that upstream Hadoop has always recommended Oracle JDK (it did last I checked). Unfortunately we can't distribute that so easily anymore, so I have no objection to using OpenJDK in this. Just something to be aware of.
        • I'm not sure if I like the idea of running a job on start-up. It's a nice test, but I envisioned this as more of just providing a blank slate for people. No objection, just thinking out loud.
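
        One way to read the distro-aware suggestion: keep alternative base boxes commented out in the Vagrantfile and let the provisioner choose a package manager at run time. This is a sketch only; the commented box names are hypothetical placeholders, and the repository-setup and install steps are left as echo placeholders because the exact commands differ per distribution.

```ruby
Vagrant.configure("2") do |config|
  config.vm.define "bigtop1" do |bigtop1|
    bigtop1.vm.box = "vagrant-fedora19B"
    # Alternative base boxes (hypothetical names), one per supported distro:
    # bigtop1.vm.box = "centos6-base"
    # bigtop1.vm.box = "ubuntu-precise-base"
    # bigtop1.vm.box = "opensuse-base"

    # Branch on whichever package manager the guest provides.
    bigtop1.vm.provision "shell", inline: <<-SHELL
      if command -v yum >/dev/null 2>&1; then
        echo "yum-based guest: add the Bigtop .repo file, then yum install"
      elif command -v zypper >/dev/null 2>&1; then
        echo "SUSE guest: add the Bigtop repo with zypper, then zypper install"
      elif command -v apt-get >/dev/null 2>&1; then
        echo "Debian/Ubuntu guest: add the Bigtop .list file, then apt-get install"
      else
        echo "Unsupported guest distribution" >&2
        exit 1
      fi
    SHELL
  end
end
```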

        I think this is really cool stuff. The provisioner is still running and I've seen a couple of hiccups, but they look like my fault. I'll comment more later today once I've done some more testing, but if you can resolve the config, host_name, and possibly the cachier issues I raised, then I'm pretty close to a +1 here.

        Bruno Mahé added a comment -

        I did the following:

        • mkdir config (otherwise vagrant complains)
        • vagrant destroy --force && vagrant up
        • vagrant ssh

        But I do not see any package from Apache Bigtop installed. java is not even installed.
        Am I missing anything?

        Bruno Mahé added a comment -

        Never mind, I should have read your previous comments.

        Bruno Mahé added a comment -

        Got the following error:

        [bruno@p8700 vagrant]$  vagrant destroy --force && vagrant up 
        [bigtop1] VM not created. Moving on...
        Bringing machine 'bigtop1' up with 'virtualbox' provider...
        [bigtop1] Box 'vagrant-fedora19B' was not found. Fetching box from specified URL for
        the provider 'virtualbox'. Note that if the URL does not have
        a box for this provider, you should interrupt Vagrant now and add
        the box yourself. Otherwise Vagrant will attempt to download the
        full box prior to discovering this error.
        Downloading or copying the box...
        Extracting box...
        Successfully added box 'vagrant-fedora19B' with provider 'virtualbox'!
        There are errors in the configuration of this machine. Please fix
        the following errors and try again:
        
        vm:
        * The host path of the shared folder is missing: ./config
        
        Vagrant:
        * Unknown configuration section 'cache'.
        

        Will take a look at the errors later. But if you already know the answer, let me know.
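
        Both errors in this log come from the Vagrantfile rather than from the VM. A sketch of one way to avoid them if the shared folder is kept, assuming a Vagrant version recent enough to support Vagrant.has_plugin? and the synced-folder "create" option: "create: true" makes Vagrant create the missing ./config host directory instead of failing, and the "cache" section is only configured when the vagrant-cachier plugin is installed. The "cache.enable :yum" call mirrors the line quoted from the patch; the guest path is hypothetical.

```ruby
Vagrant.configure("2") do |config|
  config.vm.define "bigtop1" do |bigtop1|
    bigtop1.vm.box = "vagrant-fedora19B"

    # Create ./config on the host if it is missing, instead of erroring out.
    # Guest mount point is a hypothetical choice.
    bigtop1.vm.synced_folder "./config", "/vagrant-config", create: true

    # Only touch the "cache" section when vagrant-cachier is installed;
    # otherwise Vagrant reports "Unknown configuration section 'cache'".
    if Vagrant.has_plugin?("vagrant-cachier")
      bigtop1.cache.enable :yum
    end
  end
end
```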

        jay vyas added a comment -

        (bumping previous bullets) ...

        Any thoughts on the existing Vagrant patch? By running "vagrant up", it:

        • spins up a VM from an externally hosted fedora vagrant base box.
        • adds in the yum repos for bigtop from external repos
        • sets up and installs JDK/hadoop/JAVA_HOME etc...
        • starts a mapreduce job
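
        The four steps above roughly correspond to a shell provisioner like the following. This is a hedged sketch, not the patch itself: the repo URL is a hypothetical placeholder, and the package names, the namenode "init" action, the init-hdfs.sh script and the examples-jar path are assumptions about Bigtop's packaging that should be checked against the Bigtop release in use. The small "pi 2 10" arguments reflect the thread's move to a lighter job.

```ruby
# Hypothetical placeholder: URL of the Bigtop yum .repo file for the release in use.
bigtop_repo_url = "http://example.org/path/to/bigtop.repo"

Vagrant.configure("2") do |config|
  config.vm.define "bigtop1" do |bigtop1|
    bigtop1.vm.box = "vagrant-fedora19B"
    bigtop1.vm.provision "shell", inline: <<-SHELL
      set -e
      # 1. Add the Bigtop yum repository (URL is a placeholder)
      curl -fsSL -o /etc/yum.repos.d/bigtop.repo "#{bigtop_repo_url}"
      # 2. Install OpenJDK and pseudo-distributed Hadoop (package names assumed)
      yum install -y java-1.7.0-openjdk-devel hadoop-conf-pseudo
      # 3. Format HDFS, start the daemons, create the HDFS layout (assumed paths)
      /etc/init.d/hadoop-hdfs-namenode init
      for svc in hadoop-hdfs-namenode hadoop-hdfs-datanode hadoop-yarn-resourcemanager hadoop-yarn-nodemanager; do
        service "$svc" start
      done
      /usr/lib/hadoop/libexec/init-hdfs.sh
      # 4. Smoke test: a deliberately small pi job (jar path assumed)
      sudo -u hdfs hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 2 10
    SHELL
  end
end
```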
        jay vyas added a comment -

        Thanks, Peter Linnell.

        To simplify this JIRA, I've split the Docker idea (which is a good one!) into another JIRA:
        https://issues.apache.org/jira/browse/BIGTOP-1154

        Peter Linnell added a comment -

        I like this idea a lot. We should not limit it to Fedora, IMO. Excusing my SUSE, if we add this I think we should ensure all supported distros have this capability: Debian, Ubuntu, the RHEL variants, openSUSE and SLES.

        I'll gladly test this for openSUSE and SLES. I've got a honking demo/build box for this.

        The docker option looks more interesting now that it is also multi-distro and does not need specific kernel patches.

        jay vyas added a comment - - edited

        Roman Shaposhnik, how do you propose testing this? There is no build artifact from it. I could add a shell script to the Bigtop "build", but I am not sure where or how it would run (to test on the Jenkins server, Vagrant and VirtualBox would have to be installed).

        jay vyas added a comment -

        Hi bigtop ! I have a special holiday surprise for you !

        Vagrant setup of a Bigtop box: this Vagrant recipe will spin up a single Fedora 19 Vagrant box, install Bigtop YARN, and then start a MapReduce job for you.

        It uses a Vagrant Fedora 19 box that we host publicly as the basis.

        It's a first iteration, but I've tested it and it works. To test it locally:

        1) Install virtualbox.

        2) Install the vagrant-cachier plugin for yum caching: https://github.com/fgrehm/vagrant-cachier (very easy, and optional if you comment out "bigtop1.cache.enable :yum")

        3) vagrant destroy --force && vagrant up

        After a few minutes, you should see "calculate pi" starting up.

        I've basically stolen this recipe from the READMEs in the bigtop-deploy docs. If this seems useful, as a next iteration I could add a subproject that builds a Bigtop smoke-testing environment, so that we can run smoke tests in VMs while developing them. Alternatively, we could use it for meetups, hackathons, etc., to make sure everyone has the same reproducible Hadoop dev environment.

        Anyway, comments are welcome. If it's too raw to put into the Bigtop main fork, just let me know and I'll refine it some more, but I think it's a pretty good start.

        Roman Shaposhnik added a comment -

        Sure! Go ahead!

        jay vyas added a comment -

        Thanks! Okay, I will attempt putting a bigtop-deploy/vagrant setup in there to start. Should I attempt a patch now?

        Roman Shaposhnik added a comment -

        jay vyas actually we've got bigtop-deploy/ folder for that. There's currently bigtop-deploy/puppet, bigtop-deploy/live-cd and bigtop-deploy/vm. I'd say this is the most appropriate place for it.

        jay vyas added a comment -

        Structure-wise: where would these scripts go? I've suggested that maybe a sandbox/ or examples/ directory would help Bigtop adoption and lower the learning curve for other tasks (e.g. running smokes). Maybe this is another case where a less structured examples/ directory in the source would be useful to have.

        Other suggestions for where VM utilities to spin up vagrant and/or docker.io instances could go?

        Sean Mackrory added a comment -

        Just throwing in a general vote of support for efforts like this.

        "Probably in the beginning this could be complementary to the Boxgrinder-created VMs, and over time people might migrate to the Vagrant-provisioned VMs as they become more popular and Vagrant use becomes more common in the community."

        There was a discussion a while ago on what we might replace Boxgrinder with, since it is no longer supported by Red Hat and is already, IMO, showing its age. I've been using Packer.io for some of my work and I quite like it. I've been thinking about proposing some code to build a Bigtop appliance with Packer, in fact. If we do go this direction, it's worth pointing out that Packer and Vagrant come from the same guy, and you can easily produce Vagrant boxes with Packer.

        Also, Vagrant allows provisioning with Puppet, for which we have a lot of deployment code! Probably not something you'd want to run every time you spin up an instance of the box, but just a suggestion for how this work could start.
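
        A sketch of what the Puppet route might look like using Vagrant's built-in puppet provisioner. The paths are assumptions: they presume the Vagrantfile lives in a hypothetical bigtop-deploy/vagrant/ directory next to the existing bigtop-deploy/puppet code, with a manifests/site.pp and a modules/ directory there. Puppet must already be installed in the base box, and any configuration data the Bigtop manifests expect is not shown.

```ruby
Vagrant.configure("2") do |config|
  config.vm.define "bigtop1" do |bigtop1|
    bigtop1.vm.box = "vagrant-fedora19B"
    # Assumes this Vagrantfile lives in bigtop-deploy/vagrant/ (hypothetical),
    # so the existing Bigtop puppet code is one directory up on the host.
    bigtop1.vm.provision "puppet" do |puppet|
      puppet.manifests_path = "../puppet/manifests"
      puppet.manifest_file  = "site.pp"
      puppet.module_path    = "../puppet/modules"
    end
  end
end
```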

        Bruno Mahé added a comment -

        Vagrant and docker.io are great ideas!
        Although I couldn't use KVM with Vagrant on the last Ubuntu LTS due to some dependency requirements on Vagrant's side.

        "I would like to suggest that Bigtop provide and maintain Vagrant startup scripts that layer Hadoop tools on top of a "base box" VM. This is slightly different from the current strategy, which creates a full-blown VM with Hadoop on it. The Vagrant approach allows more developer customization of the VM artifacts being used without adding any real overhead (other than having Vagrant installed and understanding the very simple Vagrant recipe for creating a VM)."

        Apache Bigtop does not maintain anything. It is volunteers who maintain components.
        So it is all up to people being interested enough to send patches to improve and keep any Vagrant recipe up to date.

        "Probably in the beginning this could be complementary to the Boxgrinder-created VMs, and over time people might migrate to the Vagrant-provisioned VMs as they become more popular and Vagrant use becomes more common in the community."

        It does not have to be one or the other. We can maintain as many deployment recipes as there are volunteers to maintain them. And so far, I am interested in maintaining the Boxgrinder ones.

        Roman Shaposhnik added a comment -

        This would be really nice to have.

        On the same note, I was thinking that we should probably start producing Bigtop containers for docker.io.

        More details here: http://www.docker.io/ and here https://index.docker.io/

        Docker is a seriously cool way of provisioning things like Hadoop via CoreOS: http://coreos.com/


          People

          • Assignee: jay vyas
          • Reporter: jay vyas
          • Votes: 0
          • Watchers: 8
