
Key: HADOOP-6255
Type: New Feature
Status: Open
Priority: Major
Assignee: Unassigned
Reporter: Owen O'Malley
Votes: 1
Watchers: 18
Hadoop Common

Create an rpm target in the build.xml

Created: 11/Sep/09 11:21 PM   Updated: 20/Oct/09 04:15 PM
Component/s: None
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified

Issue Links:
Duplicate
 
Reference
 


Description
We should be able to create RPMs for Hadoop releases.
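
A minimal sketch of what such a target might look like, assuming `rpmbuild` is on the PATH and that a spec file and a release tarball already exist. The property names (`rpm.topdir`, `rpm.spec`, `dist.tarball`) and their paths are hypothetical and are not taken from the Hadoop build.

```xml
<project name="hadoop-rpm-sketch" default="rpm" basedir=".">
  <!-- Hypothetical locations; adjust to the real build layout. -->
  <property name="rpm.topdir" location="${basedir}/build/rpm"/>
  <property name="rpm.spec" location="${basedir}/src/packages/hadoop.spec"/>
  <property name="dist.tarball" location="${basedir}/build/hadoop.tar.gz"/>

  <target name="rpm" description="build binary RPMs from the release tarball">
    <!-- rpmbuild expects this directory layout under its top directory -->
    <mkdir dir="${rpm.topdir}/BUILD"/>
    <mkdir dir="${rpm.topdir}/RPMS"/>
    <mkdir dir="${rpm.topdir}/SOURCES"/>
    <mkdir dir="${rpm.topdir}/SPECS"/>
    <mkdir dir="${rpm.topdir}/SRPMS"/>
    <copy file="${dist.tarball}" todir="${rpm.topdir}/SOURCES"/>

    <!-- -bb builds binary packages only; failonerror stops the build
         if rpmbuild exits with a non-zero status -->
    <exec executable="rpmbuild" failonerror="true">
      <arg value="-bb"/>
      <arg value="--define"/>
      <arg value="_topdir ${rpm.topdir}"/>
      <arg file="${rpm.spec}"/>
    </exec>
  </target>
</project>
```

Because the target shells out to `rpmbuild`, it only works on hosts where that tool is installed.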

Comments
FROHNER Ákos added a comment - 16/Sep/09 09:09 AM
Hi,

I would suggest the other way around: create an RPM spec file,
which uses a distribution tarball and calls the generic build.xml
to build the hadoop packages.

This approach eases adoption by upstream distributions, as they
already have the framework to build packages from tarball+spec
files (source RPM).

And the same pattern can be used for Debian/Ubuntu packaging.


Steve Loughran added a comment - 25/Sep/09 03:57 PM
  1. This should be a separate project from the others; it's integration work, and it will soon get big.
  2. The project would be Linux and OS X (with rpmbuild installed) only. Even on Linux, the right tools need to be installed.
  3. Basic RPMs are easy; passing rpmlint is harder.
  4. Testing: that's the fun part.

We test our RPMs by

  1. SCP to configured real/virtual machines. These are CentOS 5.x VMs, usually hosted under VMware. Under VirtualBox, RHEL5 and CentOS spin one CPU at 100% (VirtualBox bug #1233).
  2. Force-uninstall any old versions, then install the new ones.
  3. SSH in, and walk the shell scripts through their entry points:
    <!-- rootssh and pause are not core Ant tasks; presumably they are
         macros defined elsewhere in this build file -->
    <target name="rpm-remote-initd"
        depends="rpm-ready-to-remote-install,rpm-remote-install"
        description="check that initd parses">
      <rootssh command="${remote-smartfrogd} start"/>
      <pause/>
      <rootssh command="${remote-smartfrogd} status"/>
      <pause/>
      <rootssh command="${remote-smartfrogd} start"/>
      <rootssh command="${remote-smartfrogd} status"/>
      <rootssh command="${remote-smartfrogd} stop"/>
      <rootssh command="${remote-smartfrogd} stop"/>
      <rootssh command="${remote-smartfrogd} restart"/>
      <pause/>
      <rootssh command="${remote-smartfrogd} status"/>
      <rootssh command="${remote-smartfrogd} restart"/>
      <pause/>
      <rootssh command="${remote-smartfrogd} status"/>
      <rootssh command="${remote-smartfrogd} stop"/>
    </target>
  4. Run rpm -qf against various files to verify that each one is owned by a package. The RPM commands, executed remotely over SSH, are no fun to use in tests: you have to look for certain strings in the response, because error codes are not used to signal failures. Ouch.
    <fail>
      <condition>
        <or>
          <contains string="${rpm.queries.results}"
              substring="is not owned by any package"/>
          <contains string="${rpm.queries.results}"
              substring="No such file or directory"/>
        </or>
      </condition>
      One of the directories/files in the RPM is not declared as being owned by any RPM.
      This file/directory will not be managed correctly, or have the correct permissions
      on a hardened linux.
      ${rpm.queries.results}
    </fail>
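
The `<fail>` check above assumes that `rpm.queries.results` has already been set. One way it might be populated is with Ant's optional `sshexec` task, which can capture the remote command's output into a property. This is a sketch under assumptions, not the actual SmartFrog build: the host, key file, and queried paths are hypothetical, and `sshexec` needs the JSch library on Ant's classpath.

```xml
<!-- Sketch only: remote.host, remote.keyfile and the queried paths are hypothetical. -->
<target name="rpm-remote-query-owners">
  <!-- failonerror is off because the result is checked by string matching,
       not by exit code; 2>&1 folds stderr into the captured output -->
  <sshexec host="${remote.host}"
           username="root"
           keyfile="${remote.keyfile}"
           trust="true"
           failonerror="false"
           command="rpm -qf /etc/init.d/example /opt/example/bin 2>&amp;1"
           outputproperty="rpm.queries.results"/>
  <echo message="${rpm.queries.results}"/>
</target>
```

Ant properties are immutable once set, so each query needs its own property name if several are run in one build.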

For full functional testing, we also package the test source trees as JAR files and publish them via Ivy, so that the release/ project can retrieve those tests and point them (by way of Java properties) at the remote machine. This is powerful because it confirms that the RPM installations really do work as intended. If you only test on the local machine, you miss problems.
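
As a rough illustration of pointing packaged tests at a remote machine through Java properties, a target along these lines could be used. It assumes the test JARs have already been retrieved into a directory; `test.lib.dir`, `remote.host`, the `test.remote.host` property key, and the test class name are all hypothetical.

```xml
<!-- Sketch only: every property name and the test class are hypothetical. -->
<target name="test-remote-rpm">
  <junit fork="true" haltonfailure="true">
    <!-- the tests read this system property to find the target host -->
    <sysproperty key="test.remote.host" value="${remote.host}"/>
    <classpath>
      <!-- test JARs previously retrieved from the repository -->
      <fileset dir="${test.lib.dir}" includes="*.jar"/>
    </classpath>
    <formatter type="plain" usefile="false"/>
    <test name="org.example.RemoteInstallTest"/>
  </junit>
</target>
```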

These tests don't verify all possible upgrades. Upgrades can be troublesome because RPM installs the new files before uninstalling the old ones.

The other issue is configuration. You can either mark all configuration files as %config(noreplace), meaning people can edit them and upgrades won't stamp on them, or have a more structured process for managing conf files. Cloudera provides a web site to create a new configuration RPM; Apache could provide a .tar.gz file which contains everything needed to create your own configuration RPM.

Therefore +1 to RPMs and debs:

  1. In a separate package.
  2. Named Apache-Hadoop. People out there are already releasing hadoop RPMs, and we don't want confusion.
  3. With all config files in the RPM marked as %config(noreplace) files, which end users can stamp on, or with a separate roll-your-own-config RPM tool.
  4. Once the tests are designed to run against remote systems, they should be run against the RPM installations.

I am not volunteering to write the spec files or the build files, but all of mine are available to look at, and I will formally release them as Apache licensed if you want to use them as a starting point:
http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/release/
I could help with some of the functional testing now, provided it uses some of my real/virtual cluster management stuff to pick target hosts.


Steve Loughran added a comment - 25/Sep/09 04:12 PM
I should add that I do include my own Hadoop jars in my RPMs, and that these RPMs are what get installed in the machine images (real or virtual) that are then used for all the cluster-based testing. If you are going to distribute your artifacts as RPMs, that's how you should test your code. Once you've automated RPM installation and the creation of gold-VM images for your target VM infrastructure (VMware, Xen, EC2, etc.), then you can worry about cluster-scale testing of the artifacts.

Steve Loughran added a comment - 20/Oct/09 04:12 PM
In HADOOP-3835, dhruba proposed an RPM target in the build as a "minor" improvement. This issue is later but has more watchers, so I'm going to close Dhruba's issue as a duplicate, even though his came first.

Steve Loughran added a comment - 20/Oct/09 04:15 PM
HADOOP-5615 includes some spec files; these could be a starting point for something targeting 0.22+ (with all of Avro's jars too).