Details
New Feature
Status: Resolved
Major
Resolution: Fixed
0.7.0
None
Description
Here is an initial patch to support Mahout as a Whirr service.
I created the role 'mahout-home' which can be used to install the binary Mahout distribution on a Hadoop namenode.
By combining this role with the configuration for a Hadoop cluster, you can SSH into the namenode, su to root, and start running Mahout jobs via the mahout script immediately.
The 'mahout-home' role has two properties:
whirr.mahout.version: the Mahout version
whirr.mahout.tarball.url: the URL of the Mahout binary distribution tarball
Note that I tested with a snapshot version of Mahout, revision 1169784, because the Mahout script in 0.5 had some problems that have since been fixed on trunk; see MAHOUT-680. To test, you can set the tarball property to this link: http://dl.dropbox.com/u/13436484/mahout-distribution-0.6-SNAPSHOT.tar.gz
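For illustration, a recipe combining the role with a small Hadoop cluster might look roughly like the sketch below. The cluster name, provider, credentials lookup, and node counts are assumptions, not part of the patch, and the version value is only inferred from the tarball file name; the two whirr.mahout.* properties and the tarball URL are the ones described above.

```
# Hypothetical Whirr recipe; cluster name, provider, and node counts are assumptions.
whirr.cluster-name=mahout-test
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}

# The mahout-home role is added to the instance template that holds the namenode.
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker+mahout-home,1 hadoop-datanode+hadoop-tasktracker

# Properties introduced by this patch (version value assumed from the tarball name).
whirr.mahout.version=0.6-SNAPSHOT
whirr.mahout.tarball.url=http://dl.dropbox.com/u/13436484/mahout-distribution-0.6-SNAPSHOT.tar.gz
```

With a recipe along these lines, launching the cluster should leave the Mahout distribution installed on the namenode, ready to use after logging in.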
I used configure actions and the onBeforeConfigure() hook. If there is a better way to express this with the Whirr API, let me know.
Currently I am investigating a 'mahout-jar' role, which installs the Mahout examples job jar under $HADOOP_HOME/lib on a tasktracker node. I already have some code for putting the jar in place, but when running a job from my local machine I still get ClassNotFoundExceptions. I believe this is because Hadoop has already started before the jar is put in the lib dir, so the jar is not picked up, but I need to investigate further. From WHIRR-221 I understood that there is no support (yet?) for ordering of services, but if you have an idea on how to fix this, let me know.
Comments and suggestions welcome!