Index: src/site/site.xml =================================================================== --- src/site/site.xml (revision 1229466) +++ src/site/site.xml (working copy) @@ -65,7 +65,10 @@ - + + + + Index: src/site/resources/css/site.css =================================================================== --- src/site/resources/css/site.css (revision 1229466) +++ src/site/resources/css/site.css (working copy) @@ -20,20 +20,24 @@ } body a:link { - color: #74240f; + color: #74240f; } body a:visited { - color: #74240f; + color: #74240f; } body a:hover { - text-decoration: underline; - color: #888800; + text-decoration: underline; + color: #888800; } h2, h3, h4 { - color: #74240f; + color: #74240f; } +ul { + margin-left:10px; +} + .green { color: #0f7355; border: 1px solid #0f7355; @@ -43,6 +47,8 @@ .green a:link, .green a:active, .green a:visited { color: #0a4d39; } .green a:hover { color: #888800; } .green h3 { + margin-bottom:-5px; + margin-top:-2px; padding: 0px 0px 0px 0px; border: 0px; color: #74240f; Index: src/site/xdoc/getting_started_with_hama.xml =================================================================== --- src/site/xdoc/getting_started_with_hama.xml (revision 0) +++ src/site/xdoc/getting_started_with_hama.xml (revision 0) @@ -0,0 +1,128 @@ + + + + + +
+

This document describes how to install, configure and manage Hama clusters ranging from a few nodes to extremely large clusters with thousands of nodes.

+ + +

1. Make sure all required software is installed on all nodes in your cluster:

+
    +
  • hadoop-0.20.x (non-secure version)
  • +
  • Sun Java JDK 1.6.x or higher version
  • +
  • SSH access to manage BSP daemons
  • +
+

2. Download Hama from the release page. +

+ For additional information consult our + Compatibility Table
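The software requirements in step 1 can be checked with a short script. The following is a minimal sketch, not part of the Hama distribution: check_cmd is a hypothetical helper, and the script only tests that each tool is on the PATH, not that its version matches the requirements above.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on PATH.
# Prints "<name>: found" or "<name>: missing"; it never aborts the script.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# The tools named in step 1: Hadoop, a JDK, and an SSH client.
# Versions are NOT checked here; verify them separately.
for tool in hadoop java ssh; do
    check_cmd "$tool"
done
```

Run it on every node in the cluster; any line ending in "missing" points to software that still has to be installed there.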

+ + +

Just like Hadoop, we distinguish between three modes:

+
    +
  • Local Mode - This is the default mode if you download Hama (>= 0.3.0) and install it. When you submit a job, it runs in a local multithreaded BSP engine on your server. Select this mode by setting the bsp.master.address property to local. You can adjust the number of threads used by setting the bsp.local.tasks.maximum property. See the Settings step for how and where to configure this.
  • +
  • Pseudo Distributed Mode - Use this mode when you have a single server and want to launch all the daemon processes (BSPMaster, Groom and Zookeeper). To configure it, set bsp.master.address to a host address, e.g., localhost, and put the same address into the groomservers file in the configuration directory. This runs a BSPMaster, a Groom and a Zookeeper on your machine.
  • +
  • Distributed Mode - This mode is just like the "Pseudo Distributed Mode", but you have multiple machines, which are listed in the groomservers file.
  • +
+ + +
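As an illustration of Local Mode, the two properties named above can be written into a hama-site.xml file. This is only a sketch under stated assumptions: write_local_site is a hypothetical helper, the thread count is an arbitrary example value, and the function overwrites any existing hama-site.xml in the target directory.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal hama-site.xml for Local Mode.
# $1: configuration directory
# $2: number of local BSP threads (illustrative value, choose your own)
# Note: this overwrites an existing hama-site.xml in that directory.
write_local_site() {
    conf_dir=$1
    tasks=$2
    cat > "$conf_dir/hama-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>bsp.master.address</name>
    <value>local</value>
  </property>
  <property>
    <name>bsp.local.tasks.maximum</name>
    <value>$tasks</value>
  </property>
</configuration>
EOF
}
```

A call such as `write_local_site "$HAMA_HOME/conf" 4` would configure four local threads.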

The $HAMA_HOME/conf directory contains some configuration files for Hama. These are:

+
    +
  • hama-env.sh - This file contains environment variable settings used by Hama. You can use these to affect some aspects of Hama daemon behavior, such as where log files are stored and the maximum amount of heap used. The only variable you should need to change in this file is JAVA_HOME, which specifies the path to the Java 1.6.x (or higher) installation used by Hama.
  • +
  • groomservers - This file lists the hosts, one per line, where the GroomServer daemons will run. By default this contains the single entry localhost.
  • +
  • hama-default.xml - This file contains generic default settings for Hama daemons. Do not modify this file.
  • +
  • hama-site.xml - This file contains site specific settings for all Hama daemons and BSP jobs. This file is empty by default. Settings in this file override those in hama-default.xml. This file should contain settings that must be respected by all servers and clients in a Hama installation.
  • +
+ + +
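The groomservers format described above (one host per line) is simple enough to generate from a list of host names. The following is a sketch, assuming a hypothetical helper named write_groomservers; the host names in the usage note are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: write a groomservers file with one host per line.
# $1: configuration directory; remaining arguments: host names.
# Note: this overwrites an existing groomservers file in that directory.
write_groomservers() {
    conf_dir=$1
    shift
    # printf repeats the format once per remaining argument.
    printf '%s\n' "$@" > "$conf_dir/groomservers"
}
```

For example, `write_groomservers "$HAMA_HOME/conf" host1.mydomain.com host2.mydomain.com` produces a two-line file for a small distributed setup.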

The $HAMA_HOME/bin directory contains the scripts used to start up the Hama daemons.

+
  • start-bspd.sh - Starts all Hama daemons, the BSPMaster, GroomServers and Zookeeper.
+

Note: You have to start Hama as the same user that is configured for Hadoop.

+ + +
  • BSPMaster and Zookeeper settings - Figure out where to run your HDFS namenode and BSPMaster. Set the variable bsp.master.address to the BSPMaster's intended host:port. Set the variable fs.default.name to the HDFS Namenode's intended host:port.
+

Here's an example of a hama-site.xml file:

+ +
+  <?xml version="1.0"?>
+  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+  <configuration>
+    <property>
+      <name>bsp.master.address</name>
+      <value>host1.mydomain.com:40000</value>
+      <description>The address of the bsp master server. Either the
+      literal string "local" or a host:port for distributed mode
+      </description>
+    </property>
+
+    <property>
+      <name>fs.default.name</name>
+      <value>hdfs://host1.mydomain.com:9000/</value>
+      <description>
+        The name of the default file system. Either the literal string
+        "local" or a host:port for HDFS.
+      </description>
+    </property>
+
+    <property>
+      <name>hama.zookeeper.quorum</name>
+      <value>host1.mydomain.com,host2.mydomain.com</value>
+      <description>Comma separated list of servers in the ZooKeeper Quorum.
+      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
+      By default this is set to localhost for local and pseudo-distributed modes
+      of operation. For a fully-distributed setup, this should be set to a full
+      list of ZooKeeper quorum servers. If HAMA_MANAGES_ZK is set in hama-env.sh
+      this is the list of servers which we will start/stop zookeeper on.
+      </description>
+    </property>
+  </configuration>
+
+ +If you are managing your own ZooKeeper, you have to specify the port number as below: + +
+  <property>
+    <name>hama.zookeeper.property.clientPort</name>
+    <value>2181</value>
+  </property>
+
+See all Configuration Properties + + +

NOTE: Skip this step if you're in Local Mode. +
Run the command:

+ +
+  % $HAMA_HOME/bin/start-bspd.sh
+  
+

This will start up a BSPMaster, GroomServers and Zookeeper on your machine.
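The note about skipping this step in Local Mode can be captured in a small wrapper. This is a sketch only: needs_daemons and start_hama are hypothetical names, and the wrapper expects the value of bsp.master.address to be passed in by the caller rather than reading it from hama-site.xml.

```shell
#!/bin/sh
# Succeed (exit status 0) when the given bsp.master.address value requires
# daemons, i.e. when it is anything other than the literal string "local".
needs_daemons() {
    [ "$1" != "local" ]
}

# Hypothetical wrapper: start the daemons only outside Local Mode.
# $1: the bsp.master.address value in use (supplied by the caller).
start_hama() {
    if needs_daemons "$1"; then
        "$HAMA_HOME/bin/start-bspd.sh"
    else
        echo "Local Mode: no daemons to start"
    fi
}
```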

+

Run the command:

+
+  % $HAMA_HOME/bin/stop-bspd.sh
+  
+ +

to stop all the daemons running on your cluster.

+ +
+  % $HAMA_HOME/bin/hama jar hama-examples-0.x.0-incubating.jar [args]
+  
+ +
Index: src/site/xdoc/hama_bsp_tutorial.xml =================================================================== --- src/site/xdoc/hama_bsp_tutorial.xml (revision 0) +++ src/site/xdoc/hama_bsp_tutorial.xml (revision 0) @@ -0,0 +1,30 @@ + + + + + +
+

This document describes the Hama BSP framework and serves as a tutorial.

+ + + + + +
Index: src/site/xdoc/hama_on_clouds.xml =================================================================== --- src/site/xdoc/hama_on_clouds.xml (revision 0) +++ src/site/xdoc/hama_on_clouds.xml (revision 0) @@ -0,0 +1,45 @@ + + + + + +
+

This document describes how to deploy Hama clusters on clouds (e.g., EC2, Rackspace) using Whirr.

+ + +

The following commands install Whirr and start a 5-node Hama cluster on Amazon EC2 in 5 minutes or less. +

+  % curl -O http://www.apache.org/dist/whirr/whirr-0.x.0/whirr-0.x.0.tar.gz
+  % tar zxf whirr-0.x.0.tar.gz; cd whirr-0.x.0
+
+  % export AWS_ACCESS_KEY_ID=YOUR_ID
+  % export AWS_SECRET_ACCESS_KEY=YOUR_SECKEY
+  % ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa_whirr
+
+  % bin/whirr launch-cluster --config recipes/hama-ec2.properties --private-key-file ~/.ssh/id_rsa_whirr
+    
+
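Because the launch depends on the two AWS variables exported above, it can help to confirm they are set before calling Whirr. The following is a minimal sketch; missing_env is a hypothetical helper and is not part of Whirr or Hama.

```shell
#!/bin/sh
# Hypothetical helper: print the names of any listed environment variables
# that are unset or empty. Returns 0 when all are set, 1 otherwise.
# The names are expanded with eval, so pass only trusted variable names.
missing_env() {
    status=0
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
            status=1
        fi
    done
    return $status
}
```

For example, `missing_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY || echo "set the variables above first"` prints the name of each missing variable, followed by the reminder.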

+ +
+  % cd /usr/local/hama-0.x.0-incubating
+  % bin/hama jar hama-examples-0.x.0-incubating.jar [args]
+    
+ +
Index: src/site/xdoc/index.xml =================================================================== --- src/site/xdoc/index.xml (revision 1229466) +++ src/site/xdoc/index.xml (working copy) @@ -22,34 +22,52 @@
-
+

- Apache Hama is a distributed computing framework based on - BSP (Bulk Synchronous Parallel) computing techniques for massive scientific computations, - e.g., matrix, graph and network algorithms. - It was inspired by Google's Pregel, but different in the sense - that it's purely BSP and common model, not just for graph. + Apache Hama is a pure BSP computing framework for massive scientific computations, + e.g., matrix, graph and network algorithms. Currently, it has the following features:

- +
    +
  • Job submission and management interface.
  • +
  • Multiple tasks per node.
  • +
  • Input/Output Formatter.
  • +
  • Checkpoint recovery.
  • +
  • Supports running in the cloud using Apache Whirr.
  • +
  • Supports running on Hadoop YARN.
  • +
+

Recent News

    +
  • Jan 30, 2012: release 0.4.0 available
  • July 28, 2011: release 0.3.0 available
  • June 2, 2011: release 0.2.0 available
  • Apr 30, 2010: Introduced in the BSP Worldwide
  • May 20, 2008: Hama accepted as an Apache Incubator project
- + +

Today, many practical data processing applications require good scalability, a flexible model, high compatibility with existing data systems (e.g., HDFS, HBase), and especially a communication capability that allows information to be exchanged using a message-passing paradigm beyond MapReduce. + Here, the Bulk Synchronous Parallel (BSP) model fits the bill nicely, and it has several main advantages over MapReduce and MPI:

+
    +
  • Supports a message-passing style of application development
  • +
  • Provides a small, flexible, simple, and easy-to-use API
  • +
  • Enables better performance than MPI for communication-intensive applications
  • +
  • Guarantees the absence of deadlocks or collisions in the communication mechanisms
  • +
+ +

Learn about Hama and BSP by reading the documentation.

+

Start by installing Hama on a Hadoop cluster.