Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

As you already know, Apache Beam provides a unified programming model for both batch and streaming inputs.

The APIs are centered on filtering and transforming data, so we'll need to implement a data-processing runner, similar to https://github.com/dapurv5/MapReduce-BSP-Adapter/blob/master/src/main/java/org/apache/hama/mapreduce/examples/WordCount.java

Also, implementing a similarity join could be interesting. According to http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf, Apache Hama clearly outperforms both Apache Hadoop and Apache Spark on this workload.

Since it consists of transformation, aggregation, and partitioning computations, I think it can be implemented with the Apache Beam APIs.
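      For illustration, the rough shape of such a pipeline with the Beam APIs (a minimal sketch; the paths and the tokenizing logic are just placeholders):

        // Sketch: transformation (ParDo), aggregation (Count), and the
        // key-based grouping that Count implies under the hood.
        Pipeline p = Pipeline.create(options);
        p.apply(TextIO.Read.from("/tmp/input"))
         .apply(ParDo.of(new DoFn<String, String>() {      // transformation
           @ProcessElement
           public void processElement(ProcessContext c) {
             for (String token : c.element().split("\\s+")) {
               if (!token.isEmpty()) {
                 c.output(token);
               }
             }
           }
         }))
         .apply(Count.<String>perElement())                // aggregation
         .apply(ParDo.of(new DoFn<KV<String, Long>, String>() {
           @ProcessElement
           public void processElement(ProcessContext c) {
             c.output(c.element().getKey() + ": " + c.element().getValue());
           }
         }))
         .apply(TextIO.Write.to("/tmp/output"));
        p.run();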

        Issue Links

          Activity

          seedengine JongYoon Lim added a comment -

          Do you mean that each superstep can be executed in the data pipeline as a PCollection? Could you add more details in case I didn't understand correctly?

          udanax Edward J. Yoon added a comment -

          Hi, I haven't looked at Dataflow (Apache Beam) closely, but:

          >> Do you mean that each superstep can be executed in data pipeline as a pcollection?

          I guess yes, or a single job can be executed, as the case may be.

          If you're interested in working on this, you can refer to https://github.com/dataArtisans/flink-dataflow/blob/master/runner/src/main/java/com/dataartisans/flink/dataflow/FlinkPipelineRunner.java

          And before we do this, I guess HAMA-940 and a data-processing BSP should come first. Please feel free to share your opinion and contribute patches.

          If you have any questions, let me know.

          seedengine JongYoon Lim added a comment -

          Thank you for the information. I'm interested in this feature, so I'll start by analyzing the Flink runner.

          udanax Edward J. Yoon added a comment -

          Just FYI, Apache Beam's basic example is word count. I guess the batch mode can be similar to org.apache.hama.examples.PiEstimator: (n - 1) tasks parse and count the words, and 1 task aggregates the word counts and emits the final result. I'm not sure about the streaming mode, so you'll need to check how it handles I/O.
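          For illustration, the PiEstimator-style aggregation pattern in Hama BSP looks roughly like this (a minimal sketch, not the actual example code):

            import java.io.IOException;
            import org.apache.hadoop.io.LongWritable;
            import org.apache.hadoop.io.Text;
            import org.apache.hama.bsp.BSP;
            import org.apache.hama.bsp.BSPPeer;
            import org.apache.hama.bsp.sync.SyncException;

            public class AggregatingBSP extends
                BSP<LongWritable, Text, Text, LongWritable, LongWritable> {

              @Override
              public void bsp(
                  BSPPeer<LongWritable, Text, Text, LongWritable, LongWritable> peer)
                  throws IOException, SyncException, InterruptedException {
                long localCount = 0; // ... each task counts words in its own split ...

                // every task sends its partial count to one designated peer
                String master = peer.getPeerName(0);
                peer.send(master, new LongWritable(localCount));
                peer.sync();

                // the designated peer aggregates and emits the final result
                if (peer.getPeerName().equals(master)) {
                  long total = 0;
                  LongWritable received;
                  while ((received = peer.getCurrentMessage()) != null) {
                    total += received.get();
                  }
                  peer.write(new Text("total"), new LongWritable(total));
                }
              }
            }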

          seedengine JongYoon Lim added a comment -

          Yes, I'll check the streaming mode as well.

          seedengine JongYoon Lim added a comment -

          FlinkPipelineRunner internally has translators for both pipelines and transforms. The translator translates Beam operators to their Flink counterparts and saves the relevant information in a TranslationContext, which is then used to run the Flink job. I think this patch can start with a simple translator for batch jobs.
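          For illustration, the translator pattern roughly follows this shape (a simplified sketch with assumed names; the real interfaces are in the flink-dataflow code linked above):

            import java.util.HashMap;
            import java.util.Map;
            import org.apache.beam.sdk.transforms.PTransform;

            // One translator per Beam transform type, looked up from a registry;
            // TranslationContext is the context object mentioned above.
            interface TransformTranslator<T extends PTransform<?, ?>> {
              void translate(T transform, TranslationContext context);
            }

            class Translators {
              private static final Map<Class<?>, TransformTranslator<?>> TRANSLATORS =
                  new HashMap<Class<?>, TransformTranslator<?>>();

              static <T extends PTransform<?, ?>> void register(
                  Class<T> clazz, TransformTranslator<T> translator) {
                TRANSLATORS.put(clazz, translator);
              }

              static TransformTranslator<?> get(PTransform<?, ?> transform) {
                // look up the translator registered for this transform's type
                return TRANSLATORS.get(transform.getClass());
              }
            }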

          seedengine JongYoon Lim added a comment - edited

          Hi, it took some time to understand the Beam API and the Spark and Flink runners for Beam. It seems that Beam's transforms can be translated to Hama's API as follows, and the BSP for dataflow could be similar to SuperstepBSP. (If I have misunderstood anything, please correct me.)
          BEAM -> HAMA
          ParDo -> Superstep
          Read.Bound -> RecordReader
          Write.Bound -> RecordWriter
          Combine -> Combiner
          GroupByKey -> ?

          I'm going to start with batch mode first, until Hama's streaming is ready, and I'll add sub-tasks for this soon.
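          For illustration, the ParDo -> Superstep row might end up as something like this (a rough sketch; DoFnSuperstep and the context plumbing are hypothetical):

            import java.io.IOException;
            import org.apache.beam.sdk.transforms.DoFn;
            import org.apache.beam.sdk.values.KV;
            import org.apache.hadoop.io.MapWritable;
            import org.apache.hama.bsp.BSPPeer;
            import org.apache.hama.bsp.KeyValuePair;
            import org.apache.hama.bsp.Superstep;

            // Wraps a Beam DoFn so it can run as one Hama superstep.
            public class DoFnSuperstep<K, V> extends
                Superstep<K, V, K, V, MapWritable> {

              private final DoFn<KV<K, V>, KV<K, V>> doFn;

              public DoFnSuperstep(DoFn<KV<K, V>, KV<K, V>> doFn) {
                this.doFn = doFn;
              }

              @Override
              protected void compute(BSPPeer<K, V, K, V, MapWritable> peer)
                  throws IOException {
                KeyValuePair<K, V> pair;
                while ((pair = peer.readNext()) != null) {
                  // invoke doFn on KV.of(pair.getKey(), pair.getValue());
                  // the ProcessContext plumbing that collects output is omitted
                }
              }
            }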

          udanax Edward J. Yoon added a comment -

          https://cloud.google.com/dataflow/examples/wordcount-example

          This page describes the Beam concepts well. The flow is as follows:

              Creating the Pipeline
              Applying transforms to the Pipeline
                  Reading input (in this example: reading text files)
                  Applying ParDo transforms
                  Applying SDK-provided transforms (in this example: Count)
                  Writing output (in this example: writing to Google Cloud Storage)
              Running the Pipeline
          

          Once we've created the Hama pipeline, we should be able to run the program like below:

            public static void main(String[] args) {
              // Create a pipeline parameterized by command-line flags.
              Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

              p.apply(TextIO.Read.from("gs://..."))   // Read input.
               .apply(new CountWords())               // Do some processing.
               .apply(TextIO.Write.to("gs://..."));   // Write output.

              // Run the pipeline.
              p.run();
            }
          

          For I/O operations, you can refer to https://github.com/apache/incubator-beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/io/hadoop/HadoopIO.java (instead of org.apache.hadoop.mapreduce.lib.input.FileInputFormat, you should use https://github.com/apache/hama/blob/master/core/src/main/java/org/apache/hama/bsp/FileInputFormat.java)

          >> BSP for dataflow could be similar to SuperstepBSP

          I think so. GroupByKey seems to be a built-in operation that groups records by key. We should implement it using a superstep.
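          For illustration, such a superstep could use BSP messaging to shuffle records to the peer that owns each key (a rough sketch with assumed names; pendingRecords() stands in for however the previous superstep hands over its output):

            import java.io.IOException;
            import java.util.Map;
            import org.apache.hadoop.io.MapWritable;
            import org.apache.hadoop.io.Text;
            import org.apache.hama.bsp.BSPPeer;
            import org.apache.hama.bsp.Superstep;

            public class GroupByKeySuperstep extends
                Superstep<Text, Text, Text, Text, MapWritable> {

              @Override
              protected void compute(BSPPeer<Text, Text, Text, Text, MapWritable> peer)
                  throws IOException {
                // Route each record to the peer that owns its key, so that
                // equal keys meet on the same peer after the barrier sync.
                for (Map.Entry<Text, Text> e : pendingRecords().entrySet()) {
                  int owner = Math.abs(e.getKey().hashCode()) % peer.getNumPeers();
                  MapWritable msg = new MapWritable();
                  msg.put(e.getKey(), e.getValue());
                  peer.send(peer.getPeerName(owner), msg);
                }
                // In the next superstep, after the barrier, each peer drains
                // peer.getCurrentMessage() and groups its records locally by key.
              }

              // Hypothetical handoff from the previous superstep.
              private Map<Text, Text> pendingRecords() {
                return new java.util.HashMap<Text, Text>();
              }
            }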

          seedengine JongYoon Lim added a comment -

          Thank you for your feedback. Do you think it's better to branch from Hama for this, or to have an independent GitHub repo?

          udanax Edward J. Yoon added a comment -

          Why don't we contribute this feature to Apache Beam directly? https://github.com/apache/incubator-beam/tree/master/runners

          seedengine JongYoon Lim added a comment -

          Yes, I can add a PR link to https://issues.apache.org/jira/browse/BEAM-612 once PoC is done.

          udanax Edward J. Yoon added a comment -

          >> once PoC is done

          Great. If you need any help, feel free to let me know.

          seedengine JongYoon Lim added a comment -

          Hi Edward, could you give me some idea of the recommended way to create, on a groom server, the same instance that the (Beam) translator creates on the master?

          udanax Edward J. Yoon added a comment -

          I don't quite understand; could you please share your progress?

          seedengine JongYoon Lim added a comment -

          Hi Edward. First of all, sorry for the long delay.

          This is the process for testing the beam-hama-runner:
          1. Define a test ParDo, for example, as below.

              PCollection<KV<Text, LongWritable>> output = input.apply("test", ParDo.of(new DoFn<KV<Text, LongWritable>, KV<Text, LongWritable>>() {
                @ProcessElement
                public void processElement(ProcessContext c) {
                  for (String word : c.element().toString().split("[^a-zA-Z']+")) {
                    if (!word.isEmpty()) {
                      c.output(KV.of(new Text(word), new LongWritable(11)));
                    }
                  }
                }
              }));
          

          2. To translate the ParDo, I can pass it to DoFnFunction, which is a subclass of Superstep and holds an OldDoFn.ProcessContext. Here, I'd like to create the DoFn instance in the Hama cluster after all translation is finished, and I'm not sure how I can do that easily...

            private static <InputT, OutputT> TransformTranslator<ParDo.Bound<InputT, OutputT>> parDo() {
              return new TransformTranslator<ParDo.Bound<InputT, OutputT>>() {
                @Override
                public void translate(final ParDo.Bound<InputT, OutputT> transform, TranslationContext context) {
                  // addSuperstep() expects a Superstep class, e.g.:
                  //   context.addSuperstep(TestSuperStep.class);
                  // but the translated DoFn is only available here as an instance:
                  DoFnFunction dofn = new DoFnFunction((OldDoFn<KV, KV>) transform.getFn());
                  //   context.addSuperstep(dofn.getClass());
                }
              };
            }
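          One idea I've considered (just an assumption, roughly how other Beam runners ship DoFn instances to workers): serialize the translated instance into the job configuration on the master and deserialize it on the grooms. A minimal sketch with illustrative names:

            import java.io.ByteArrayInputStream;
            import java.io.ByteArrayOutputStream;
            import java.io.IOException;
            import java.io.ObjectInputStream;
            import java.io.ObjectOutputStream;
            import java.io.Serializable;
            import java.util.Base64;
            import org.apache.hadoop.conf.Configuration;

            // Ship a translated instance to the grooms via the job configuration,
            // using plain Java serialization + Base64 (objects must be Serializable).
            public final class InstanceShipping {

              static void storeInstance(Configuration conf, String key, Serializable obj)
                  throws IOException {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                ObjectOutputStream out = new ObjectOutputStream(bytes);
                out.writeObject(obj);
                out.close();
                conf.set(key, Base64.getEncoder().encodeToString(bytes.toByteArray()));
              }

              static Object loadInstance(Configuration conf, String key)
                  throws IOException, ClassNotFoundException {
                byte[] data = Base64.getDecoder().decode(conf.get(key));
                ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
                try {
                  return in.readObject();
                } finally {
                  in.close();
                }
              }
            }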
          
          udanax Edward J. Yoon added a comment -

          Here's my skeleton code with an example that counts words. You should implement the HamaPipelineRunner: just translate and execute the batch job. I think you can find out how to translate these from Flink's code: https://github.com/dataArtisans/flink-dataflow/blob/aad5d936abd41240f3e15d294ea181fb9cca05e0/runner/src/main/java/com/dataartisans/flink/dataflow/translation/FlinkBatchTransformTranslators.java#L410

          public class WordCountTest {
          
            static final String[] WORDS_ARRAY = new String[] { "hi there", "hi",
                "hi sue bob", "hi sue", "", "bob hi" };
          
            static final List<String> WORDS = Arrays.asList(WORDS_ARRAY);
          
            static final String[] COUNTS_ARRAY = new String[] { "hi: 5", "there: 1",
                "sue: 2", "bob: 2" };
          
            /**
             * Example test that tests a PTransform by using an in-memory input and
             * inspecting the output.
             */
            @Test
            @Category(RunnableOnService.class)
            public void testCountWords() throws Exception {
              HamaOptions options = PipelineOptionsFactory.as(HamaOptions.class);
              options.setRunner(HamaPipelineRunner.class);
              Pipeline p = Pipeline.create(options);
          
              PCollection<String> input = p.apply(Create.of(WORDS).withCoder(
                  StringUtf8Coder.of()));
          
              PCollection<String> output = input
                  .apply(new WordCount())
                  .apply(MapElements.via(new FormatAsTextFn()));
                  //.apply(TextIO.Write.to("/tmp/result"));
          
              PAssert.that(output).containsInAnyOrder(COUNTS_ARRAY);
              p.run().waitUntilFinish();
            }
          
            public static class WordCount extends
                PTransform<PCollection<String>, PCollection<KV<String, Long>>> {
              
              private static final long serialVersionUID = 1L;
          
              @Override
              public PCollection<KV<String, Long>> apply(PCollection<String> lines) {
          
                // Convert lines of text into individual words.
                PCollection<String> words = lines.apply(ParDo.of(new DoFn<String, String>() {
                  private static final long serialVersionUID = 1L;
                  private final Aggregator<Long, Long> emptyLines =
                      createAggregator("emptyLines", new Sum.SumLongFn());
          
                  @ProcessElement
                  public void processElement(ProcessContext c) {
                    if (c.element().trim().isEmpty()) {
                      emptyLines.addValue(1L);
                    }
          
                    // Split the line into words.
                    String[] words = c.element().split("[^a-zA-Z']+");
          
                    // Output each word encountered into the output PCollection.
                    for (String word : words) {
                      if (!word.isEmpty()) {
                        c.output(word);
                      }
                    }
                  }
                }));
          
                // Count the number of times each word occurs.
                PCollection<KV<String, Long>> wordCounts = words.apply(Count
                    .<String> perElement());
          
                return wordCounts;
              }
            }
          
            // ///// TODO
            public static class HamaPipelineRunner extends
                PipelineRunner<HamaPipelineResult> {
          
              public static HamaPipelineRunner fromOptions(PipelineOptions x) {
                return new HamaPipelineRunner();
              }
          
              @Override
              public <Output extends POutput, Input extends PInput> Output apply(
                              PTransform<Input, Output> transform, Input input) {
                      return super.apply(transform, input);
              }
              
              @Override
              public HamaPipelineResult run(Pipeline pipeline) {
                // TODO Auto-generated method stub
                System.out.println("Executing pipeline using HamaPipelineRunner.");
          
                // TODO you need to translate pipeline to Hama program
                // and execute pipeline
                // return the result
                return null;
              }
          
            }
          
            public static class HamaPipelineResult implements PipelineResult {
          
              @Override
              public State getState() {
                // TODO Auto-generated method stub
                return null;
              }
          
              @Override
              public State cancel() throws IOException {
                // TODO Auto-generated method stub
                return null;
              }
          
              @Override
              public State waitUntilFinish(Duration duration) {
                // TODO Auto-generated method stub
                return null;
              }
          
              @Override
              public State waitUntilFinish() {
                // TODO Auto-generated method stub
                return null;
              }
          
              @Override
              public <T> AggregatorValues<T> getAggregatorValues(
                  Aggregator<?, T> aggregator) throws AggregatorRetrievalException {
                // TODO Auto-generated method stub
                return null;
              }
          
              @Override
              public MetricResults metrics() {
                // TODO Auto-generated method stub
                return null;
              }
          
            }
          
            public static interface HamaOptions extends PipelineOptions {
          
            }
          
          }
          
          seedengine JongYoon Lim added a comment -

          I added a link to the skeleton code for the Hama runner.
          I added a TranslationContext class for executing batch jobs: results (supersteps) from the translators are added to a list in TranslationContext, and after all translation is done, it executes the supersteps one by one. But when I add a result (superstep), it's an object, not a class, so I wonder whether there is an easy way to create the same objects on the grooms, since they are created on the master. I also wonder whether this approach is correct.
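          Roughly, the idea is shaped like this (a simplified sketch, not the exact code):

            import java.util.ArrayList;
            import java.util.List;
            import org.apache.hama.bsp.Superstep;

            // Collects translated supersteps while walking the pipeline; after
            // translation they are executed one by one as a batch job.
            public class TranslationContext {
              private final List<Superstep<?, ?, ?, ?, ?>> supersteps =
                  new ArrayList<Superstep<?, ?, ?, ?, ?>>();

              // The catch: this stores instances on the master, while the
              // grooms need some way to recreate the same objects.
              public void addSuperstep(Superstep<?, ?, ?, ?, ?> superstep) {
                supersteps.add(superstep);
              }

              public List<Superstep<?, ?, ?, ?, ?>> getSupersteps() {
                return supersteps;
              }
            }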

          udanax Edward J. Yoon added a comment -

          Cool, let me check.

          seedengine JongYoon Lim added a comment -

          Hi Edward, could you give me some feedback?

          seedengine JongYoon Lim added a comment -

          Hi Edward, could you create a branch called 'beam_support' on GitHub? These days I'm working on this issue again, and it seems to be working. Right now I'm working on my local Beam branch, but I think it'd be better to work on a Hama branch until the runner can support a recent Beam version. (I'm working against Beam's release-0.3.0-incubating, but they have already released 0.6.0.) I think working in Hama makes it easier to get review and feedback from other developers; after that, it could be contributed to the Beam project.

          udanax Edward J. Yoon added a comment -

          Sorry for the late reply.

          >> could you create a branch called 'beam_support' on github?

          Sure. Or you can also create the branch yourself, since you're a committer. I can do it this weekend.

          seedengine JongYoon Lim added a comment -

          So, I'll try to create the branch.
          Thank you.

          udanax Edward J. Yoon added a comment -
          # create a new branch inside your working directory 'current'
          git checkout -b HAMA-983
          # ... make some changes to the files ...
          # commit the changes to the branch
          git commit -a -m '[HAMA-983] Hama runner for DataFlow'
          # push the branch to origin
          git push origin HAMA-983
          Then go to your GitHub HAMA page and open a Pull Request.
          

          Hi JongYoon, you can create a new branch as shown above.


            People

            • Assignee: Unassigned
            • Reporter: udanax Edward J. Yoon
            • Votes: 1
            • Watchers: 4

              Dates

              • Created:
              • Updated:
