Hama / HAMA-983

Hama runner for DataFlow


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      As you already know, Apache Beam provides a unified programming model for both batch and streaming inputs.

      The APIs are mostly concerned with filtering and transforming data, so we'll need to implement a data processing runner along the lines of https://github.com/dapurv5/MapReduce-BSP-Adapter/blob/master/src/main/java/org/apache/hama/mapreduce/examples/WordCount.java
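
To make the "filtering and transforming" shape concrete, here is a minimal sketch of a word count written as transform, filter, and aggregate stages. It uses only the JDK; it is not Beam or Hama code, and the class and method names (`WordCountStagesSketch`, `countWords`) are made up for illustration. A real runner would have to execute equivalent stages on top of Hama.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-JDK illustration only; no Beam or Hama classes are used here.
final class WordCountStagesSketch {

    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // Transform: split each line into lower-case tokens.
                .flatMap(line -> Arrays.stream(line.toLowerCase(Locale.ROOT).split("\\W+")))
                // Filter: drop empty tokens produced by leading separators.
                .filter(word -> !word.isEmpty())
                // Aggregate: group identical words and count them (sorted by word).
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Prints {bsp=2, hama=1, runs=2, supersteps=1}
        System.out.println(countWords(List.of("Hama runs BSP", "BSP runs supersteps")));
    }
}
```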

      Also, implementing a similarity join could be fun. According to http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf, Apache Hama is the clear winner over Apache Hadoop and Apache Spark for this workload.

      Since a similarity join consists of transformation, aggregation, and partition computations, I think it's possible to implement it using the Apache Beam APIs.
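
As a rough illustration of that decomposition, the sketch below runs a naive Jaccard similarity join in three steps: transform records into token sets, partition record ids by shared token, then aggregate candidate pairs and keep those above a threshold. This is an assumption-laden toy, not the algorithm from the linked paper and not Beam or Hama code; `SimilarityJoinSketch`, `tokenize`, `jaccard`, `join`, and the `"idA|idB"` pair encoding are all hypothetical names chosen for the example. It assumes a threshold greater than 0, since pairs that share no token are never compared.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-JDK illustration only; no Beam or Hama classes are used here.
final class SimilarityJoinSketch {

    // Transformation: turn a raw record into a set of lower-case tokens.
    static Set<String> tokenize(String text) {
        Set<String> tokens = new TreeSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Jaccard similarity: |a ∩ b| / |a ∪ b|, defined as 0 for two empty sets.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        return union == 0 ? 0.0 : (double) common.size() / union;
    }

    // Returns pairs "idA|idB" (idA sorts before idB) with similarity >= threshold.
    static Set<String> join(Map<String, String> records, double threshold) {
        // Transformation: id -> token set (TreeMap keeps ids in sorted order).
        Map<String, Set<String>> tokensById = new TreeMap<>();
        records.forEach((id, text) -> tokensById.put(id, tokenize(text)));

        // Partition: token -> ids of the records that contain it.
        Map<String, List<String>> idsByToken = new TreeMap<>();
        tokensById.forEach((id, tokens) -> {
            for (String token : tokens) {
                idsByToken.computeIfAbsent(token, k -> new ArrayList<>()).add(id);
            }
        });

        // Aggregation: verify candidate pairs per partition; the set removes duplicates.
        Set<String> result = new TreeSet<>();
        for (List<String> ids : idsByToken.values()) {
            for (int i = 0; i < ids.size(); i++) {
                for (int j = i + 1; j < ids.size(); j++) {
                    String a = ids.get(i);
                    String b = ids.get(j);
                    if (jaccard(tokensById.get(a), tokensById.get(b)) >= threshold) {
                        result.add(a + "|" + b);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> records = Map.of(
                "r1", "apache hama bsp",
                "r2", "apache hama graph",
                "r3", "spark streaming");
        // r1 and r2 share 2 of 4 distinct tokens, so their similarity is 0.5.
        System.out.println(join(records, 0.5)); // prints [r1|r2]
    }
}
```

In a distributed setting, each token partition could be handled by a separate worker, which is roughly where a BSP-style runner would come in.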

            People

              Assignee: Unassigned
              Reporter: Edward J. Yoon (udanax)
              Votes: 1
              Watchers: 4
