Hama
  1. Hama
  2. HAMA-597

Split a GraphJobRunner into multiple classes

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.6.0
    • Component/s: graph
    • Labels:
      None

      Description

      Really not readable and maintainable.

      Break up pieces of functionality into their own classes.

      1. HAMA-597.patch
        58 kB
        Thomas Jungblut

        Activity

        Hide
        Hudson added a comment -

        Integrated in Hama-Nightly #672 (See https://builds.apache.org/job/Hama-Nightly/672/)
        HAMA-597: Split a GraphJobRunner into multiple classes Part 1 (Revision 1383375)

        Result = SUCCESS
        tjungblut :
        Files :

        • /hama/trunk/CHANGES.txt
        • /hama/trunk/graph/src/main/java/org/apache/hama/graph/AggregationRunner.java
        • /hama/trunk/graph/src/main/java/org/apache/hama/graph/GraphJobMessage.java
        • /hama/trunk/graph/src/main/java/org/apache/hama/graph/GraphJobRunner.java
        • /hama/trunk/graph/src/main/java/org/apache/hama/graph/GraphJobRunnerBase.java
        Show
        Hudson added a comment - Integrated in Hama-Nightly #672 (See https://builds.apache.org/job/Hama-Nightly/672/ ) HAMA-597 : Split a GraphJobRunner into multiple classes Part 1 (Revision 1383375) Result = SUCCESS tjungblut : Files : /hama/trunk/CHANGES.txt /hama/trunk/graph/src/main/java/org/apache/hama/graph/AggregationRunner.java /hama/trunk/graph/src/main/java/org/apache/hama/graph/GraphJobMessage.java /hama/trunk/graph/src/main/java/org/apache/hama/graph/GraphJobRunner.java /hama/trunk/graph/src/main/java/org/apache/hama/graph/GraphJobRunnerBase.java
        Hide
        Thomas Jungblut added a comment -

        The multistep partitioning is a bit complex for me currently, I will read through it more carefully. Maybe we can't simplify it in less than 100 LOC.

        Checked in the current version of refactoring now.

        Show
        Thomas Jungblut added a comment - The multistep partitioning is a bit complex for me currently, I will read through it more carefully. Maybe we can't simplify it in less than 100 LOC. Checked in the current version of refactoring now.
        Hide
        Thomas Jungblut added a comment -

        Edward, can't we drastically simplify the loadVertices method?

        What do you think about:
        -Vertices are loaded from disk into RAM
        -Multistep partitioning takes place (that can be combined with repair)

        Imho this would be done in less than 100 lines.

        Show
        Thomas Jungblut added a comment - Edward, can't we drastically simplify the loadVertices method? What do you think about: -Vertices are loaded from disk into RAM -Multistep partitioning takes place (that can be combined with repair) Imho this would be done in less than 100 lines.
        Hide
        Thomas Jungblut added a comment -

        Great, I will refactor the loadVertices method and then commit it.

        Show
        Thomas Jungblut added a comment - Great, I will refactor the loadVertices method and then commit it.
        Hide
        Edward J. Yoon added a comment -

        Thanks very nice!

        Show
        Edward J. Yoon added a comment - Thanks very nice!
        Hide
        Thomas Jungblut added a comment -

        Deleted the base again, added a aggregation runner.

        Still the load vertices method needs to be cleansed. But what do you think about this refactoring? Is this more readable in your opinion?

        Show
        Thomas Jungblut added a comment - Deleted the base again, added a aggregation runner. Still the load vertices method needs to be cleansed. But what do you think about this refactoring? Is this more readable in your opinion?
        Hide
        Thomas Jungblut added a comment -

        I will have a look, most of the mess comes from me, so I should clean that up :/

        Show
        Thomas Jungblut added a comment - I will have a look, most of the mess comes from me, so I should clean that up :/
        Hide
        Edward J. Yoon added a comment -

        If GraphJobRunner can show clearly and concisely overall flow of program concisely (like main function in C/Java), I think moving all implementations into base class is also not bad idea. P.S., I'm not good Java programmer

        We have lots of duplicate code.

        Agree.

        Show
        Edward J. Yoon added a comment - If GraphJobRunner can show clearly and concisely overall flow of program concisely (like main function in C/Java), I think moving all implementations into base class is also not bad idea. P.S., I'm not good Java programmer We have lots of duplicate code. Agree.
        Hide
        Thomas Jungblut added a comment -

        Yes, also in the loadVertices method and others. We have lots of duplicate code.

        Show
        Thomas Jungblut added a comment - Yes, also in the loadVertices method and others. We have lots of duplicate code.
        Hide
        Edward J. Yoon added a comment -

        Personally we should extract a single superstep and refactor it into clean methods.

        Do you mean the logic in while loop?

        Show
        Edward J. Yoon added a comment - Personally we should extract a single superstep and refactor it into clean methods. Do you mean the logic in while loop?
        Hide
        Thomas Jungblut added a comment -

        Do you think that this will lower the complexity? Personally we should extract a single superstep and refactor it into clean methods. Extracting a base class if we don't have multiple implementations is just confusing.

        Show
        Thomas Jungblut added a comment - Do you think that this will lower the complexity? Personally we should extract a single superstep and refactor it into clean methods. Extracting a base class if we don't have multiple implementations is just confusing.
        Hide
        Edward J. Yoon added a comment -

        Moved all except bsp(), setup(), and clean() to abstract class GraphJobRunnerBase.

        Show
        Edward J. Yoon added a comment - Moved all except bsp(), setup(), and clean() to abstract class GraphJobRunnerBase.
        Hide
        Mirko Kaempf added a comment -

        OK, thank you for the comment. As the GraphJob is already there, this item is clear to me now. And I will just make the code of GraphJobRunner more readable.

        Show
        Mirko Kaempf added a comment - OK, thank you for the comment. As the GraphJob is already there, this item is clear to me now. And I will just make the code of GraphJobRunner more readable.
        Hide
        Thomas Jungblut added a comment -

        There is the Job class concept which MapReduce already uses.
        Hama has quite the same interfaces to submit the jobs like MapReduce has (job.submit, job.waitForCompletion). And as you may know, Giraph runs on top of MapReduce so basically it is the same.

        What you are looking for is something that creates a common interface to start these jobs. That is not what the GraphJobRunner does, it is a bridge between the native BSP model and the higher level vertex abstraction in the Pregel paper.

        The handling of job start and submission is handled by the GraphJob. And I do suggest you to not create a constant interface throughout various projects since there are too many options that diverge.

        Show
        Thomas Jungblut added a comment - There is the Job class concept which MapReduce already uses. Hama has quite the same interfaces to submit the jobs like MapReduce has (job.submit, job.waitForCompletion). And as you may know, Giraph runs on top of MapReduce so basically it is the same. What you are looking for is something that creates a common interface to start these jobs. That is not what the GraphJobRunner does, it is a bridge between the native BSP model and the higher level vertex abstraction in the Pregel paper. The handling of job start and submission is handled by the GraphJob. And I do suggest you to not create a constant interface throughout various projects since there are too many options that diverge.
        Hide
        Mirko Kaempf added a comment -

        To your question "But what is the reason to make it comparable to giraph?"

        I started a review on availlable graph processing tools, so I found this:
        (http://blog.acaro.org/entry/google-pregel-the-rise-of-the-clones)

        "So, to summarize, what Hame, GoldenOrb and Giraph have in common is: Java platform, Apache License (and incubation), BSP computation. What they differ for: Hama offers BSP primitives not graph processing API (so it sits at a lower level)..."

        In a separate project, it is called "hadoop distributed graph space" (HDGS) a Metadata store for graph processing (like Hive warehouse) and a generic workflow framework (compatible to Oozie) is under construction which is running on top of HDFS and HBase in a hadoop cluster.

        I would like to see a common way to start these jobs based on HAMA and Giraph or simple MapReduce (for example for preprocessing graph data sets by analysing time series data in a comparable way, so it would be easy to have applications on a higher abstraction level which build workflows by combining different tasks using different computational concepts in one single Hadoop cluster.

        Show
        Mirko Kaempf added a comment - To your question "But what is the reason to make it comparable to giraph?" I started a review on availlable graph processing tools, so I found this: ( http://blog.acaro.org/entry/google-pregel-the-rise-of-the-clones ) "So, to summarize, what Hame, GoldenOrb and Giraph have in common is: Java platform, Apache License (and incubation), BSP computation. What they differ for: Hama offers BSP primitives not graph processing API (so it sits at a lower level)..." In a separate project, it is called "hadoop distributed graph space" (HDGS) a Metadata store for graph processing (like Hive warehouse) and a generic workflow framework (compatible to Oozie) is under construction which is running on top of HDFS and HBase in a hadoop cluster. I would like to see a common way to start these jobs based on HAMA and Giraph or simple MapReduce (for example for preprocessing graph data sets by analysing time series data in a comparable way, so it would be easy to have applications on a higher abstraction level which build workflows by combining different tasks using different computational concepts in one single Hadoop cluster.
        Hide
        Thomas Jungblut added a comment -

        There are some intra-method dependencies, but since they work via the messaging API's you basically just have to take care on the sequence of calling them.

        I will compare it with the "GiraphRunner" to make both runner tools comparable in usage.

        Well, have fun. But what is the reason to make it comparable to giraph?

        If you have problems or questions, let us know.

        Show
        Thomas Jungblut added a comment - There are some intra-method dependencies, but since they work via the messaging API's you basically just have to take care on the sequence of calling them. I will compare it with the "GiraphRunner" to make both runner tools comparable in usage. Well, have fun. But what is the reason to make it comparable to giraph? If you have problems or questions, let us know.
        Hide
        Mirko Kaempf added a comment -

        I will look into this class to do some refactoring.

        Are there some requirements to be aware of? If not I will just start to break it up into smaller peaces. I will compare it with the "GiraphRunner" to make both runner tools comparable in usage.

        Show
        Mirko Kaempf added a comment - I will look into this class to do some refactoring. Are there some requirements to be aware of? If not I will just start to break it up into smaller peaces. I will compare it with the "GiraphRunner" to make both runner tools comparable in usage.
        Hide
        Edward J. Yoon added a comment -

        Since I'm not good Java programmer, .. Someone can take this task? or comment your opinion?

        Show
        Edward J. Yoon added a comment - Since I'm not good Java programmer, .. Someone can take this task? or comment your opinion?

          People

          • Assignee:
            Thomas Jungblut
            Reporter:
            Edward J. Yoon
          • Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development