Details

    • Type: Sub-task Sub-task
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.4.0
    • Fix Version/s: 0.5.0
    • Component/s: bsp core
    • Labels:
      None

      Description

      refactor bsp() in allowing checkpointed messages to be recovered.

      ChiaHung Lin had a fancy idea in chaining superstep class to make the whole recovering more convenient and less error prone, or at least possible.

      A user does not define a BSP anymore, instead he defines a single superstep inside of a computation class. A user is able to chain these in a specific ordering. After each of this computation the framework calls sync() and exchanges the messages.

      1. HAMA-503.patch
        12 kB
        Thomas Jungblut

        Activity

        Hide
        Thomas Jungblut added a comment -

        Two ideas for a computation "class".

        • Use methods and annotations like:

        @Superstep(1)
        @Superstep(2)

        It will execute in that order. Like JUnit we can also loop through the method names, but I don't think this is much more convenient.

        • Generic types for the messages

        The method which does execute the first superstep get's let's say <Integer, Double,?, ?> as input and ? as output, let's say String. (these should be Writable then..) -> <Integer, Double, String, String>

        The second superstep method has to get <String, String, Integer, Double> assuming that there is no other superstep defined.

        I currently don't know a way how to archieve this without defining several classes. But I will have a deeper look and hack a prototype over the next week.

        Show
        Thomas Jungblut added a comment - Two ideas for a computation "class". Use methods and annotations like: @Superstep(1) @Superstep(2) It will execute in that order. Like JUnit we can also loop through the method names, but I don't think this is much more convenient. Generic types for the messages The method which does execute the first superstep get's let's say <Integer, Double,?, ?> as input and ? as output, let's say String. (these should be Writable then..) -> <Integer, Double, String, String> The second superstep method has to get <String, String, Integer, Double> assuming that there is no other superstep defined. I currently don't know a way how to archieve this without defining several classes. But I will have a deeper look and hack a prototype over the next week.
        Hide
        ChiaHung Lin added a comment -

        For the first method we can have a class like Configurator so the unit of execution step can be composed through, for instance, configurator.add(superstep1).add(superstep2)...

        To reuse the unit within e.g. for loop, each superstep can be viewed as a command and a `For' class, which extends the command interface, can be used to collect units to be computed.

        For for = new For(conidtion);
        for.add(superstepN).add(superstepN1)...;
        configurator.add(superstep1).add(superstep2).add(for)...;

        The reason to have this rework conceived is because the framework needs a way to recover back to a working state. The original bsp() is more natural because users just write code, but this has an issue that it would increase the difficulty in recovery process. For example, the framework needs to understand/ parse the code within bsp() and add the checkpoint appropriately. Also, if something goes wrong, users might feel difficult to figure out from which problems stem because errors may happen in instrumentation code. However, there may have other better mechanisms, we can use it instead.

        Show
        ChiaHung Lin added a comment - For the first method we can have a class like Configurator so the unit of execution step can be composed through, for instance, configurator.add(superstep1).add(superstep2)... To reuse the unit within e.g. for loop, each superstep can be viewed as a command and a `For' class, which extends the command interface, can be used to collect units to be computed. For for = new For(conidtion); for.add(superstepN).add(superstepN1)...; configurator.add(superstep1).add(superstep2).add(for)...; The reason to have this rework conceived is because the framework needs a way to recover back to a working state. The original bsp() is more natural because users just write code, but this has an issue that it would increase the difficulty in recovery process. For example, the framework needs to understand/ parse the code within bsp() and add the checkpoint appropriately. Also, if something goes wrong, users might feel difficult to figure out from which problems stem because errors may happen in instrumentation code. However, there may have other better mechanisms, we can use it instead.
        Hide
        Thomas Jungblut added a comment -

        We can add varargs to the BSPJob like:

        job.setSupersteps(Superstep...)

        So people can add this like:

        job.setSupersteps(Superstep1.class,Superstep2.class, ...)

        The builder pattern is also cool.

        Show
        Thomas Jungblut added a comment - We can add varargs to the BSPJob like: job.setSupersteps(Superstep...) So people can add this like: job.setSupersteps(Superstep1.class,Superstep2.class, ...) The builder pattern is also cool.
        Hide
        ChiaHung Lin added a comment -

        +1 That is good. Interface looks simpler.

        We can add varargs to the BSPJob like:

        job.setSupersteps(Superstep...)

        Show
        ChiaHung Lin added a comment - +1 That is good. Interface looks simpler. We can add varargs to the BSPJob like: job.setSupersteps(Superstep...)
        Hide
        Thomas Jungblut added a comment -

        Hey Lin,

        I have made a bit of an "interface".

        For a superstep:
        https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/bsp/ft/Superstep.java

        For the BSP that can handle faults:

        https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/bsp/ft/FaultTolerantBSP.java

        The idea behind it is, that you init a task with a kind of start superstep. This is the index of the array of user defined supersteps.
        When fault happens, we inject the index where the superstep failed to the new task, so at runtime it will start computation from the given point.

        I have not really tried to make a real-world BSP example with it, so the Superstep class may not be a good interface.

        What do you think?

        Show
        Thomas Jungblut added a comment - Hey Lin, I have made a bit of an "interface". For a superstep: https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/bsp/ft/Superstep.java For the BSP that can handle faults: https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/bsp/ft/FaultTolerantBSP.java The idea behind it is, that you init a task with a kind of start superstep. This is the index of the array of user defined supersteps. When fault happens, we inject the index where the superstep failed to the new task, so at runtime it will start computation from the given point. I have not really tried to make a real-world BSP example with it, so the Superstep class may not be a good interface. What do you think?
        Hide
        Thomas Jungblut added a comment -

        Just translated PiEstimation example to the new prototype API.

        https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/bsp/ft/PiEstimator.java

        It's a bit more code. But hey, it is fault tolerant.
        I think it is quite intuitive as well. It reminds me of writing Mappers and Reducers in Hadoop.

        Show
        Thomas Jungblut added a comment - Just translated PiEstimation example to the new prototype API. https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/bsp/ft/PiEstimator.java It's a bit more code. But hey, it is fault tolerant. I think it is quite intuitive as well. It reminds me of writing Mappers and Reducers in Hadoop.
        Hide
        Edward J. Yoon added a comment -

        That's elegant!

        Show
        Edward J. Yoon added a comment - That's elegant!
        Hide
        Thomas Jungblut added a comment -

        I'm going to setup a patch tomorrow and then you can tell me what you finally think.

        Show
        Thomas Jungblut added a comment - I'm going to setup a patch tomorrow and then you can tell me what you finally think.
        Hide
        Thomas Jungblut added a comment -

        FaultTolerantBSP seems to be a really bad name. Do you have a better one?

        Another thing I dislike are the compiler warnings when using generics in the varargs. However this seems to be a minor thing.

        Please have a look at it.

        Show
        Thomas Jungblut added a comment - FaultTolerantBSP seems to be a really bad name. Do you have a better one? Another thing I dislike are the compiler warnings when using generics in the varargs. However this seems to be a minor thing. Please have a look at it.
        Hide
        ChiaHung Lin added a comment - - edited

        That looks good! The code is a bit verbose, but it seems unavoidable.

        One more issue, how if users want to reuse some supersteps?

        step1
        spte2
        for(...){
        step3
        ...
        step4
        }
        

        step5

        My original thought was to use e.g. command pattern. So when users want to reuse steps, object such as For can be applied to contain condition and several steps for reuse. Some other conditions may also need to take into account, so the procedure may be more flexible.

        Show
        ChiaHung Lin added a comment - - edited That looks good! The code is a bit verbose, but it seems unavoidable. One more issue, how if users want to reuse some supersteps? step1 spte2 for (...){ step3 ... step4 } step5 My original thought was to use e.g. command pattern. So when users want to reuse steps, object such as For can be applied to contain condition and several steps for reuse. Some other conditions may also need to take into account, so the procedure may be more flexible.
        Hide
        Thomas Jungblut added a comment -

        Thanks, btw I had a bit better name: "SuperstepBSP".

        My original thought was to use e.g. command pattern. So when users want to reuse steps, object such as For can be applied to contain condition and several steps for reuse. Some other conditions may also need to take into account, so the procedure may be more flexible.

        Sounds like a good improvement. But will take a bit of time though.
        Do you think we can add this later without blocking the whole FaultTolerance issues?

        Otherwise you can take over the issue and extend the patch with the command pattern.

        Show
        Thomas Jungblut added a comment - Thanks, btw I had a bit better name: "SuperstepBSP". My original thought was to use e.g. command pattern. So when users want to reuse steps, object such as For can be applied to contain condition and several steps for reuse. Some other conditions may also need to take into account, so the procedure may be more flexible. Sounds like a good improvement. But will take a bit of time though. Do you think we can add this later without blocking the whole FaultTolerance issues? Otherwise you can take over the issue and extend the patch with the command pattern.
        Hide
        ChiaHung Lin added a comment -

        Ideally ft feature should not be blocked by this issue as the system will set/get message to peer, which would not affect supersteps composition. We can rework for that issue later on.

        Show
        ChiaHung Lin added a comment - Ideally ft feature should not be blocked by this issue as the system will set/get message to peer, which would not affect supersteps composition. We can rework for that issue later on.
        Hide
        Thomas Jungblut added a comment -

        That is great, I guess we should get a first working snapshot of fault tolerance, we can then always improve.

        Show
        Thomas Jungblut added a comment - That is great, I guess we should get a first working snapshot of fault tolerance, we can then always improve.
        Hide
        Thomas Jungblut added a comment -

        oh btw this was the wrong patch^^

        Thanks Suraj who has seen this!

        Show
        Thomas Jungblut added a comment - oh btw this was the wrong patch^^ Thanks Suraj who has seen this!
        Hide
        ChiaHung Lin added a comment -

        Just come across to see that VertexInterface also contains compute function. Any chance that we can change method to other name?

        Show
        ChiaHung Lin added a comment - Just come across to see that VertexInterface also contains compute function. Any chance that we can change method to other name?
        Hide
        Thomas Jungblut added a comment -

        Propose one, but I don't see what the problem with computation is.

        Show
        Thomas Jungblut added a comment - Propose one, but I don't see what the problem with computation is.
        Hide
        ChiaHung Lin added a comment -

        That's a minor issue, not a big problem. Just thought that may avoid confusing users with the same function name.

        By the way, how can we implement k-means clusters with this new interface? k-means algorithm dynamically constructs supersteps

        ... bsp() ...{
          while(isConverged()){
            assignmentStep();
            updateStep();
          }// end while
        }// end bsp
        

        So it seems to me allowing user to dynamic specify supersteps is still required. With the current patch, users need to know how many supersteps are going to be processed before job is executed. Or is there alternative way to work around this issue?

        Show
        ChiaHung Lin added a comment - That's a minor issue, not a big problem. Just thought that may avoid confusing users with the same function name. By the way, how can we implement k-means clusters with this new interface? k-means algorithm dynamically constructs supersteps ... bsp() ...{ while (isConverged()){ assignmentStep(); updateStep(); } // end while } // end bsp So it seems to me allowing user to dynamic specify supersteps is still required. With the current patch, users need to know how many supersteps are going to be processed before job is executed. Or is there alternative way to work around this issue?
        Hide
        Thomas Jungblut added a comment -

        That's a minor issue, not a big problem. Just thought that may avoid confusing users with the same function name.

        If we find a better name, we can rename it. But since BSP is composed of computations and syncs, I guess this is a valid name.

        Glad you mention kmeans, I actually wanted to script it that way.
        The assignment step is its own superstep then comes the updateCenters superstep.
        In the assignment step you can never "escape" the while loop, instead in the updateCenter step you override the "haltComputation" method, that can look like this:

         @Override
            protected boolean haltComputation(
                BSPPeer<NullWritable, NullWritable, Text, DoubleWritable> peer) {
              return converged == 0 || iterations > maxIterations;
            }
        

        The big problem with kmeans is that there is a shared state between the supersteps (centers), but since the "part" centers are send between the supersteps as messages, a failed task can reconstruct the state from its messages. However a failed task cannot compare to the last state of the centers since this was stored in RAM, in this case we can assume that either they have converged (skip the step) or we compare to the input means from input split.

        In one of the cleanup methods you would write the assignments onto disk.

        Show
        Thomas Jungblut added a comment - That's a minor issue, not a big problem. Just thought that may avoid confusing users with the same function name. If we find a better name, we can rename it. But since BSP is composed of computations and syncs, I guess this is a valid name. Glad you mention kmeans, I actually wanted to script it that way. The assignment step is its own superstep then comes the updateCenters superstep. In the assignment step you can never "escape" the while loop, instead in the updateCenter step you override the "haltComputation" method, that can look like this: @Override protected boolean haltComputation( BSPPeer<NullWritable, NullWritable, Text, DoubleWritable> peer) { return converged == 0 || iterations > maxIterations; } The big problem with kmeans is that there is a shared state between the supersteps (centers), but since the "part" centers are send between the supersteps as messages, a failed task can reconstruct the state from its messages. However a failed task cannot compare to the last state of the centers since this was stored in RAM, in this case we can assume that either they have converged (skip the step) or we compare to the input means from input split. In one of the cleanup methods you would write the assignments onto disk.
        Hide
        Thomas Jungblut added a comment -

        Just committed it.

        Show
        Thomas Jungblut added a comment - Just committed it.
        Hide
        Hudson added a comment -

        Integrated in Hama-Nightly #496 (See https://builds.apache.org/job/Hama-Nightly/496/)
        HAMA-503: chainable computations (Revision 1304840)

        Result = SUCCESS
        tjungblut :
        Files :

        • /incubator/hama/trunk/CHANGES.txt
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/bsp/BSPJob.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/bsp/Superstep.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/bsp/SuperstepBSP.java
        • /incubator/hama/trunk/examples/src/main/java/org/apache/hama/examples/SuperstepPiEstimator.java
        Show
        Hudson added a comment - Integrated in Hama-Nightly #496 (See https://builds.apache.org/job/Hama-Nightly/496/ ) HAMA-503 : chainable computations (Revision 1304840) Result = SUCCESS tjungblut : Files : /incubator/hama/trunk/CHANGES.txt /incubator/hama/trunk/core/src/main/java/org/apache/hama/bsp/BSPJob.java /incubator/hama/trunk/core/src/main/java/org/apache/hama/bsp/Superstep.java /incubator/hama/trunk/core/src/main/java/org/apache/hama/bsp/SuperstepBSP.java /incubator/hama/trunk/examples/src/main/java/org/apache/hama/examples/SuperstepPiEstimator.java

          People

          • Assignee:
            Thomas Jungblut
            Reporter:
            Thomas Jungblut
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development