Hama
  1. Hama
  2. HAMA-480

Try out different barrier implementations

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      We should have a look at different barrier implementations with Zookeeper.

      Have a look at the goldenorb stuff for example:

      https://github.com/raveldata/goldenorb/blob/master/src/main/java/org/goldenorb/zookeeper/OrbFastBarrier.java

      With out new synchronization service class this should be easily testable.

        Issue Links

          Activity

          Hide
          ChiaHung Lin added a comment -

          Can we "emulate" this tree structure by using znodes in different levels?
          For example if we can make a znode for a job and in the next level are the groom names and in the very next levels are the tasks
          <pre>
          root
          / job
          / groom_name_0
          / task0
          / task1
          / groom_name_1
          / task2
          / task3
          </pre>
          So we can simply sync on grooms in the tasks and they get notified once all grooms have sync'd theirselfs.
          It this the idea behind the tree based sync?

          There should have no problem to simulate the above tree structure, which is similar to our implementation with double barrier, in zookeeper. For tree based barrier sync, it divides processes into subgroup and then synchronize among each other. Taken into an example of 8 processes, ranging from p0 to p7. At first stage, p1 sends message to p0 for sync; p3 to p2; p5 to p4; p7 to p6. At the seconds stage, p6 sends message to p4; p2 to p0. At the third stage, p4 sends message to p0 for reaching the barrier and then reverses notifying for leaving the barrier.

          Show
          ChiaHung Lin added a comment - Can we "emulate" this tree structure by using znodes in different levels? For example if we can make a znode for a job and in the next level are the groom names and in the very next levels are the tasks <pre> root / job / groom_name_0 / task0 / task1 / groom_name_1 / task2 / task3 </pre> So we can simply sync on grooms in the tasks and they get notified once all grooms have sync'd theirselfs. It this the idea behind the tree based sync? There should have no problem to simulate the above tree structure, which is similar to our implementation with double barrier, in zookeeper. For tree based barrier sync, it divides processes into subgroup and then synchronize among each other. Taken into an example of 8 processes, ranging from p0 to p7. At first stage, p1 sends message to p0 for sync; p3 to p2; p5 to p4; p7 to p6. At the seconds stage, p6 sends message to p4; p2 to p0. At the third stage, p4 sends message to p0 for reaching the barrier and then reverses notifying for leaving the barrier.
          Hide
          wolfgang hoschek added a comment -

          Could also have a look at netflix's barrier implementation using zookeeper:

          https://github.com/Netflix/curator/blob/master/curator-recipes/src/main/java/com/netflix/curator/framework/recipes/barriers/DistributedBarrier.java

          https://github.com/Netflix/curator/blob/master/curator-recipes/src/main/java/com/netflix/curator/framework/recipes/barriers/DistributedDoubleBarrier.java

          It would be great if Hama could demonstrate a barrier impl that has low latency and scales to tens of thousands of peers. This would enable jobs with many short supersteps on large clusters.

          Show
          wolfgang hoschek added a comment - Could also have a look at netflix's barrier implementation using zookeeper: https://github.com/Netflix/curator/blob/master/curator-recipes/src/main/java/com/netflix/curator/framework/recipes/barriers/DistributedBarrier.java https://github.com/Netflix/curator/blob/master/curator-recipes/src/main/java/com/netflix/curator/framework/recipes/barriers/DistributedDoubleBarrier.java It would be great if Hama could demonstrate a barrier impl that has low latency and scales to tens of thousands of peers. This would enable jobs with many short supersteps on large clusters.
          Hide
          wolfgang hoschek added a comment -

          For example, a variety of scalable barrier implementation strategies (some of which may not applicable here) are referred to in "Related Work" and other sections of these papers:

          http://www.ziti.uni-heidelberg.de/ziti/uploads/ce_group/2011-hipc.pdf
          http://soft.vub.ac.be/~smarr/downloads/hpcc2010-marr-etal-insertion-tree-phasers.pdf

          Show
          wolfgang hoschek added a comment - For example, a variety of scalable barrier implementation strategies (some of which may not applicable here) are referred to in "Related Work" and other sections of these papers: http://www.ziti.uni-heidelberg.de/ziti/uploads/ce_group/2011-hipc.pdf http://soft.vub.ac.be/~smarr/downloads/hpcc2010-marr-etal-insertion-tree-phasers.pdf
          Hide
          Thomas Jungblut added a comment -

          Thanks for the papers wolfgang!

          We are currently having a double barrier implementation, looks quite like that of Curator. However we had some problems with zookeeper (and I think we still have) so the implementation is quite cluttered and can be improved.

          Show
          Thomas Jungblut added a comment - Thanks for the papers wolfgang! We are currently having a double barrier implementation, looks quite like that of Curator. However we had some problems with zookeeper (and I think we still have) so the implementation is quite cluttered and can be improved.

            People

            • Assignee:
              Unassigned
              Reporter:
              Thomas Jungblut
            • Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development