Hama / HAMA-519

[GSoC 2012] Add simple latency and bandwidth measuring tool

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: bsp core
    • Labels:

      Description

      It would be nice to have a simple latency and bandwidth measuring tool. I think this can be added to contrib.

          Activity

          Edward J. Yoon added a comment -

          Found some similar code at http://bsponmpi.sourceforge.net/bench/mybench.c
          Apurv Verma added a comment -

          I am also interested in this. It doesn't seem very difficult. Does it need to have a GUI?

          Edward J. Yoon added a comment -

          A GUI is a great idea!

          Thomas Jungblut added a comment -

          We have a web frontend, so we could add it to that.

          Edward J. Yoon added a comment -

          Here's my quick and dirty example.

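              // Sketch only: "node0" and "node1" are placeholder peer names, and
              // packetSize, WORD and LATENCY are assumed to be defined elsewhere.
              private static final int ONE_MB = 1024 * 1024;

              private long measureLatency(BSPPeer peer) throws Exception {
                long start = System.currentTimeMillis();

                // Ping: node0 sends one message to node1.
                if (peer.getPeerName().equals("node0")) {
                  peer.send("node1", new DoubleWritable(packetSize));
                }
                peer.sync();

                // Pong: node1 answers node0.
                if (peer.getPeerName().equals("node1")) {
                  peer.send("node0", new DoubleWritable(packetSize));
                }
                peer.sync();

                long end = System.currentTimeMillis();

                // Half the round trip in milliseconds (barrier sync overhead included).
                return (end - start) / 2;
              }

              private long measureBandwidth(BSPPeer peer) throws Exception {
                // Number of messages needed to reach 1 MB if each carries packetSize bytes.
                int loop = ONE_MB / packetSize;
                long start = System.currentTimeMillis();

                if (peer.getPeerName().equals("node0")) {
                  for (int i = 0; i < loop; i++) {
                    // NOTE: a DoubleWritable serializes to 8 bytes whatever its value; a
                    // payload of packetSize bytes is needed for the total to reach 1 MB.
                    peer.send("node1", new DoubleWritable(packetSize));
                  }
                }
                peer.sync();

                // node1 acknowledges with a single small message.
                if (peer.getPeerName().equals("node1")) {
                  peer.send("node0", WORD);
                }
                peer.sync();

                long end = System.currentTimeMillis();

                // Bytes per millisecond; assumes end - start is larger than LATENCY.
                return ONE_MB / (end - start - LATENCY);
              }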
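The arithmetic at the end of both methods can be separated from the messaging code so that it is easy to check on its own. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, timings are taken in nanoseconds (for example from System.nanoTime()) rather than milliseconds, and nothing here is part of the Hama API.

```java
/** Hypothetical helper: turns raw ping-pong timings into latency and bandwidth figures. */
final class PingPongMath {
  private PingPongMath() {}

  /** One-way latency estimate: half of the measured round trip, in nanoseconds. */
  static long oneWayLatencyNanos(long startNanos, long endNanos) {
    return (endNanos - startNanos) / 2;
  }

  /**
   * Bandwidth in bytes per second after subtracting a fixed latency overhead.
   * Returns NaN when the remaining transfer time is not positive, which avoids
   * the division by zero that a very fast run would otherwise cause.
   */
  static double bytesPerSecond(long bytesSent, long elapsedNanos, long latencyNanos) {
    long transferNanos = elapsedNanos - latencyNanos;
    if (transferNanos <= 0) {
      return Double.NaN;
    }
    return bytesSent * 1e9 / transferNanos;
  }
}
```

Using a double for the result avoids the integer truncation that the long division in the example above is subject to when the run takes only a few milliseconds.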
          
          Edward J. Yoon added a comment - edited

          The ultimate goal is to understand the BSP cost model and how it correlates with the number of processors and the number of exchanged messages.
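For reference, the BSP cost model is commonly written as follows. This is the usual textbook formulation and is not stated anywhere in this thread. Here $w_i$ is the local computation on processor $i$, $h_i$ is the number of words processor $i$ sends or receives, $g$ is the communication cost per word (the inverse of bandwidth), $l$ is the barrier synchronization latency, $p$ is the number of processors, and $S$ is the number of supersteps. A measuring tool of the kind proposed would estimate $g$ and $l$.

```latex
T_{\text{superstep}} = \max_{1 \le i \le p} w_i \;+\; g \cdot \max_{1 \le i \le p} h_i \;+\; l
\qquad
T_{\text{total}} = \sum_{s=1}^{S} \left( w^{(s)} + g \, h^{(s)} + l \right)
```

In the second formula, $w^{(s)}$ and $h^{(s)}$ denote the per-superstep maxima from the first.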

          Harshit Shrivastava added a comment -

          The project looks great. Could you please elaborate on what is expected in the project and which features are required? That would be very helpful for writing the implementation idea in the GSoC proposal.

          Thanks

          Thomas Jungblut added a comment -

          We want to use this in HAMA-543 as well, so it would be very cool if it could be made as abstract as possible.

          Edward J. Yoon added a comment -

          Hello Harshit,

          1. First of all, an understanding of the BSP computing model is important.
          2. Then you can implement the BSP message-passing latency and bandwidth measurement tool. Whether you update the existing bench example or create a new one is your choice. The message size must be configurable so that users can measure latency with small messages or bandwidth with larger messages.
          3. Add some analysis code if you want to generate meaningful reports.
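One way the analysis step could work, given timings collected at several configurable message sizes, is to fit a straight line time = latency + size / bandwidth through the samples. The sketch below uses ordinary least squares; the linear model, the class name, and the field names are all assumptions made for illustration, not something proposed in this thread or provided by Hama.

```java
/** Hypothetical analyzer: least-squares fit of time = latency + size / bandwidth. */
final class LinearCostFit {
  /** Intercept of the fitted line: estimated fixed cost per transfer, in milliseconds. */
  final double latencyMillis;
  /** Reciprocal of the fitted slope: estimated bandwidth, in bytes per millisecond. */
  final double bytesPerMilli;

  private LinearCostFit(double latencyMillis, double bytesPerMilli) {
    this.latencyMillis = latencyMillis;
    this.bytesPerMilli = bytesPerMilli;
  }

  /** Fits the line through (size, time) samples; needs at least two distinct sizes. */
  static LinearCostFit fit(long[] sizesBytes, double[] timesMillis) {
    int n = sizesBytes.length;
    if (n < 2 || n != timesMillis.length) {
      throw new IllegalArgumentException("need two or more samples of equal length");
    }
    double meanSize = 0.0;
    double meanTime = 0.0;
    for (int i = 0; i < n; i++) {
      meanSize += sizesBytes[i];
      meanTime += timesMillis[i];
    }
    meanSize /= n;
    meanTime /= n;

    double sxx = 0.0; // sum of squared size deviations
    double sxy = 0.0; // sum of size deviation times time deviation
    for (int i = 0; i < n; i++) {
      double dx = sizesBytes[i] - meanSize;
      sxx += dx * dx;
      sxy += dx * (timesMillis[i] - meanTime);
    }
    if (sxx == 0.0) {
      throw new IllegalArgumentException("all message sizes are identical");
    }
    double slope = sxy / sxx;                        // milliseconds per byte
    double intercept = meanTime - slope * meanSize;  // milliseconds
    return new LinearCostFit(intercept, 1.0 / slope);
  }
}
```

A caller would collect one timing per message size and pass both arrays to LinearCostFit.fit. A flat line (slope of zero) yields an infinite bandwidth estimate, so real measurements should span a wide enough range of sizes for the slope to be meaningful.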

          Joris Geessels added a comment -

          Maybe I'm underestimating this, but to me this seems a fairly small task for a GSOC project?

          Edward J. Yoon added a comment -

          Joris,

          Yes, coding the measurement tool is simple. However, I hope you can suggest some additional ideas along with this.

          For example, a Hama cluster may have heterogeneous network conditions. Considering these performance factors is most important for improving BSP computing efficiency. Now do you think the network status information can be used to determine the number of Task processors per node?

          Thomas Jungblut added a comment -

          Apart from the fact that I have never seen a task that takes much more than 20 hours of effort, we want to use the information for active scheduling. So it is not only about writing a BSP job that can be used to measure these stats. It would be cool if we could integrate this into our metrics system and display the result on our web frontend.

          Now do you think the network status information can be used to determine the number of Task processors per node?

          Determining tasks per node is much more involved than just measuring bandwidth. Consider RAM limitations or CPU load, as well as IOWait for IO-heavy tasks.


            People

            • Assignee: Apurv Verma
            • Reporter: Edward J. Yoon
            • Votes: 0
            • Watchers: 3
