HAMA-363

Add network condition monitoring function to BSPMaster

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.3.0
    • Fix Version/s: 0.4.0
    • Component/s: bsp core
    • Labels:
      None

      Description

      There are a few reasons why this issue is important. Basically, the master server should know the status of the cluster:

      • to optimize network usage
      • to handle network connectivity problems
      • to handle different network conditions

      I would also like to see some network usage statistics.

      In this issue, we implement only the basic function, which is collecting network usage.

      1. HAMA-363.patch
        93 kB
        ChiaHung Lin
      2. HAMA-363.patch
        93 kB
        ChiaHung Lin
      3. HAMA-363.patch
        4 kB
        ChiaHung Lin
      4. HAMA-363.patch
        88 kB
        ChiaHung Lin

        Activity

        ChiaHung Lin added a comment -

        To the best of my knowledge, Hadoop has a metrics package that exposes runtime metrics for data collection and debugging. Ganglia seems to be an alternative way to monitor network usage, connectivity, etc.

        Thomas Jungblut added a comment -

        As far as I know, Hadoop only provides some JVM metrics and host metrics. I can't find the exact position in the source code, but I think we should implement our own metrics package, which we can later hook up to Ganglia. That would be much more useful.

        We should define what we need in order to determine whether there are problems or not.
        Something like: "We ping every groom every 5 seconds and check the latency."
        This can be easily implemented in BSPMaster.

        To measure the IN and OUT rate or other fancy stuff, we need something like heartbeat communication that transfers the local groom data to the master.
        This should be in the newer versions of Hadoop (>0.21), shouldn't it? I don't have the source code at hand here.
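
The "ping every groom every 5 seconds and check the latency" idea could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the HAMA-363 patch: the class `GroomLatencyMonitor`, its methods, and the probe function are hypothetical names, and the actual ping mechanism is left to the caller.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.ToLongFunction;

/** Hypothetical helper for illustration; not part of the HAMA-363 patch. */
class GroomLatencyMonitor {
  /** Marker stored when a probe fails. */
  static final long UNREACHABLE = -1L;

  private final List<String> grooms;
  // Assumed probe: returns the round-trip time in milliseconds, or throws on failure.
  private final ToLongFunction<String> probe;
  private final Map<String, Long> lastLatencyMs = new ConcurrentHashMap<>();

  GroomLatencyMonitor(List<String> grooms, ToLongFunction<String> probe) {
    this.grooms = grooms;
    this.probe = probe;
  }

  /** Probes every groom once and records the latest latency. */
  void pollOnce() {
    for (String groom : grooms) {
      long latency;
      try {
        latency = probe.applyAsLong(groom);
      } catch (RuntimeException e) {
        latency = UNREACHABLE;
      }
      lastLatencyMs.put(groom, latency);
    }
  }

  /** A groom is suspect if it was never probed, is unreachable, or is too slow. */
  boolean isSuspect(String groom, long thresholdMs) {
    Long latency = lastLatencyMs.get(groom);
    return latency == null || latency == UNREACHABLE || latency > thresholdMs;
  }

  /** Runs pollOnce() at a fixed period, e.g. start(5) for every 5 seconds. */
  ScheduledExecutorService start(long periodSeconds) {
    ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    timer.scheduleAtFixedRate(this::pollOnce, 0, periodSeconds, TimeUnit.SECONDS);
    return timer;
  }
}
```

The master would call `start(5)` once and later ask `isSuspect(groom, thresholdMs)`; the caller is responsible for shutting down the returned executor.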

        ChiaHung Lin added a comment -

        If I understand correctly, a new metrics system (HADOOP-6728), targeted at 0.23, has been proposed and is already in trunk. We will probably base our own package on it and integrate it with other mechanisms such as the heartbeat, which is underway.

        Edward J. Yoon added a comment -

        Let's schedule this for 0.4.

        ChiaHung Lin added a comment -

        The attached files contain a metrics system whose concept is basically borrowed from Hadoop's metrics2. The intention is to let Hama know the current system-related status.

        I agree with Thomas that Hama needs its own metrics package because, IMO, Hama and Hadoop's MapReduce are not completely the same. So the implementation does not just copy and paste metrics2 from Hadoop.

        In addition, the patch at the moment only passes the unit tests. I will find spare time to test it on, e.g., VMs, and it would be highly appreciated if someone could help review it as well.

        Edward J. Yoon added a comment -

        Looks great. Should we add this to 0.3?

        ChiaHung Lin added a comment -

        Personally, I am in no rush to push this patch into 0.3, as it is more related to fault tolerance, and it may need some improvement as use cases increase.

        Edward J. Yoon added a comment -

        Okay~

        Steve Loughran added a comment -

        I've been doing some monitoring of HDFS; it's easy to plug in custom stuff in front of the filesystem, but not so good in the internals.

        For HDFS, Hama, etc., I'd like to be able to inject monitors into the code (i.e. a plugin point) once each network connection gets set up, to determine source and dest (hostname:port) values, plus the number of bytes read/written. Just a thought. I wouldn't put Hama support for this on my priority list, but with a plugin point here the data can be collected and sampled or streamed to things for post-execution analysis.
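
A plugin point of the kind described above might look roughly like the sketch below. It is purely illustrative: `ConnectionMonitor` and `CountingConnectionMonitor` are hypothetical names, and no such interface is claimed to exist in Hama or HDFS. The networking code would be expected to call the monitor when a connection is set up and whenever bytes are transferred.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical plugin point; not an existing Hama or HDFS interface. */
interface ConnectionMonitor {
  /** Called once when a connection is set up; endpoints are hostname:port strings. */
  void connectionOpened(String source, String dest);

  /** Called with the number of bytes read and written since the previous call. */
  void bytesTransferred(String source, String dest, long read, long written);
}

/** Example plugin that keeps running byte totals per (source, dest) pair. */
class CountingConnectionMonitor implements ConnectionMonitor {
  private final Map<String, AtomicLong> bytesRead = new ConcurrentHashMap<>();
  private final Map<String, AtomicLong> bytesWritten = new ConcurrentHashMap<>();

  private static String key(String source, String dest) {
    return source + "->" + dest;
  }

  @Override
  public void connectionOpened(String source, String dest) {
    bytesRead.putIfAbsent(key(source, dest), new AtomicLong());
    bytesWritten.putIfAbsent(key(source, dest), new AtomicLong());
  }

  @Override
  public void bytesTransferred(String source, String dest, long read, long written) {
    String k = key(source, dest);
    bytesRead.computeIfAbsent(k, x -> new AtomicLong()).addAndGet(read);
    bytesWritten.computeIfAbsent(k, x -> new AtomicLong()).addAndGet(written);
  }

  long totalRead(String source, String dest) {
    AtomicLong n = bytesRead.get(key(source, dest));
    return n == null ? 0L : n.get();
  }

  long totalWritten(String source, String dest) {
    AtomicLong n = bytesWritten.get(key(source, dest));
    return n == null ? 0L : n.get();
  }
}
```

The totals could then be sampled periodically or streamed elsewhere for post-execution analysis.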

        Steve Loughran added a comment -

        Some more thoughts:

        1. Ganglia and collectl are the ways everyone monitors Hadoop; it would be simple and consistent for Hama to say "this is what we target".
        2. The monitoring v2 stuff in 0.23 is still unstable. At some point it might be nice to make it reusable, but it isn't there yet.
        ChiaHung Lin added a comment -

        Hi Steve,

        Receiving internal state/statistics through a plugin can be done by implementing the MetricsSink interface and registering the sink with the MetricsSystem. The metrics system periodically harvests internal state from each source, which implements MetricsSource, and puts it to the sink. For example, at the moment there is a JvmMetrics source that records Java VM related information; therefore, by registering a sink, one can obtain JVM statistics periodically.

        This seems to me to correspond to the use case explained in your comment. For monitoring network connections, an instance of a network MetricsSource can be added to the metrics system beforehand, and a sink can be implemented to receive related statistics such as source and dest values, etc. But there may be something I have missed or am not aware of due to my limited knowledge. Could you help point out anything that is missing in the current system?

        Thank you for the feedback. It is very important to know actual user scenarios for improvement.
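
The source/sink flow described in this comment can be illustrated with a much simplified sketch. The interfaces below (`SketchSource`, `SketchSink`, `SketchMetricsSystem`) are stand-ins invented for this example; they are not the signatures of MetricsSource, MetricsSink, or MetricsSystem in the patch, and `HeapSource` is only loosely analogous to JvmMetrics.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified stand-in for a metrics source; not the interface in the patch. */
interface SketchSource {
  String name();

  /** Returns the current values of this source's metrics. */
  Map<String, Long> snapshot();
}

/** Simplified stand-in for a metrics sink; not the interface in the patch. */
interface SketchSink {
  void putMetrics(String sourceName, Map<String, Long> metrics);
}

/** Harvests every registered source and pushes the result to every sink. */
class SketchMetricsSystem {
  private final List<SketchSource> sources = new ArrayList<>();
  private final List<SketchSink> sinks = new ArrayList<>();

  void registerSource(SketchSource source) {
    sources.add(source);
  }

  void registerSink(SketchSink sink) {
    sinks.add(sink);
  }

  /** One harvest cycle; a timer would call this periodically. */
  void harvestOnce() {
    for (SketchSource source : sources) {
      Map<String, Long> snapshot = source.snapshot();
      for (SketchSink sink : sinks) {
        sink.putMetrics(source.name(), snapshot);
      }
    }
  }
}

/** Example source that reports JVM heap usage. */
class HeapSource implements SketchSource {
  @Override
  public String name() {
    return "jvm";
  }

  @Override
  public Map<String, Long> snapshot() {
    Runtime rt = Runtime.getRuntime();
    Map<String, Long> metrics = new LinkedHashMap<>();
    metrics.put("heapUsedBytes", rt.totalMemory() - rt.freeMemory());
    metrics.put("heapMaxBytes", rt.maxMemory());
    return metrics;
  }
}
```

A network source would follow the same shape as `HeapSource`, returning values such as bytes read and written per connection, and a sink registered with the system would receive them on each harvest.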

        Edward J. Yoon added a comment -

        Is this patch ready to commit?

        ChiaHung Lin added a comment -

        My test on my current VMs appears to work.

        But after reading parts of the paper [1], I found that the current implementation lacks features such as federation, data transport, etc. Federation can provide aggregated information so that, e.g., BSPMaster can go further and decide whether a worker has failed, which is required by the monitor service. In addition, the plugin part probably needs to be improved as well, though at the moment it looks OK from the developer's viewpoint.

        [1] The Ganglia distributed monitoring system: design, implementation, and experience. http://ganglia.info/papers/science.pdf
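
To make the aggregation idea concrete, here is a minimal sketch of how a master could combine per-groom reports and decide whether a worker looks failed. It is far simpler than what the Ganglia paper calls federation, and it is not part of the committed patch: `GroomStatusAggregator`, its methods, and the timeout rule are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical master-side aggregator; not part of the HAMA-363 patch. */
class GroomStatusAggregator {
  private final long timeoutMs;
  private final Map<String, Long> lastReportMs = new ConcurrentHashMap<>();
  private final Map<String, Long> bytesSent = new ConcurrentHashMap<>();

  GroomStatusAggregator(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  /** Called when a groom's report (e.g. carried by a heartbeat) reaches the master. */
  void report(String groom, long nowMs, long bytesSentSinceLastReport) {
    lastReportMs.put(groom, nowMs);
    bytesSent.merge(groom, bytesSentSinceLastReport, Long::sum);
  }

  /** Aggregated view across all grooms that have reported so far. */
  long clusterBytesSent() {
    long total = 0;
    for (long b : bytesSent.values()) {
      total += b;
    }
    return total;
  }

  /** Grooms whose last report is older than the timeout are treated as suspected failures. */
  List<String> suspectedFailures(long nowMs) {
    List<String> failed = new ArrayList<>();
    for (Map.Entry<String, Long> e : lastReportMs.entrySet()) {
      if (nowMs - e.getValue() > timeoutMs) {
        failed.add(e.getKey());
      }
    }
    return failed;
  }
}
```

Timestamps are passed in explicitly so the decision rule can be exercised without waiting on a real clock; a real deployment would pass the current time in milliseconds.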

        Thomas Jungblut added a comment -

        I would love to see this in 0.4.0.

        Would you please update this patch to the current trunk?
        If you receive your commit rights, you can commit it yourself.

        I'm +1 after reviewing it.

        And I'm wondering if we can implement this directly in our daemons.

        ChiaHung Lin added a comment -

        I will upload a new patch to catch up with the code in trunk.

        Edward J. Yoon added a comment -

        +1

        Thomas Jungblut added a comment -

        Hey ChiaHung, I know you are quite busy, but should I commit the patch?

        There seems to be no dependency on our current trunk.

        ChiaHung Lin added a comment -

        Regarding just the patch, it looks OK to commit. But some other enhancements, such as federation, are required so that in the future the system has a better idea of, e.g., groom server status.

        Edward J. Yoon added a comment -

        Please commit and close this issue.

        Let's try to release 0.4 before the end of this month.

        ChiaHung Lin added a comment -

        I've committed the patch accommodating the current package structure.

        ChiaHung Lin added a comment -

        Changed the source tree to accommodate the Maven package structure. Also, the diff is done under the core package.

        Hudson added a comment -

        Integrated in Hama-Nightly #370 (See https://builds.apache.org/job/Hama-Nightly/370/)
        HAMA-363 Add network condition monitoring function to BSPMaster

        chl501 :
        Files :

        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/metrics
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/metrics/AbstractPatternFilter.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/metrics/DefaultMetricsSystem.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/metrics/GlobalFilter.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/metrics/JvmMetrics.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/metrics/Metric.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/metrics/MetricsConfig.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/metrics/MetricsFactory.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/metrics/MetricsFilter.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/metrics/MetricsInfo.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/metrics/MetricsRecord.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/metrics/MetricsSink.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/metrics/MetricsSinkAdaptor.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/metrics/MetricsSource.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/metrics/MetricsSourceAdaptor.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/metrics/MetricsSystem.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/metrics/MetricsTag.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/metrics/Pair.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/metrics/RegexFilter.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/metrics/SystemMetrics.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/metrics/SystemMonitorSink.java
        • /incubator/hama/trunk/core/src/main/java/org/apache/hama/util/ReflectionUtils.java
        • /incubator/hama/trunk/core/src/test/java/org/apache/hama/metrics
        • /incubator/hama/trunk/core/src/test/java/org/apache/hama/metrics/TestMetricsConfig.java
        • /incubator/hama/trunk/core/src/test/java/org/apache/hama/metrics/TestMetricsSystem.java
        • /incubator/hama/trunk/core/src/test/java/org/apache/hama/metrics/TestPatternFilter.java
        • /incubator/hama/trunk/core/src/test/org
        • /incubator/hama/trunk/core/src/test/org/apache
        • /incubator/hama/trunk/core/src/test/resources/hama-metrics-msys.properties
        • /incubator/hama/trunk/core/src/test/resources/hama-metrics-test-config.properties

          People

          • Assignee: ChiaHung Lin
          • Reporter: Edward J. Yoon
          • Votes: 0
          • Watchers: 1
