Hadoop Common / HADOOP-5051

HDFS throughput calculation is incorrect in Chukwa database

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:

      Redhat 5.1, Java 6

    • Hadoop Flags:
      Reviewed
    • Release Note:
      What is new in HADOOP-5051:

      - Added macro token substitution for sum(table_name).
      - Added correct HDFS throughput aggregation SQL macros.

      Description

      The SQL statement that calculates HDFS throughput is incorrect. The correct algorithm is to calculate the metric rate for each individual datanode, then sum the rates of all datanodes to get the total throughput for the cluster.
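
A minimal sketch of the algorithm described above, written in Python rather than the SQL used in aggregator.sql. It assumes each datanode reports a cumulative byte counter with a timestamp; the sample layout, the host names, and the function names are hypothetical and are not taken from the Chukwa schema or the patch.

```python
from collections import defaultdict

# Hypothetical samples: (datanode host, timestamp in seconds, cumulative bytes).
# This layout is an assumption for illustration, not the Chukwa schema.
SAMPLES = [
    ("dn1", 0, 0),
    ("dn1", 60, 6000),
    ("dn2", 0, 1000),
    ("dn2", 60, 4000),
]


def datanode_rates(samples):
    """Return bytes/second for each datanode, from its first to its last sample."""
    by_host = defaultdict(list)
    for host, ts, value in samples:
        by_host[host].append((ts, value))

    rates = {}
    for host, points in by_host.items():
        points.sort()
        (t0, v0), (t1, v1) = points[0], points[-1]
        if t1 > t0:  # two distinct timestamps are needed to form a rate
            rates[host] = (v1 - v0) / (t1 - t0)
    return rates


def cluster_throughput(samples):
    """Sum the per-datanode rates to get the cluster total."""
    return sum(datanode_rates(samples).values())
```

The point of the ordering is that each rate is formed from one datanode's own samples before anything is added across datanodes, so counters from different hosts are never mixed inside a single rate calculation.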

        Activity

        Eric Yang added a comment -
        • Added ability to expand the sum(table) macro into a SQL statement.
        • Changed the SQL statement to calculate the rate for each individual datanode, then aggregate the cluster throughput from the datanode rates.
        Eric Yang added a comment -

        The HDFS throughput calculation should process data for one extra time window in case the DFS datanode metrics arrive late.
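
The late-arrival handling proposed in the comment above can be sketched as follows. This is an illustration in Python, not the SQL from the patch; the fixed-length aligned windows, the function name, and the `extra_windows` parameter are all assumptions.

```python
def windows_to_process(now, window_seconds, extra_windows=1):
    """Return (start, end) pairs, oldest first: earlier windows to recompute, then the current one.

    Assumes windows are aligned to multiples of window_seconds (hypothetical).
    """
    current_start = now - (now % window_seconds)
    return [
        (current_start - i * window_seconds, current_start - (i - 1) * window_seconds)
        for i in range(extra_windows, -1, -1)
    ]
```

Re-running the aggregation over the previous window as well as the current one means a datanode sample that shows up after its window has closed is still included the next time the aggregation runs.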
        Eric Yang added a comment -

        Added extra time window to dfs throughput aggregation.

        Ari Rabkin added a comment -

        I don't feel qualified to review this; my SQL experience is quite limited.
        Is there a way to reduce the line lengths in aggregator.sql?

        Eric Yang added a comment -

        Every line in the file is a SQL query. There is no way to reduce the line length for now.
        Jerome Boulon added a comment -

        +1
        If the goal is to extend the macro language it may be good to look at a template engine like Velocity for example.

        Hudson added a comment -

        Integrated in Hadoop-trunk #778 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/778/ )

          People

          • Assignee:
            Eric Yang
          • Reporter:
            Eric Yang
          • Votes:
            0
          • Watchers:
            4
