HBase / HBASE-625

Metrics support for cluster load history: emissions and graphs

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels:
      None

      Description

      HBase should periodically write load figures in a format that tools like Ganglia (RRD) can consume. The master can dump cluster loadings and averages; regionservers would report their own loadings. Where it makes sense, we should build on the existing Hadoop work for this kind of thing (GangliaContext). Extra brownie points if the user can optionally enable display of graphs in the HBase UI (JRobin).

      1. metrics.patch
        7 kB
        stack
      2. metrics-2.patch
        8 kB
        stack
      3. metrics-3.patch
        23 kB
        stack

        Issue Links

          Activity

          stack created issue -
          stack added a comment -

          One thought is that we could put loadings and averages into an hbase table. There'd be a row per regionserver.

          Billy Pearson added a comment -

          Now that we have a TTL option on tables, this would not be a bad idea: users could modify the TTL to store the amount of history they need. +1 on the idea of a stats table.
          Also, with the data in a table, all clients could query it from Thrift, REST, or the Java API.

          Andrew Purtell added a comment -

          Just use the gmond xml-over-tcp protocol directly to report metrics to Ganglia?
          Also core is getting something working for 0.19.0: http://wiki.apache.org/hadoop/GangliaMetrics

          Andrew Purtell made changes -
          Field Original Value New Value
          Link This issue incorporates HBASE-658 [ HBASE-658 ]
          Andrew Purtell made changes -
          Summary: "cluster load history: emissions and graphs" -> "Ganglia support for cluster load history: emissions and graphs"
          stack added a comment -

          Let's go the route of hadoop-metrics. It is easy to add our own metrics, it supports file and Ganglia output, and out of the box it does dfs, jvm and job metrics. Easy to configure. I can make file work but not Ganglia, even w/ HADOOP-3422 applied. Will be back.
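
For readers unfamiliar with hadoop-metrics, the fragment below sketches what a hadoop-metrics.properties entry for an "hbase" context could look like. It is an illustration rather than the configuration from the patch: the context name is inferred from the "hbase.regionserver" record name that appears in this issue, and the period, file name, and Ganglia host placeholder are assumptions.

```properties
# File output: append metrics records to a local file every 10 seconds.
# The fileName value is a made-up example path.
hbase.class=org.apache.hadoop.metrics.file.FileContext
hbase.period=10
hbase.fileName=/tmp/metrics_hbase.log

# Ganglia output (alternative to the block above): send records to gmond/gmetad.
# GANGLIA_HOST is a placeholder, not a real host name.
# hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext
# hbase.period=10
# hbase.servers=GANGLIA_HOST:8649
```

Only one `hbase.class` line should be active at a time, since the context name maps to a single implementation class.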

          stack made changes -
          Priority Major [ 3 ] Critical [ 2 ]
          Assignee stack [ stack ]
          Andrew Purtell added a comment -

          Renamed issue to reflect shift of focus.

          Andrew Purtell made changes -
          Summary: "Ganglia support for cluster load history: emissions and graphs" -> "Metrics support for cluster load history: emissions and graphs"
          stack made changes -
          Comment [ See http://hadoop.apache.org/core/docs/r0.18.1/api/index.html for howto on metrics. ]
          stack added a comment -

          Here's a basic patch. It needs more stats added and only covers the regionserver at the moment. We also need to make sure that when master and regionserver run on the same machine they don't clash. The patch also adds JVM monitoring, so JVM stats show up in Ganglia too.

          See http://hadoop.apache.org/core/docs/r0.18.1/api/org/apache/hadoop/metrics/package-summary.html#package_description for howto on metrics.

          stack made changes -
          Attachment metrics.patch [ 12391867 ]
          stack added a comment -

          Here is sample using File context:

          ...
          hbase.regionserver: hostName=durruti.local, regions=2, requests=0
          hbase.regionserver: hostName=durruti.local, regions=2, requests=0
          hbase.regionserver: hostName=durruti.local, regions=2, requests=0
          hbase.regionserver: hostName=durruti.local, regions=2, requests=0
          ..
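
The sample lines above follow a simple "record name: key=value, key=value" layout. As a minimal sketch of how a monitoring script might read them back, the Java class below parses one such line. `MetricsLine` is a hypothetical helper written for this illustration; it is not part of HBase or hadoop-metrics, and it assumes values never contain commas.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses one line of the file-context output shown above. */
final class MetricsLine {
    final String recordName;          // e.g. "hbase.regionserver"
    final Map<String, String> fields; // tags and metrics, in file order

    private MetricsLine(String recordName, Map<String, String> fields) {
        this.recordName = recordName;
        this.fields = fields;
    }

    /** Parses "name: k1=v1, k2=v2"; throws IllegalArgumentException on malformed input. */
    static MetricsLine parse(String rawLine) {
        String line = rawLine.trim();
        int colon = line.indexOf(": ");
        if (colon < 0) {
            throw new IllegalArgumentException("no record name in: " + line);
        }
        String name = line.substring(0, colon);
        Map<String, String> fields = new LinkedHashMap<>();
        // Assumption: values contain no commas, so splitting on ", " is safe.
        for (String pair : line.substring(colon + 2).split(",\\s*")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("bad field '" + pair + "' in: " + line);
            }
            fields.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return new MetricsLine(name, fields);
    }

    public static void main(String[] args) {
        MetricsLine m = parse("hbase.regionserver: hostName=durruti.local, regions=2, requests=0");
        System.out.println(m.recordName + " -> " + m.fields);
    }
}
```

Keeping the fields in a `LinkedHashMap` preserves the order in which they appear in the file, which makes the parsed form easy to compare against the raw line.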
          
          stack added a comment -

          If I enable rpc metrics, it shows for each rpc invocation – openscanner, get, commit, etc. – the average time and the number of operations.
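
To make "average time and number of operations" concrete, here is a minimal sketch of a per-operation accumulator that is read out and reset once per reporting period. `OpRate` is a hypothetical class written for this illustration; it only mirrors the idea and is not the rpc metrics code in Hadoop or in the patch.

```java
/** Hypothetical per-operation accumulator: count and average time per reporting period. */
final class OpRate {
    private long ops;         // operations recorded since the last snapshot
    private long totalMillis; // summed elapsed time of those operations

    /** Records one completed invocation that took elapsedMillis. */
    synchronized void record(long elapsedMillis) {
        ops++;
        totalMillis += elapsedMillis;
    }

    /** Returns {count, average millis} for the period, then resets the accumulator. */
    synchronized long[] snapshotAndReset() {
        long avg = (ops == 0) ? 0 : totalMillis / ops; // integer average; 0 when idle
        long[] result = {ops, avg};
        ops = 0;
        totalMillis = 0;
        return result;
    }

    public static void main(String[] args) {
        OpRate get = new OpRate(); // one instance per rpc name, e.g. "get"
        get.record(10);
        get.record(20);
        get.record(30);
        long[] s = get.snapshotAndReset();
        System.out.println("get: ops=" + s[0] + ", avgMillis=" + s[1]);
    }
}
```

Resetting on each snapshot means every emitted record describes only the most recent period, which is the shape a time-series tool such as Ganglia expects.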

          stack added a comment -

          Here is v2 of patch. Need to wire up count of store files and size of memcaches. Also need to see what happens in ganglia when more than one jvm is sending metrics (looks like they are aggregated).

          stack made changes -
          Attachment metrics-2.patch [ 12392334 ]
          stack added a comment -

          Also need to add master stats. As is, requests are being recorded wrong; that needs fixing.

          stack added a comment -

          This patch adds master metrics and adds to the regionserver UI publishing of regionserver metrics.

          I need to test more.

          I also want to think about whether there is a better way of getting store-level metrics than taking the synchronization lock on the online regions and then iterating over all stores, synchronizing on each region's store Collection along the way.
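
One possible direction for the locking concern above, offered only as a sketch and not as what the patch does: copy each collection while briefly holding its lock, then aggregate over the copies with no lock held. `RegionStub` and `StoreMetricsSketch` are hypothetical stand-ins invented for this illustration; they are not HBase classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a region holding per-store file counts. */
final class RegionStub {
    private final List<Integer> storeFileCounts = new ArrayList<>();

    synchronized void addStore(int storeFiles) {
        storeFileCounts.add(storeFiles);
    }

    /** Copies the per-store counts under this region's own lock. */
    synchronized List<Integer> storeFileCountsSnapshot() {
        return new ArrayList<>(storeFileCounts);
    }
}

/** Hypothetical metrics collector that holds each lock only long enough to copy. */
final class StoreMetricsSketch {
    private final List<RegionStub> onlineRegions = new ArrayList<>();

    void addRegion(RegionStub region) {
        synchronized (onlineRegions) {
            onlineRegions.add(region);
        }
    }

    /** Sums store files across all regions without holding the regions lock while iterating. */
    int totalStoreFiles() {
        List<RegionStub> snapshot;
        synchronized (onlineRegions) {
            snapshot = new ArrayList<>(onlineRegions); // short critical section
        }
        int total = 0;
        for (RegionStub region : snapshot) {
            for (int count : region.storeFileCountsSnapshot()) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        StoreMetricsSketch metrics = new StoreMetricsSketch();
        RegionStub a = new RegionStub();
        a.addStore(2);
        a.addStore(3);
        metrics.addRegion(a);
        System.out.println("storefiles=" + metrics.totalStoreFiles());
    }
}
```

The trade-off is that the total may be slightly stale if regions open or close during aggregation, which is usually acceptable for periodically emitted metrics.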

          stack made changes -
          Attachment metrics-3.patch [ 12392398 ]
          stack added a comment -

          What's here is good enough to commit for now. Let's get it into 0.19.0. Hadoop work is still needed, but that's elsewhere.

          stack made changes -
          Fix Version/s 0.19.0 [ 12313364 ]
          stack added a comment -

          Committed. Added a documentation page under src/docs too.

          stack made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          stack made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Transition           Time In Source Status   Execution Times   Last Executer   Last Execution Date
          Open -> Resolved     169d 5h 57m             1                 stack           29/Oct/08 22:30
          Resolved -> Closed   318d 23h 56m            1                 stack           13/Sep/09 23:26

            People

            • Assignee: stack
            • Reporter: stack
            • Votes: 1
            • Watchers: 0
