Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Detailed information on what HBase is doing in terms of reads is hard to come by.

      What would be useful is to have a periodic StoreFile query report. Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files.

      This would list all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), the number of times each StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile.
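      As a sketch only, the per-StoreFile aggregation behind such a report could look something like the following (all class, method, and field names here are hypothetical, not existing HBase code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-StoreFile access summary, reset each reporting interval. */
public class StoreFileReport {
    static final class Stats {
        final LongAdder accessCount = new LongAdder();
        final LongAdder totalMillis = new LongAdder();
        volatile long fileSizeBytes;
    }

    // Keyed by full StoreFile path, which also encodes table/region/CF.
    private final Map<String, Stats> byPath = new ConcurrentHashMap<>();

    /** Called when a scanner closes; cheap, lock-free updates on the hot path. */
    public void record(String path, long millis, long sizeBytes) {
        Stats s = byPath.computeIfAbsent(path, p -> new Stats());
        s.accessCount.increment();
        s.totalMillis.add(millis);
        s.fileSizeBytes = sizeBytes;
    }

    /** Emit one line per StoreFile accessed this period, then reset the map. */
    public String dumpAndReset() {
        StringBuilder sb = new StringBuilder();
        byPath.forEach((path, s) -> sb.append(path)
                .append(" accesses=").append(s.accessCount.sum())
                .append(" sizeBytes=").append(s.fileSizeBytes)
                .append(" totalMs=").append(s.totalMillis.sum())
                .append('\n'));
        byPath.clear();
        return sb.toString();
    }
}
```

      A periodic chore would call dumpAndReset() on the configured interval and write the result to the log.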

      Even this level of summary would be useful to detect which tables & CFs are being accessed the most, and including the StoreFile would provide insight into relative "uncompaction" (i.e., lots of StoreFiles).

      I think the log-output, as opposed to UI, is an important facet with this. I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data). Just getting this to log-file would be a big improvement.

      Will this have a non-zero performance impact? Yes. Hopefully small, but yes it will. However, flying a plane without any instrumentation isn't fun.


          Activity

          Doug Meil added a comment -

          I think that instrumenting StoreFileScanner by gathering time spent for all the 'next' and 'seek' calls would do it. Then on 'close' it would publish the detailed record to some internal service that would gather up all these detail records and periodically dump the summary.

          I'm doing some hand-waving here because we don't want to introduce concurrency issues in the publishing process (e.g., publishing to something that is synchronized will effectively single-thread StoreFileScanners which would be a non-starter), but based on my understanding of the code it seems like this would be a fairly targeted change.

          Thoughts?

          Doug Meil added a comment -

          Adding document for requirements and 1st pass general design.

          Todd Lipcon added a comment -

          Maybe I'm wandering out of scope, but I've been thinking a bit about statistics and monitoring as well recently, and I think it might integrate well with this work.

          The idea is to define various probe points (to use the dtrace terminology) throughout the code. Each probe point would have a name and some predefined set of arguments. For example, in the HFile code you might have:

          HFile() {
            this.readTrace = Tracer.get("hfile.read.complete");
          }

          read() {
            ...
            if (readTrace != null && readTrace.isEnabled()) {
              readTrace.trace(millisSpent, thisHFilePath, blockIdx, ...);
            }
          }
          

          then different things interested in this tracing data can subscribe to the trace point – in this case to collect aggregate statistics for each 30-second period, though other applications would be useful as well (e.g., dynamically attach a listener to sample some percentage of requests).

          The advantage of the above design is that it's flexible, and if off by default it should have no performance impact since it will be basically JITted away.
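          The subscription side of that idea could be sketched roughly as below. Tracer and Listener are hypothetical names matching the pseudocode above, not an existing HBase API:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical probe-point registry; probe sites look up a Tracer by name. */
public class Tracer {
    public interface Listener { void onTrace(Object... args); }

    private static final Map<String, Tracer> TRACERS = new ConcurrentHashMap<>();
    private final List<Listener> listeners = new CopyOnWriteArrayList<>();

    /** Same name always returns the same Tracer instance. */
    public static Tracer get(String name) {
        return TRACERS.computeIfAbsent(name, n -> new Tracer());
    }

    public void subscribe(Listener l) { listeners.add(l); }

    /** Cheap check; with no listeners the JIT can treat the probe as dead code. */
    public boolean isEnabled() { return !listeners.isEmpty(); }

    public void trace(Object... args) {
        for (Listener l : listeners) {
            l.onTrace(args);
        }
    }
}
```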

          Doug Meil added a comment -

          Thanks Todd. I don't think that it's out of scope. If something like this could serve as the general-purpose framework to gather log details, then that sounds like a good idea. I'd like to see some examples that would satisfy the use cases in my writeup and any other common performance issues that anybody else has.

          One expectation I would have is that this would be on for development/production clusters - the next question is what level of summarization is done out of the box in HBase, and (here we go again) where the output goes.

          Nicolas Spiegelberg added a comment -

          @Doug: what is your goal for this JIRA? Collecting stats on StoreFile usage is really good from a core developer perspective, but it sounds like you want better DBA tools. For example:

          1) Get sampling. You just want a way to log every 1k database commands and have some collector that displays high level information on get vs put rate, with basic filtering capabilities.
          2) Note that we're developing a version of "show processlist" for HBase that might also provide the visibility you want (HBASE-4057).
          3) Another option is exporting per-CF metrics in addition to our existing per-server metrics. We have this sorta hacked up for 89 and could give you the diffs if you want to finish it off for 92.
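          The 1-in-1k sampling in (1) is cheap enough for the hot path; a sketch (hypothetical class, not existing code):

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical 1-in-N sampler for logging a fraction of database commands. */
public class CommandSampler {
    private final long n;
    private final AtomicLong counter = new AtomicLong();

    public CommandSampler(long n) { this.n = n; }

    /** Returns true on every n-th call; a single atomic increment per command. */
    public boolean shouldSample() {
        return counter.incrementAndGet() % n == 0;
    }
}
```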

          Doug Meil added a comment -

          Hi Nicolas, I updated another version of this "...storefilereport_2011_08_10.pdf" with a "Why" section that has a little more background on scenarios.

          In a word: Statspack. Say what one might about Oracle, that report was the intersection of DBA and developer reporting. So the answer to the question "developer" vs. "DBA" is "both."

          re: HBASE-4057, is "show processlist" a point-in-time command? It seems to be. That's certainly useful, but it's a little different than the view that statspack provides, which does the summary for you.

          Doug Meil added a comment -

          Just want to point this out again, "StoreFile" reporting implies that the full path is available (table, region, cf, storefile) not just the StoreFile itself. That is pointed out in the doc, but I want to point that out again.

          With respect to per-CF metrics, that is better than the current situation of only RS-level metrics, but I still don't think it covers what I'm proposing: there isn't a sense of time-cost with the current metrics, nor do they split out system vs. app usage.

          Doug Meil added a comment -

          For those not familiar with Statspack, see this: http://www.akadia.com/services/ora_statspack_survival_guide.html

          re: "STATSPACK is a diagnosis tool for instance-wide performance problems; it also supports application tuning activities by providing data which identifies high-load SQL statements. STATSPACK can be used both proactively to monitor the changing load on a system, and also reactively to investigate a performance problem."

          re: "The STATSPACK reports we like are from 15-minute intervals during a busy or peak time, when the performance is at its worst."
          That's exactly what I'm talking about... small intervals. Not too small, but not too big.

          re: "Another common mistake with STATSPACK is to gather snapshots only when there is a problem."
          That's why this type of reporting should pretty much be 'always on' - you need to be able to compare to other points in time.

          Again, some things don't translate 1:1 from the RDBMS world, but a lot does.

          Gary Helmling added a comment -

          I don't want to hijack this issue, but if we're talking about broader tracing and monitoring support, I think another inspiration worth looking at is Dapper:
          http://research.google.com/pubs/pub36356.html

          Some of the situations we're currently trying to help application teams with are things like: call #1 took 10msec to process, call #2 took 300000msec to process... Why? We don't have a whole lot at the moment to help in answering this. Better load stats help see what's going on in the cluster from one direction. But it still requires a lot of inferring to see how it ties together from the client end.

          @Todd, I like the dtrace-like approach. I think we could start simply with something like this and spread and evolve it as we go. Possibly even growing it into distributed tracing. It seems like that's a broader need so maybe we should move that discussion into another issue.

          In general, for additional stats that we export, I would like to gently encourage a hard look at whether or not there's a way to incorporate them into the metrics framework. Not everything will fit with this – it's particularly not so great at dynamically named stats (like stats based on store filenames). But it does give us an existing framework for collecting, aggregating and reporting those stats, with a variety of tools that integrate nicely. Just writing stats to log files requires a whole lot more work to actually make the output useful. Acknowledging sometimes it may be the best/only option though (I'm currently patching up the RPC logging to make it a little more useful). I really need to look at metricsv2 and see how much more flexibility it gives us.

          @Doug, thanks for the STATSPACK link. I'll read up on that as well.

          stack added a comment -

          Some of the situations we're currently trying to help application teams with are things like: call #1 took 10msec to process, call #2 took 300000msec to process... Why?

          Me and J-D need to be able to answer that question too.

          In general, for additional stats that we export, I would like to gently encourage a hard-look at whether or not there's a way to incorporate them into the metrics framework.

          Agree, as much as we can. We use tsdb here, so if it's a metric, we already have a ready means of viewing it across time.

          Doug Meil added a comment -

          Regarding user example #1 of my updated writeup, an instance of that happened on the dist-list recently. A user still had major compactions scheduled daily, and his cluster "got really slow" every day. If he had summarized 5-minute slices differentiating system and user activity, he could tell which table/region/CF/StoreFile was getting system (compaction) activity, etc.

          I doubt this answers your particular case, but I think this is a fairly common occurrence with users.

          Another example is #3, the "degree of uncompaction" stats (e.g., how much time is being spent reading each StoreFile). This kind of information would also help with the "HBase as queue" issue that was on the dist-list recently. They weren't doing major compactions and apparently had a lot of StoreFiles.

          #2 is the "hybrid activity" use-case. e.g., MR job going on one table while there are also random reads on another table. At least seeing the activity during the reporting slice would let you know that something else is happening on the cluster and what is getting accessed.

          Otis Gospodnetic added a comment -

          +1 for Todd's tracing idea (is this already in a separate JIRA issue that I can't find?)
          +1 for what Gary said about using existing mechanisms for publishing metrics, so that those of us who already have tools to gather and aggregate data from there can just keep using those tools instead of developing new things that scrape metrics from additional places.

          stack added a comment -

          Upping priority and marking against 0.96.0 so it gets more consideration.

          Lars Hofhansl added a comment -

          Should we output the statistics via JMX as well?

          Otis Gospodnetic added a comment -

          Yes, +1 for publishing in JMX. We need that for our SPM for HBase monitoring service, for example.
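          For simple counters, the platform MBean server would be one plain-Java way to do that; a minimal sketch (all names here are hypothetical; StandardMBean is used so the <ClassName>MBean naming convention doesn't apply):

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

/** Hypothetical management interface; JMX requires it to be public. */
public interface StoreFileStatsView {
    long getTotalAccesses();
}

/** Hypothetical stats holder, registered with the platform MBean server. */
class StoreFileStats implements StoreFileStatsView {
    private final AtomicLong totalAccesses = new AtomicLong();

    public void recordAccess() { totalAccesses.incrementAndGet(); }

    @Override
    public long getTotalAccesses() { return totalAccesses.get(); }

    /** Expose this instance as the "TotalAccesses" attribute under a fixed name. */
    static StoreFileStats register() throws Exception {
        StoreFileStats stats = new StoreFileStats();
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(new StandardMBean(stats, StoreFileStatsView.class),
                new ObjectName("hbase:type=StoreFileStats"));
        return stats;
    }
}
```

          Anything that can read JMX (jconsole, an SPM-style collector) would then see the counter without new log parsing.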

          Ted Yu added a comment -

          Looking at Gary's comment from 10/Aug/11 20:50, can the priority of this JIRA be lowered now that we have htrace in trunk?

          stack added a comment -

          This turned into a (useful) discussion. Elliott Clark, can you take a look and note what in 0.96 metrics2 might help answer the questions Doug poses above? We also have a trace mechanism committed, but again it would take some work to get it to the level Doug is asking for in the above.

          It would seem that this issue should become two issues now: one to improve the trace so it can go down to the per-storefile level, and another to add to metrics so we can emit at the storefile level (if possible).

          Meantime, marking this as non-critical and moving out of 0.96 while it is w/o a sponsor.

          Andrew Purtell added a comment -

          Meantime, marking this as non-critical and moving out of 0.96 while it is w/o a sponsor.

          I might be looking at this again in the future in the context of HBASE-6572. Deciding what stores to migrate.


            People

            • Assignee:
              Unassigned
              Reporter:
              Doug Meil
            • Votes:
              1
              Watchers:
              6
