HBase / HBASE-15161 (Umbrella: Miscellaneous improvements from production usage) / HBASE-15376

ScanNext metric is size-based while every other per-operation metric is time-based


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0, 2.0.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Removed the ScanNext histogram metrics at the regionserver and per-region levels, since their semantics are not compatible with other similar metrics (a size histogram vs. latency histograms).

      Instead, this patch adds ScanTime and ScanSize histogram metrics at the regionserver and per-region levels.
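The split described in the release note can be sketched as follows. This is a language-neutral Python illustration, not HBase's Java code: `Histogram`, `RegionServerMetrics`, `update_scan_time`, `update_scan_size`, and `handle_scan_next` are hypothetical names, and a cell's size is approximated by the length of a byte string. The idea it shows is that each histogram now carries exactly one unit.

```python
import time
from collections import defaultdict


class Histogram:
    """Minimal unit-less histogram: it only stores raw samples."""

    def __init__(self):
        self.samples = []

    def update(self, value):
        self.samples.append(value)


class RegionServerMetrics:
    """Hypothetical registry of per-operation histograms, keyed by name."""

    def __init__(self):
        self.histograms = defaultdict(Histogram)

    def update_scan_time(self, millis):
        # Latency of one scan "next" call, in milliseconds.
        self.histograms["ScanTime"].update(millis)

    def update_scan_size(self, num_bytes):
        # Total size of the cells returned by that call, in bytes.
        self.histograms["ScanSize"].update(num_bytes)


def handle_scan_next(metrics, fetch_batch):
    """Run one scan "next" call and record latency and response size separately."""
    start = time.monotonic()
    cells = fetch_batch()  # hypothetical callable returning a list of byte strings
    elapsed_ms = (time.monotonic() - start) * 1000.0
    metrics.update_scan_time(elapsed_ms)
    metrics.update_scan_size(sum(len(c) for c in cells))
    return cells
```

With two histograms, a dashboard reading `ScanTime_*` can be compared directly with `Get_*`, while `ScanSize_*` answers the separate question of how much data each call returned.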

      Description

      We have per-operation metrics for Get, Mutate, Delete, Increment, and ScanNext.

      The metrics are emitted like this:

          "Get_num_ops" : 4837505,
          "Get_min" : 0,
          "Get_max" : 296,
          "Get_mean" : 0.2934618155433431,
          "Get_median" : 0.0,
          "Get_75th_percentile" : 0.0,
          "Get_95th_percentile" : 1.0,
          "Get_99th_percentile" : 1.0,
          ...
          "ScanNext_num_ops" : 194705,
          "ScanNext_min" : 0,
          "ScanNext_max" : 18441,
          "ScanNext_mean" : 7468.274651395701,
          "ScanNext_median" : 583.0,
          "ScanNext_75th_percentile" : 583.0,
          "ScanNext_95th_percentile" : 13481.0,
          "ScanNext_99th_percentile" : 13481.0,
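To see why a unit mismatch is invisible in this output, here is a minimal sketch of how recorded samples could be flattened into `<name>_<stat>` keys like those above. It assumes nearest-rank percentiles and keeps every sample; HBase's real histograms may approximate these statistics differently, and `snapshot` and `percentile` are hypothetical helpers. Nothing in the resulting keys records whether the samples were milliseconds or bytes.

```python
import math


def percentile(sorted_samples, p):
    """Nearest-rank percentile (p in (0, 100]) of an already sorted, non-empty list."""
    rank = math.ceil(p / 100.0 * len(sorted_samples))
    return sorted_samples[max(rank, 1) - 1]


def snapshot(name, samples):
    """Flatten a histogram's samples into '<name>_<stat>' keys (no unit is recorded)."""
    if not samples:
        return {f"{name}_num_ops": 0}
    s = sorted(samples)
    return {
        f"{name}_num_ops": len(s),
        f"{name}_min": s[0],
        f"{name}_max": s[-1],
        f"{name}_mean": sum(s) / len(s),
        f"{name}_median": float(percentile(s, 50)),
        f"{name}_75th_percentile": float(percentile(s, 75)),
        f"{name}_95th_percentile": float(percentile(s, 95)),
        f"{name}_99th_percentile": float(percentile(s, 99)),
    }
```

For example, `snapshot("Get", [0, 0, 1, 296])` yields `Get_mean` 74.25 and `Get_99th_percentile` 296.0; feeding byte counts instead of milliseconds produces keys of exactly the same shape.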
      

      The problem is that Get, Mutate, Delete, Increment, Append, and Replay are all time-based, tracking how long the operation ran, while ScanNext tracks returned response sizes (the sizes of the returned cells, to be exact). This is very confusing, and you would only know about this subtlety if you read the metrics collection code.
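A minimal sketch of that mismatch, assuming both operations feed the same kind of unit-less histogram: the Get path records elapsed milliseconds, while the ScanNext path records the summed size of the returned cells. `Histogram`, `timed_get`, and `scan_next` are hypothetical names, not HBase code, and a cell's size is approximated by the length of a byte string.

```python
import time


class Histogram:
    """Unit-less histogram: the caller decides what a sample means."""

    def __init__(self):
        self.samples = []

    def update(self, value):
        self.samples.append(value)


metrics = {"Get": Histogram(), "ScanNext": Histogram()}


def timed_get(do_get):
    start = time.monotonic()
    result = do_get()
    # Time-based: elapsed milliseconds.
    metrics["Get"].update((time.monotonic() - start) * 1000.0)
    return result


def scan_next(fetch_batch):
    cells = fetch_batch()
    # Size-based: total length of the returned cells, not elapsed time.
    metrics["ScanNext"].update(sum(len(c) for c in cells))
    return cells
```

A batch whose cells total 583 bytes adds the sample 583 to ScanNext, which in the output above reads exactly like a 583 ms latency.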

      I am not sure how useful the ScanNext metric is as it stands today. We could deprecate it and introduce a time-based one to track scan request latencies.

      P.S. Shamelessly using the parent JIRA (since these seem relevant).

        Attachments

        1. HBASE-15376.patch (12 kB, Heng Chen)
        2. HBASE-15376_v3.patch (15 kB, Heng Chen)
        3. HBASE-15376_v1.patch (15 kB, Heng Chen)


            People

            • Assignee: Heng Chen (chenheng)
            • Reporter: Enis Soztutar (enis)
            • Votes: 0
            • Watchers: 6
