  1. HBase
  2. HBASE-15161 Umbrella: Miscellaneous improvements from production usage
  3. HBASE-15376

ScanNext metric is size-based while every other per-operation metric is time based


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0, 2.0.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Incompatible change, Reviewed
    • Release Note:
      Removed the ScanNext histogram metrics at the regionserver and per-region levels because their semantics are not compatible with other similar metrics (size histogram vs. latency histogram).

      Instead, this patch adds ScanTime and ScanSize histogram metrics at the regionserver and per-region levels.

    Description

      We have per-operation metrics for Get, Mutate, Delete, Increment, and ScanNext.

      The metrics are emitted like:

          "Get_num_ops" : 4837505,
          "Get_min" : 0,
          "Get_max" : 296,
          "Get_mean" : 0.2934618155433431,
          "Get_median" : 0.0,
          "Get_75th_percentile" : 0.0,
          "Get_95th_percentile" : 1.0,
          "Get_99th_percentile" : 1.0,
      ...
          "ScanNext_num_ops" : 194705,
          "ScanNext_min" : 0,
          "ScanNext_max" : 18441,
          "ScanNext_mean" : 7468.274651395701,
          "ScanNext_median" : 583.0,
          "ScanNext_75th_percentile" : 583.0,
          "ScanNext_95th_percentile" : 13481.0,
          "ScanNext_99th_percentile" : 13481.0,
      

      The problem is that Get, Mutate, Delete, Increment, Append, and Replay are all time-based, tracking how long the operation ran, while ScanNext tracks returned response sizes (returned cell sizes, to be exact). This is confusing, and you would only know the difference if you read the metrics collection code.

      Not sure how useful the ScanNext metric is as it stands today. We can deprecate it and introduce a time-based one to keep track of scan request latencies.
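
      To make the proposal concrete, here is a minimal sketch of tracking scan latency and scan response size as two separate histograms, so that each emitted metric has a single meaning. It is not taken from the patch: the class `ScanMetricsSketch`, the `SimpleHistogram` helper, and the `updateScan` method are hypothetical names, and the units (milliseconds and bytes) are assumptions. Only the `ScanTime` and `ScanSize` metric names and the `_num_ops`/`_min`/`_max`/`_mean` key suffixes come from this issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; these are not HBase's actual metrics classes. */
public class ScanMetricsSketch {

    /** Minimal histogram that tracks count, min, max and mean only (no percentiles). */
    static final class SimpleHistogram {
        private long count;
        private long min = Long.MAX_VALUE;
        private long max = Long.MIN_VALUE;
        private double sum;

        void add(long value) {
            count++;
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
        }

        /** Writes keys in the "<Name>_<stat>" shape used by the metrics dump above. */
        void snapshot(String name, Map<String, Number> out) {
            out.put(name + "_num_ops", count);
            out.put(name + "_min", count == 0 ? 0L : min);
            out.put(name + "_max", count == 0 ? 0L : max);
            out.put(name + "_mean", count == 0 ? 0.0 : sum / count);
        }
    }

    // Two histograms, one meaning each. Units are assumptions of this sketch.
    private final SimpleHistogram scanTime = new SimpleHistogram(); // latency, milliseconds
    private final SimpleHistogram scanSize = new SimpleHistogram(); // response size, bytes

    /** Records one scan call: how long it took and how much data it returned. */
    public void updateScan(long elapsedMillis, long responseSizeBytes) {
        scanTime.add(elapsedMillis);
        scanSize.add(responseSizeBytes);
    }

    /** Returns both histograms as a flat, insertion-ordered map of metric keys. */
    public Map<String, Number> snapshot() {
        Map<String, Number> out = new LinkedHashMap<>();
        scanTime.snapshot("ScanTime", out);
        scanSize.snapshot("ScanSize", out);
        return out;
    }

    public static void main(String[] args) {
        ScanMetricsSketch metrics = new ScanMetricsSketch();
        metrics.updateScan(2, 583);
        metrics.updateScan(4, 13481);
        metrics.snapshot().forEach((k, v) -> System.out.println("\"" + k + "\" : " + v + ","));
    }
}
```

      With this split, `ScanTime_*` can be read alongside `Get_*` and the other latency metrics, while `ScanSize_*` carries the size information that ScanNext reports today.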

      PS: Shamelessly using the parent jira (since these seem relevant).

      Attachments

        1. HBASE-15376.patch
          12 kB
          Heng Chen
        2. HBASE-15376_v3.patch
          15 kB
          Heng Chen
        3. HBASE-15376_v1.patch
          15 kB
          Heng Chen


          People

            chenheng Heng Chen
            enis Enis Soztutar
            Votes: 0
            Watchers: 6
