Description
We have per-operation metrics for Get, Mutate, Delete, Increment, and ScanNext.
The metrics are emitted in this form:
"Get_num_ops" : 4837505, "Get_min" : 0, "Get_max" : 296, "Get_mean" : 0.2934618155433431, "Get_median" : 0.0, "Get_75th_percentile" : 0.0, "Get_95th_percentile" : 1.0, "Get_99th_percentile" : 1.0,
...
"ScanNext_num_ops" : 194705, "ScanNext_min" : 0, "ScanNext_max" : 18441, "ScanNext_mean" : 7468.274651395701, "ScanNext_median" : 583.0, "ScanNext_75th_percentile" : 583.0, "ScanNext_95th_percentile" : 13481.0, "ScanNext_99th_percentile" : 13481.0,
The problem is that Get, Mutate, Delete, Increment, Append, and Replay are all time-based, tracking how long each operation ran, while ScanNext tracks returned response sizes (to be exact, the sizes of the returned cells). The metric names give no hint of this, so the difference is very confusing: you would only know about this subtlety if you read the metrics collection code.
I am not sure how useful the ScanNext metric is as it stands today. We could deprecate it and introduce a time-based one to keep track of scan request latencies.
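A minimal sketch of the proposal, assuming nothing about HBase's actual metrics classes: `OpHistogram`, `ScanMetrics`, `timedNext`, and the metric names `ScanNextSize` and `ScanNextTime` are all hypothetical. The histogram emits keys in the same `<Op>_num_ops`, `<Op>_min`, ... shape as the dump above, which shows why the unit is invisible to whoever reads the output. Percentiles here use a simple nearest-rank rule over all samples (a swapped-in simplification, not necessarily what the real histograms do), and reporting latency in milliseconds is also an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical per-operation histogram; NOT the HBase metrics API. */
class OpHistogram {
    private final String name;
    private final List<Long> samples = new ArrayList<>();

    OpHistogram(String name) { this.name = name; }

    synchronized void update(long value) { samples.add(value); }

    /** Nearest-rank percentile over already-sorted samples; p in (0, 1]. */
    private static long percentile(List<Long> sorted, double p) {
        int rank = (int) Math.ceil(p * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }

    /** Emits "<name>_num_ops", "<name>_min", ...; note the unit appears nowhere in the keys. */
    synchronized Map<String, Double> snapshot() {
        Map<String, Double> out = new LinkedHashMap<>();
        out.put(name + "_num_ops", (double) samples.size());
        if (samples.isEmpty()) return out;
        List<Long> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        double sum = 0;
        for (long v : sorted) sum += v;
        out.put(name + "_min", (double) sorted.get(0));
        out.put(name + "_max", (double) sorted.get(sorted.size() - 1));
        out.put(name + "_mean", sum / sorted.size());
        out.put(name + "_median", (double) percentile(sorted, 0.50));
        out.put(name + "_75th_percentile", (double) percentile(sorted, 0.75));
        out.put(name + "_95th_percentile", (double) percentile(sorted, 0.95));
        out.put(name + "_99th_percentile", (double) percentile(sorted, 0.99));
        return out;
    }
}

/** Hypothetical wiring: keep the size metric under an explicit name and add a latency metric. */
class ScanMetrics {
    final OpHistogram scanNextSize = new OpHistogram("ScanNextSize"); // bytes of returned cells
    final OpHistogram scanNextTime = new OpHistogram("ScanNextTime"); // milliseconds per next() call

    /** Times one next() call; the supplier returns the total size of the returned cells in bytes. */
    long timedNext(Supplier<Long> next) {
        long start = System.nanoTime();
        long responseBytes = next.get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        scanNextTime.update(elapsedMs);
        scanNextSize.update(responseBytes);
        return responseBytes;
    }
}
```

Recording both values under distinct names keeps the existing size information available for anyone who relies on it, while the new `ScanNextTime` metric lines up with the time-based Get/Mutate/Delete metrics.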
P.S. Shamelessly using the parent JIRA (since these seem relevant).