Details
Task
Status: Resolved
Critical
Resolution: Done
None
Storage R11 Sprint 40, Storage: RI-12 Sprint 43
1
Description
Currently SLRP provides per-CSI-call metrics, e.g.:
resource_providers/<rp_type>.<rp_name>/csi_plugin/rpcs/csi.v0.controller.CreateVolume/successes
resource_providers/<rp_type>.<rp_name>/csi_plugin/rpcs/csi.v0.node.NodeGetId/errors
If we keep providing such fine-grained metrics, then once operators upgrade their CSI plugins to CSI v1, SLRP would report a second set of metrics for v1, which would be inconvenient for operators.
Also, the fine-grained metrics are not very useful to operators, since most of the information is highly correlated with the per-operation metrics. Operators would most likely just aggregate the per-CSI-call metrics to monitor CSI plugins, and use the per-operation metrics to monitor volume creation, destruction, etc.
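To illustrate the aggregation operators would otherwise have to do themselves, here is a minimal sketch. It assumes a metrics snapshot is available as a plain name-to-value mapping; the function name, the sample values, and the v1 key (assumed to follow the same naming pattern as v0) are hypothetical and not taken from SLRP.

```python
import re
from collections import defaultdict

# Matches per-CSI-call metric keys of the form
#   resource_providers/<rp_type>.<rp_name>/csi_plugin/rpcs/csi.v0.controller.CreateVolume/successes
# and captures the plugin prefix and the trailing outcome name.
PER_CALL_KEY = re.compile(
    r"^(?P<plugin>resource_providers/[^/]+/csi_plugin)"
    r"/rpcs/csi\.v\d+\.[^/]+/(?P<outcome>[a-z_]+)$"
)


def aggregate_per_call_metrics(snapshot):
    """Sum per-CSI-call counters per plugin, ignoring RPC name and CSI version."""
    totals = defaultdict(lambda: defaultdict(int))
    for key, value in snapshot.items():
        match = PER_CALL_KEY.match(key)
        if match:  # keys that are not per-CSI-call metrics are skipped
            totals[match["plugin"]][match["outcome"]] += value
    return {plugin: dict(outcomes) for plugin, outcomes in totals.items()}


# Hypothetical snapshot; the values and the v1 key are assumptions for illustration.
PREFIX = "resource_providers/my_type.my_name/csi_plugin/rpcs/"
snapshot = {
    PREFIX + "csi.v0.controller.CreateVolume/successes": 3,
    PREFIX + "csi.v0.node.NodeGetId/errors": 1,
    PREFIX + "csi.v1.controller.CreateVolume/successes": 5,
    "some/other/metric": 7,
}

aggregated = aggregate_per_call_metrics(snapshot)
```

Every operator monitoring a plugin would need some version of this, and would have to revisit it whenever a new CSI version adds another set of keys.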
So instead of providing such fine-grained metrics, we could provide a set of aggregated RPC metrics that are agnostic to CSI versions, such as:
resource_providers/<rp_type>.<rp_name>/csi_plugin/rpcs_pending
resource_providers/<rp_type>.<rp_name>/csi_plugin/rpcs_finished
resource_providers/<rp_type>.<rp_name>/csi_plugin/rpcs_failed
resource_providers/<rp_type>.<rp_name>/csi_plugin/rpcs_cancelled
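A rough sketch of how such version-agnostic counters could be maintained follows. It is not SLRP's implementation: the class and method names are hypothetical, and it assumes that rpcs_pending is a gauge of in-flight calls while the other three count completed calls by outcome, with rpcs_finished taken to mean successful completion.

```python
class CsiPluginRpcMetrics:
    """Aggregated RPC metrics for one CSI plugin (hypothetical helper)."""

    OUTCOMES = ("rpcs_finished", "rpcs_failed", "rpcs_cancelled")

    def __init__(self, rp_type, rp_name):
        self.prefix = f"resource_providers/{rp_type}.{rp_name}/csi_plugin/"
        self.values = {self.prefix + "rpcs_pending": 0}
        for outcome in self.OUTCOMES:
            self.values[self.prefix + outcome] = 0

    def rpc_started(self, rpc_name):
        # rpc_name (e.g. "csi.v0.controller.CreateVolume") is deliberately
        # ignored: every call, in any CSI version, updates the same metrics.
        self.values[self.prefix + "rpcs_pending"] += 1

    def rpc_completed(self, rpc_name, outcome):
        if outcome not in self.OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome}")
        self.values[self.prefix + "rpcs_pending"] -= 1
        self.values[self.prefix + outcome] += 1


# Usage: three calls start, two complete, one is still in flight.
metrics = CsiPluginRpcMetrics("my_type", "my_name")
for name in ("csi.v0.controller.CreateVolume",
             "csi.v0.node.NodeGetId",
             "csi.v1.controller.CreateVolume"):  # v1 name is an assumption
    metrics.rpc_started(name)
metrics.rpc_completed("csi.v0.controller.CreateVolume", "rpcs_finished")
metrics.rpc_completed("csi.v0.node.NodeGetId", "rpcs_failed")
```

Because the metric keys do not mention the RPC name or CSI version, the set of reported metrics stays the same when a plugin is upgraded from v0 to v1.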