Details
Improvement
Status: Open
Major
Resolution: Unresolved
1.12.0
None
None
Description
We sometimes find it useful to drill into which data directories are being used, as a way to determine why flushes or compactions are slow. Currently the easiest way to discover this is to run the kudu remote_replica list tool, which shows the data directory paths, and to look through the tserver logs for any associated slowness warnings.
We should put this information front and center in some tooling or a web UI page. Off the top of my head, it'd be really nice to understand:
- How many tablets have data in each data directory.
- How many data blocks are in each data directory.
- How full each data directory is.
- How many maintenance ops have recently written into each data directory (and conversely, which data directories each maintenance op is writing into).
- Average write/read latency, realtime minus usertime, etc. per byte written in each data directory.
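As a rough illustration of the first and third items, here is a minimal Python sketch of the kind of per-directory summary such tooling might produce. It is not Kudu code: the function names, the tablet-to-directory mapping, and the example paths are all hypothetical, and it assumes the mapping has already been collected by some other means (for example, by hand from the existing tool output). Fullness is measured for the filesystem backing each directory, which may be shared with other data.

```python
import shutil
from collections import Counter
from typing import Dict, Iterable


def tablets_per_data_dir(tablet_dirs: Dict[str, Iterable[str]]) -> Counter:
    """Count how many tablets have data in each data directory.

    tablet_dirs is a hypothetical mapping of tablet id -> data directory paths.
    """
    counts: Counter = Counter()
    for dirs in tablet_dirs.values():
        # A tablet counts once per directory, even if a path is listed twice.
        counts.update(set(dirs))
    return counts


def data_dir_fullness(data_dirs: Iterable[str]) -> Dict[str, float]:
    """Return the used fraction of the filesystem backing each directory."""
    fullness = {}
    for path in data_dirs:
        usage = shutil.disk_usage(path)
        # Guard against pseudo-filesystems that report a total size of zero.
        fullness[path] = usage.used / usage.total if usage.total else 0.0
    return fullness


if __name__ == "__main__":
    # Hypothetical mapping, for illustration only.
    example = {
        "tablet-a": ["/data/1", "/data/2"],
        "tablet-b": ["/data/2"],
    }
    for path, n in sorted(tablets_per_data_dir(example).items()):
        print(f"{path}: {n} tablet(s)")
    # Use the current directory so the sketch runs on any machine.
    for path, frac in data_dir_fullness(["."]).items():
        print(f"{path}: {frac:.1%} full")
```

The remaining items (block counts, maintenance op activity, and latency per byte) depend on state that only the tablet server tracks, so they would need to come from the server itself rather than from an external script like this one.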