Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
Currently, --tablet_history_max_age_sec is set to a rather arbitrary value of 60 * 60 * 24 * 7 seconds (7 days). Keeping data in UNDO deltas for longer than necessary means spending IO throughput, CPU cycles, and memory in various background maintenance jobs to process data which is no longer needed. However, as of Kudu 1.16.0, there isn't a simple way to tell whether the current setting of --tablet_history_max_age_sec is appropriate for the workload running on a Kudu cluster. An operator interested in optimizing the amount of tablet history stored has no visibility into what the optimal value of --tablet_history_max_age_sec might be for the workloads run against the cluster.
It would be great to add a per-tablet metric (a histogram?) to accumulate stats on the difference between the current timestamp and the snapshot timestamps used for scan operations in READ_AT_SNAPSHOT and READ_YOUR_WRITES modes.
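To illustrate the idea, here is a minimal Python sketch of such a histogram. It is not Kudu code and does not use Kudu's metrics API: the class name SnapshotAgeHistogram, its methods, and the bucket boundaries are all hypothetical, and timestamps are simplified to plain seconds rather than Kudu's hybrid timestamps. It only shows how recording "now minus snapshot timestamp" per scan could let an operator see how far back scans actually read.

```python
import bisect


class SnapshotAgeHistogram:
    """Hypothetical per-tablet histogram of (now - snapshot timestamp), in seconds."""

    def __init__(self, bucket_bounds_sec):
        # Upper-inclusive bucket bounds; one extra bucket catches larger ages.
        self.bounds = sorted(bucket_bounds_sec)
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0

    def record_scan(self, snapshot_ts_sec, now_sec):
        """Record one snapshot scan; ages below zero are clamped to zero."""
        age = max(0, now_sec - snapshot_ts_sec)
        idx = bisect.bisect_left(self.bounds, age)
        self.counts[idx] += 1
        self.total += 1

    def percentile_upper_bound(self, pct):
        """Upper bound of the bucket holding the pct-th percentile.

        Returns None if nothing was recorded or the percentile falls
        into the overflow bucket (older than every configured bound).
        """
        if self.total == 0:
            return None
        target = self.total * pct / 100.0
        seen = 0
        for idx, count in enumerate(self.counts):
            seen += count
            if seen >= target:
                return self.bounds[idx] if idx < len(self.bounds) else None
        return None


# Assumed bucket bounds: 1 minute, 1 hour, 1 day, and the 7-day default.
DEFAULT_MAX_AGE_SEC = 60 * 60 * 24 * 7
hist = SnapshotAgeHistogram([60, 3600, 86400, DEFAULT_MAX_AGE_SEC])

now = 1_000_000
for age in (5, 30, 120, 4000, 90000):
    hist.record_scan(now - age, now)

print(hist.counts)
print(hist.percentile_upper_bound(80))
```

If, for example, nearly all scans on a tablet land in the buckets at or below one day, that would suggest the 7-day default retains far more history than the workload reads, although any actual change to --tablet_history_max_age_sec should also account for other consumers of history, which this sketch does not model.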