We provide a coarse-grained admin API and an associated shell command, compaction_state, for determining the compaction status of a table.
We also log compaction activity, including a compaction journal at completion, via log4j to whatever log aggregation solution is available in production.
This is not sufficient for online, interactive observation, debugging, or performance analysis of current compaction activity, where an operator is attempting to observe and analyze compactions in real time. Log aggregation and presentation solutions typically take on the order of minutes before a log line becomes visible end to end, which makes this impossible today.
We don't offer any API or tools for directly interrogating split and merge activity in real time. Some indirect knowledge of split or merge activity can be inferred from the regions in transition (RIT) information available via ClusterStatus. The same information can also be scraped, with some difficulty, from the debug servlet.
We should have new APIs and shell commands, and perhaps also new admin UI views, for the following.
At regionserver scope:
- listing the current state of a regionserver's compaction, split, and merge tasks and threads
- counting (simple view) and listing (detailed view) a regionserver's compaction queues
- reporting whether a given region is currently compacting, splitting, or merging
At master scope, aggregations of the above detailed information for:
- listing the active compaction tasks and threads for a given table, as an extension of compaction_state with a new detailed view
- listing the active split or merge tasks and threads for a given table's regions
Compaction detail should include the names of the effective engine and policy classes, and the results and timestamp of the last compaction selection evaluation. Split and merge detail should include the names of the effective policy classes and the result of the last split or merge criteria evaluation.
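To make the requested detail more concrete, here is a minimal sketch of what a per-task record and the master-scope aggregation might look like. Everything in it is an assumption made for illustration: the TaskDetailSketch, TaskDetail, and TaskType names, the field layout, and the sample values are hypothetical and do not correspond to any existing HBase API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only: none of these types exist in HBase today.
final class TaskDetailSketch {

  /** Kind of region maintenance task being reported. */
  enum TaskType { COMPACTION, SPLIT, MERGE }

  /** One active task, as a regionserver might report it (assumed shape). */
  static final class TaskDetail {
    final String server;          // reporting regionserver
    final String table;           // table the region belongs to
    final String region;          // region name
    final TaskType type;
    final String engineClass;     // effective engine class; null for split and merge
    final String policyClass;     // effective policy class
    final String lastEvaluation;  // result of the last selection or criteria evaluation
    final long lastEvaluationTs;  // timestamp of that evaluation, epoch millis

    TaskDetail(String server, String table, String region, TaskType type,
        String engineClass, String policyClass, String lastEvaluation,
        long lastEvaluationTs) {
      this.server = server;
      this.table = table;
      this.region = region;
      this.type = type;
      this.engineClass = engineClass;
      this.policyClass = policyClass;
      this.lastEvaluation = lastEvaluation;
      this.lastEvaluationTs = lastEvaluationTs;
    }
  }

  /** Master scope, detailed view: group tasks reported by all regionservers by table. */
  static Map<String, List<TaskDetail>> byTable(List<TaskDetail> reported) {
    Map<String, List<TaskDetail>> grouped = new TreeMap<>();
    for (TaskDetail t : reported) {
      grouped.computeIfAbsent(t.table, k -> new ArrayList<>()).add(t);
    }
    return grouped;
  }

  /** Simple view: number of active tasks of one type for one table. */
  static long count(List<TaskDetail> reported, String table, TaskType type) {
    return reported.stream()
        .filter(t -> t.table.equals(table) && t.type == type)
        .count();
  }

  /** Made-up sample data standing in for reports from two regionservers. */
  static List<TaskDetail> sample() {
    return Arrays.asList(
        new TaskDetail("rs1", "t1", "r1", TaskType.COMPACTION, "DefaultStoreEngine",
            "ExploringCompactionPolicy", "selected 4 of 9 files", 1000L),
        new TaskDetail("rs2", "t1", "r2", TaskType.SPLIT, null,
            "SteppingSplitPolicy", "split point found", 2000L),
        new TaskDetail("rs2", "t2", "r3", TaskType.COMPACTION, "DefaultStoreEngine",
            "ExploringCompactionPolicy", "nothing selected", 3000L));
  }

  public static void main(String[] args) {
    List<TaskDetail> reported = sample();
    for (Map.Entry<String, List<TaskDetail>> e : byTable(reported).entrySet()) {
      System.out.println(e.getKey() + ": " + e.getValue().size() + " active task(s), "
          + count(reported, e.getKey(), TaskType.COMPACTION) + " compaction(s)");
    }
  }
}
```

The count method corresponds to the simple view and byTable to the detailed view described above. A real implementation would more likely carry this information in protobuf messages over the existing admin RPCs, but that choice is outside the scope of this sketch.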