Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
Description
I needed to analyze snapshots to determine which application caused a massive snapshot size increase by recursively checking child node count and data size for nodes, but could not find a tool for the job. Loading the snapshots one by one and using a ZooKeeper client proved too slow and SnapshotFormatter was very fast but processing the output to get the relevant data for my usecase proved more work than writing a tool that has the output I need. So I wrote SnapshotSumFormatter based on SnapshotFormatter:
USAGE: SnapshotSumFormatter snapshot_file starting_node max_depth
The tool recursively travels the child nodes under "starting_node" and collects both node count and summarizes the data stored in every node under the current one. This helps to identify problematic jobs/applications that either store too much data or does not properly clean up. "max_depth" defines the depth where the tool still writes to the output. 0 means there is no depth limit, every node's stats will be displayed, 1 means it will only contain the starting node's and it's children's stats, 2 ads another level and so on. This ONLY affects the level of details displayed, NOT the calculation.
An example output looks like this (with "SnapshotSumFormatter <snapshot_file> / 2"):
/ children: 1250511 data: 1952186580 -- /zookeeper -- children: 1 -- data: 0 -- /solr -- children: 1773 -- data: 8419162 ---- /solr/configs ---- children: 1640 ---- data: 8407643 ---- /solr/overseer ---- children: 6 ---- data: 0 ---- /solr/live_nodes ---- children: 3 ---- data: 0
I think this might prove useful for others too and would like to share it.
Attachments
Issue Links
- links to