Affects Version/s: None
Fix Version/s: None
Component/s: scripts and tools
Umbrella issue for tasks that will improve the lives of Operations and Support personnel running large Solr clusters. The following description is copied from a comment by Shalin on another issue:
There's plenty of information that is required for troubleshooting but is not available in clusterstatus or any other documented/public API. Sure, there's the undocumented /admin/zookeeper, which has a weird output format meant for I don't know who. But even that is missing a few things that I've found necessary to troubleshoot Solr.
Here's a non-exhaustive list of things you need to troubleshoot Solr:
- Length of overseer queues (available in overseerstatus API)
- Contents of overseer queue (mildly useful, available in /admin/zookeeper)
- Overseer election queue and current leader (the former is available in /admin/zookeeper, the latter in the overseerstatus API)
- Cluster state (clusterstatus API)
- solr.xml (no API, regardless of whether it is in ZK or on the filesystem)
- Leader election queue and current leader for each shard (available in /admin/zookeeper)
- Shard terms for each shard/replica (not available in any API)
- Metrics/stats (metrics API)
- Solr logs (log API? unless the log has rolled over)
- GC logs (no API)
Please link related tasks or create new sub tasks as necessary.
Fixing SOLR-7796 would probably help a lot in the short term, since there would be a well-defined way to zip up info and send it to support. But it wouldn't hurt to add better APIs, small tools and new AdminUI panels for simplified live troubleshooting as well.
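As an illustration of the "zip up info" idea only, and not a description of what SOLR-7796 proposes or implements, the sketch below packs already-collected diagnostics into a single in-memory zip archive using the Python standard library. The function name `build_support_bundle` and the entry names are hypothetical.

```python
import io
import json
import zipfile


def build_support_bundle(entries):
    """Pack diagnostics into a zip archive and return its bytes.

    `entries` maps an archive file name to its payload. Dicts and lists
    are serialized as JSON; strings and bytes are stored as given.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Sort names so the archive layout is stable between runs.
        for name in sorted(entries):
            payload = entries[name]
            if isinstance(payload, (dict, list)):
                payload = json.dumps(payload, indent=2, sort_keys=True)
            archive.writestr(name, payload)
    return buffer.getvalue()
```

For example, `build_support_bundle({"clusterstatus.json": status, "solr.xml": xml_text})` returns bytes that can be written to a file and attached to a support ticket.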