  1. Solr
  2. SOLR-14305

Improve tooling for OPS and Support personnel



    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: scripts and tools
    • Labels: None


      Umbrella issue for tasks that will improve the lives of Operations and Support personnel running large Solr clusters. The following description is copied from a comment by Shalin on another issue:

      There's plenty of information that is required for troubleshooting but is not available in clusterstatus or any other documented/public API. Sure there's the undocumented /admin/zookeeper which has a weird output format meant for I don't know who. But even that does not have a few things that I've found necessary to troubleshoot Solr.

      Here's a non-exhaustive list of things you need to troubleshoot Solr:

      1. Length of overseer queues (available in overseerstatus API)
      2. Contents of overseer queue (mildly useful, available in /admin/zookeeper)
      3. Overseer election queue and current leader (former is available in /admin/zookeeper and latter in overseer status)
      4. Cluster state (cluster status API)
      5. Solr.xml (no API regardless of whether it is in ZK or filesystem)
      6. Leader election queue and current leader for each shard (available in /admin/zookeeper)
      7. Shard terms for each shard/replica (not available in any API)
      8. Metrics/stats (metrics API)
      9. Solr Logs (log API? unless it is rolled over)
      10. GC logs (no API)

      Please link related tasks or create new sub tasks as necessary.

      Fixing SOLR-7796 would probably help a lot in the short term, since there would be a well-defined way to zip up info and send it to support. But it wouldn't hurt to add better APIs, small tools, and new Admin UI panels for simplified live troubleshooting as well.
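
To make the "zip up info and send to support" idea concrete, here is a minimal Python sketch of a collector that pulls the items from the list above that already have an HTTP endpoint and writes them into a single zip archive. It is only an illustration under stated assumptions, not what SOLR-7796 proposes: the function names (`build_urls`, `collect_bundle`), the ZooKeeper paths passed to `/admin/zookeeper`, and the use of `/admin/info/logging` for recent log events are assumptions and should be checked against the Solr version in use. Items the description lists as having no API (solr.xml, shard terms, GC logs) are only recorded in a manifest.

```python
import io
import json
import zipfile
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoints for the list items that already have an HTTP API.
# The ZooKeeper paths and the logging endpoint are assumptions.
ENDPOINTS = {
    "overseerstatus": ("/admin/collections", {"action": "OVERSEERSTATUS"}),
    "clusterstatus": ("/admin/collections", {"action": "CLUSTERSTATUS"}),
    "overseer_queue": ("/admin/zookeeper",
                       {"path": "/overseer/queue", "detail": "true"}),
    "overseer_election": ("/admin/zookeeper",
                          {"path": "/overseer_elect/election", "detail": "true"}),
    "metrics": ("/admin/metrics", {}),
    "logging": ("/admin/info/logging", {"since": "0"}),
}

# Items the description says have no API; they must be gathered by other means.
NO_API = ["solr.xml", "shard terms", "GC logs"]


def build_urls(base_url):
    """Return {name: url} for every endpoint, asking for JSON output."""
    base = base_url.rstrip("/")
    urls = {}
    for name, (path, params) in ENDPOINTS.items():
        query = urlencode({**params, "wt": "json"})
        urls[name] = f"{base}{path}?{query}"
    return urls


def http_get(url, timeout=10):
    """Fetch a URL and return the raw response body as bytes."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def collect_bundle(base_url, fetch=http_get):
    """Fetch each endpoint and return the bytes of a zip archive.

    A failed fetch does not abort the run; the error is recorded in
    MANIFEST.json so support can see what is missing and why.
    """
    buf = io.BytesIO()
    errors = {}
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, url in build_urls(base_url).items():
            try:
                zf.writestr(f"{name}.json", fetch(url))
            except Exception as exc:  # keep going on any per-endpoint failure
                errors[name] = str(exc)
        manifest = {"errors": errors, "not_collected": NO_API}
        zf.writestr("MANIFEST.json", json.dumps(manifest, indent=2))
    return buf.getvalue()
```

A possible invocation against a local node would be `open("solr-support.zip", "wb").write(collect_bundle("http://localhost:8983/solr"))`. The `fetch` parameter exists so that authentication or a different HTTP client can be plugged in without changing the collector.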






              Assignee: Unassigned
              Reporter: Jan Høydahl (janhoy)