On the Storm UI, we need the ability to see the Topology Submission Time, the Topology Uptime, and the number of times a topology's worker processes have restarted since the last submission.
To see why, let's say we have a Supervisor with 8 GB of RAM.
We also have 4 Slots on this Supervisor.
We submit 4 topologies, each with 3 GB of worker memory, for a total of 12 GB allocated against 8 GB of physical RAM, on the assumption that not all topologies will use all of their memory at the same time.
Now we find that topologies are dying behind the scenes due to out-of-memory errors, and Storm Nimbus keeps restarting them.
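The overcommit in this scenario can be put into numbers. The sketch below is a hypothetical helper (not part of Storm) that computes the ratio of memory promised to workers versus physical RAM on one Supervisor, using the figures above and ignoring OS and Supervisor overhead.

```python
def overcommit_ratio(ram_gb: float, worker_memory_gb: float, workers: int) -> float:
    """Memory promised to workers divided by physical RAM on one supervisor."""
    return (worker_memory_gb * workers) / ram_gb


# Figures from the scenario: 8 GB RAM, 4 slots, one 3 GB worker per slot.
ratio = overcommit_ratio(ram_gb=8, worker_memory_gb=3, workers=4)

# If every worker used the same fraction of its allocation at once, that fraction
# could not exceed 1 / ratio before the node runs out of RAM (overhead ignored).
max_avg_utilization = 1 / ratio

print(f"{ratio:.2f}x overcommitted; safe only below {max_avg_utilization:.0%} average use")
```

With a 1.5x overcommit, the four workers together can use only about two thirds of their allocations before the node is out of memory, which is exactly when the silent restarts described here begin.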
The uptime requested in STORM-2816 (https://issues.apache.org/jira/browse/STORM-2816) addresses uptime, but it still won't show that there is a deeper issue and that topologies are restarting behind the scenes. Adding a restart counter would help flag such issues.
The counts should be available at two levels. Per topology, for example:
Submission Time: T1
Restarts: 4 (possibly with links to the logs showing why each restart happened)
And at the Storm UI (cluster) level, for example:
Total Topologies: 20
Total Topology Restarts since Submission: 12 (possibly with links to the topologies that restarted)
This way, monitoring and alerting systems can hook into these counts and raise alerts when things go wrong.
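A minimal sketch of how the proposed counts could be modelled and consumed follows. Everything in it is hypothetical: TopologyStats, ClusterStats and topologies_to_alert are illustrative names, not Storm APIs, and it assumes that uptime is measured from the last submission and that a resubmission resets the restart count. The last function shows the kind of check a monitoring system might run between two polls of the per-topology restart counts.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class TopologyStats:
    """Hypothetical per-topology record: submission time plus restart count."""
    name: str
    submission_time: datetime
    restarts: int = 0
    restart_log_links: list[str] = field(default_factory=list)

    def record_restart(self, log_link: Optional[str] = None) -> None:
        # Count one worker restart; optionally keep a link to the log explaining it.
        self.restarts += 1
        if log_link:
            self.restart_log_links.append(log_link)

    def uptime_seconds(self, now: datetime) -> float:
        # Assumption: uptime is measured from the last submission.
        return (now - self.submission_time).total_seconds()


@dataclass
class ClusterStats:
    """Hypothetical cluster-level view aggregated from per-topology records."""
    topologies: dict[str, TopologyStats] = field(default_factory=dict)

    def submit(self, name: str, when: datetime) -> TopologyStats:
        # A (re)submission starts a fresh record, resetting the restart count.
        stats = TopologyStats(name, when)
        self.topologies[name] = stats
        return stats

    def total_topologies(self) -> int:
        return len(self.topologies)

    def total_restarts(self) -> int:
        return sum(t.restarts for t in self.topologies.values())

    def restarted_topologies(self) -> list[str]:
        # Candidates for the "links to topologies that got restarted" in the UI.
        return sorted(n for n, t in self.topologies.items() if t.restarts > 0)


def topologies_to_alert(previous: dict[str, int], current: dict[str, int],
                        threshold: int = 1) -> list[str]:
    """Names whose restart count grew by at least `threshold` between two polls."""
    alerts = []
    for name, count in current.items():
        before = previous.get(name, 0)
        # A lower count than at the last poll suggests a resubmission reset the
        # counter, so the whole current value is treated as new restarts.
        delta = count - before if count >= before else count
        if delta >= threshold:
            alerts.append(name)
    return sorted(alerts)
```

For example, polling with previous = {"wordcount": 2} and current = {"wordcount": 4, "ingest": 1} flags both topologies with the default threshold of 1, while threshold=2 flags only "wordcount". Comparing snapshots rather than absolute totals lets an alert fire on new restarts without staying raised forever after a single incident.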