  1. Apache Storm
  2. STORM-2817

Topology Restart Counts are not maintained in Storm UI

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.0.2
    • Fix Version/s: None
    • Component/s: storm-ui
    • Labels:
    • Environment: CentOS7, Docker

      Description

      The Storm UI should show, for each topology, its submission time, its uptime, and the number of times its worker processes have restarted since the last submission.

      The reason is as follows. Suppose we have a Supervisor with 8 GB of RAM.
      This Supervisor also has 4 slots.
      We submit 4 topologies, each with a worker memory of 3 GB. That requests 12 GB in total against 8 GB available, on the assumption that not all topologies will use their full memory at the same time.
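
      The arithmetic in this scenario can be sketched as follows. This is only an illustration of the numbers above; the function name `memory_commitment` is made up for the example and is not part of Storm.

```python
def memory_commitment(supervisor_ram_gb, worker_memory_gb):
    """Return (requested_gb, ratio) for the workers placed on one Supervisor.

    A ratio above 1.0 means the workers may together ask for more memory
    than the machine has.
    """
    requested_gb = sum(worker_memory_gb)
    return requested_gb, requested_gb / supervisor_ram_gb


# Numbers from the description: 8 GB Supervisor, 4 workers of 3 GB each.
requested, ratio = memory_commitment(8, [3, 3, 3, 3])
print(f"requested={requested} GB, ratio={ratio:.2f}")  # requested=12 GB, ratio=1.50
```

      With a ratio of 1.5, the setup only works while the workers stay well below their configured maximum, which is why out-of-memory restarts become likely.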

      We then find that topologies are dying behind the scenes because they run out of memory, and Storm Nimbus keeps restarting them.

      The uptime request in STORM-2816 (https://issues.apache.org/jira/browse/STORM-2816) can address uptime, but uptime alone still won't reveal that there is a deeper issue and that topologies are restarting behind the scenes. Adding a restart counter would help flag such issues.

      The counts should be shown at two levels. The first is per topology, for example:

      Topology 1
      Submission Time T1
      Uptime T2
      Restarts 4 (Possible log links to why restarted)

      The second is at the overall Storm UI level:

      Total Topologies : 20
      Total Topology Restarts since Submission : 12 (Possible links to topologies that got restarted)
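
      One possible shape for the data behind these two views is sketched below. Everything here is hypothetical: `TopologyStatus`, its field names, and `cluster_summary` are invented for illustration and do not correspond to any existing Storm class or REST field. The request does not say whether the UI-level total counts restarts or restarted topologies, so the sketch returns both.

```python
from dataclasses import dataclass


@dataclass
class TopologyStatus:
    # Hypothetical per-topology record, mirroring the fields requested above.
    name: str
    submitted_at: str   # submission time, kept as an opaque string here
    uptime_secs: int    # time since submission
    restarts: int       # worker restarts since the last submission


def cluster_summary(topologies):
    """Roll per-topology restart counts up into a UI-level summary."""
    restarted = [t.name for t in topologies if t.restarts > 0]
    return {
        "total_topologies": len(topologies),
        "total_restarts": sum(t.restarts for t in topologies),
        "restarted_topologies": restarted,  # candidates for the "links" in the UI
    }


example = [
    TopologyStatus("topology-1", "T1", 3600, 4),
    TopologyStatus("topology-2", "T1", 3600, 0),
]
print(cluster_summary(example))
```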

      This way monitoring and alerting systems can hook into these counts and alert when things go wrong.
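
      As a rough sketch of how a monitoring system might consume such counts, the function below compares two polls of per-topology restart counters and reports what changed. It assumes, as this request proposes, that a counter resets to zero when a topology is resubmitted; the function name and the dictionary input format are hypothetical, and how the counts are fetched from Storm is deliberately left out.

```python
def new_restarts(previous, current):
    """Return {topology_name: restarts_since_last_poll} for topologies that restarted.

    previous and current map topology name to its restart count at two
    successive polls. A count lower than before is treated as a counter
    reset (topology resubmitted), so the current value is taken as the delta.
    """
    deltas = {}
    for name, count in current.items():
        before = previous.get(name, 0)
        delta = count - before if count >= before else count
        if delta > 0:
            deltas[name] = delta
    return deltas


previous_poll = {"topology-1": 4, "topology-2": 0}
current_poll = {"topology-1": 6, "topology-2": 0}
print(new_restarts(previous_poll, current_poll))  # {'topology-1': 2}
```

      An alerting rule could then fire whenever the returned dictionary is non-empty, or only when a delta exceeds some threshold.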

            People

            • Assignee: Unassigned
            • Reporter: Anton Alfred (antonpious@gmail.com)