Apache Storm / STORM-3099

Extend metrics on supervisor, workers, and DRPC

    Details

      Description

      This patch extends the metrics reported by the supervisor, workers, and DRPC. The following metrics are currently being implemented; the list is not exhaustive:

      Worker:

      1. Kill count by category: assignment change, heartbeat (HB) too old, heap space
      2. Time spent in each state
      3. Time to actually kill a worker (from the supervisor identifying the need to the actual change in the worker's state); possibly per worker?
      4. Time to start a worker for a topology, measured from reading the assignment for the first time
      5. Worker cleanup time and worker cleanup retries
      6. Worker suicide count, by category: internal error or assignment change
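
As an illustration of the first two worker metrics above, here is a minimal sketch of a kill counter keyed by category plus a per-state time accumulator. It is not the patch's implementation: Storm itself is written in Java, and Python is used here only for brevity. The class `WorkerMetrics`, the enum `KillReason`, and the state names in the usage note are hypothetical; only the three kill categories come from the list.

```python
import time
from collections import Counter
from enum import Enum


class KillReason(Enum):
    # Categories come from the list above; the enum itself is hypothetical.
    ASSIGNMENT_CHANGE = "assignment-change"
    HB_TOO_OLD = "hb-too-old"
    HEAP_SPACE = "heap-space"


class WorkerMetrics:
    """Hypothetical holder for kill counts and time spent in each state."""

    def __init__(self, initial_state, clock=time.monotonic):
        self._clock = clock              # injectable so tests can fake time
        self._state = initial_state
        self._entered_at = clock()
        self.kill_counts = Counter()     # KillReason -> number of kills
        self.seconds_in_state = Counter()  # state name -> accumulated seconds

    def record_kill(self, reason):
        self.kill_counts[reason] += 1

    def transition(self, new_state):
        # Charge the elapsed time to the state being left, then switch.
        now = self._clock()
        self.seconds_in_state[self._state] += now - self._entered_at
        self._state = new_state
        self._entered_at = now
```

A caller would invoke `transition("RUNNING")` whenever the worker's state changes and `record_kill(KillReason.HEAP_SPACE)` when a kill is decided. A monotonic clock is assumed so that wall-clock adjustments do not distort the durations.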

      Supervisor:

      1. Supervisor restart count
      2. Blobstore (time from request to download):
         • Download time per individual blob (inside localizer): from the localizer getting the request to the actual HDFS download request finishing
         • Download rate per individual blob (inside localizer)
         • Supervisor localizer thread blob download: how long it takes (outside localizer)
      3. Blobstore update counts due to version change
      4. Blobstore storage by users
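
The per-blob download time and download rate above are related by rate = bytes / elapsed time. The sketch below shows one way to capture both around a single download call. It is an assumption-laden illustration, not Storm code: `BlobDownloadSample`, `timed_download`, and the `download` callable are hypothetical names, and Python is used only for brevity.

```python
import time
from dataclasses import dataclass


@dataclass
class BlobDownloadSample:
    """Hypothetical record of one blob download."""
    blob_key: str
    size_bytes: int
    seconds: float

    @property
    def bytes_per_second(self):
        # Report 0.0 rather than dividing by zero on a zero-length interval.
        return self.size_bytes / self.seconds if self.seconds > 0 else 0.0


def timed_download(blob_key, download, clock=time.monotonic):
    """Call download(blob_key), which must return bytes, and time it.

    Returns (data, BlobDownloadSample).
    """
    start = clock()
    data = download(blob_key)
    elapsed = clock() - start
    return data, BlobDownloadSample(blob_key, len(data), elapsed)
```

Where the timer is started determines which of the listed metrics is being measured: starting it inside the localizer captures only the download itself, while starting it in the calling supervisor thread would also include any time spent waiting outside the localizer.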

      DRPC:

      1. Average and maximum time to respond to an HTTP request
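
Average and maximum response time can be maintained incrementally without storing every sample. The following is a minimal sketch under that assumption; `ResponseTimeStats` is a hypothetical name, not part of the DRPC server, and Python is used only for brevity.

```python
class ResponseTimeStats:
    """Hypothetical running aggregate of HTTP response times in milliseconds."""

    def __init__(self):
        self.count = 0
        self.total_ms = 0.0
        self.max_ms = 0.0

    def record(self, elapsed_ms):
        # Update count, sum, and max in constant time per request.
        self.count += 1
        self.total_ms += elapsed_ms
        self.max_ms = max(self.max_ms, elapsed_ms)

    @property
    def avg_ms(self):
        # Report 0.0 until at least one request has been recorded.
        return self.total_ms / self.count if self.count else 0.0
```

This sketch is not thread-safe; a real HTTP server handling concurrent requests would need locking or atomic counters around `record`.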

      More metrics may be added later.

      This patch will also refactor code in relevant files. Bugs found during the process will be reported in other issues and handled separately.

              People

              • Assignee: Zhengdai Hu (zhengdai)
              • Reporter: Zhengdai Hu (zhengdai)
              • Votes: 0
              • Watchers: 2

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 5h 10m