Slider / SLIDER-109

Detect and report application liveness

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Slider 0.40
    • Fix Version/s: Slider 2.0.0
    • Component/s: agent-provider, appmaster
    • Labels: None
    • Sprint: slider June 1

      Description

      The application state reported by YARN can differ from the state of the application as perceived by its admins and clients. For example:

      • When Yarn app state says RUNNING, the application deployed by Slider may actually be in the process of starting, and not yet ready for clients.
      • When Yarn app state says RUNNING, the application may in fact be unhealthy, for example because component instances have gone down and are waiting to come back up.

      An application should be allowed to define its own state (for its admin and clients), distinct from the application state reported by Yarn.
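
To make the distinction concrete, here is a minimal sketch of an application-level state layered on top of the YARN state. The `AppState` values, the `AppStateResolver` class, and its three inputs are hypothetical names chosen for illustration; none of them is part of Slider or YARN.

```java
/** Hypothetical application-level states, finer-grained than YARN's RUNNING. */
enum AppState { STARTING, READY, UNHEALTHY, NOT_RUNNING }

final class AppStateResolver {
    private AppStateResolver() {}

    /**
     * Derives an application-level state.
     *
     * @param yarnRunning true when YARN reports the application as RUNNING
     * @param healthy     true when the application's own health check passes (assumed input)
     * @param wasReady    true once the application has been READY at least once (assumed input)
     */
    static AppState resolve(boolean yarnRunning, boolean healthy, boolean wasReady) {
        if (!yarnRunning) {
            return AppState.NOT_RUNNING;
        }
        if (healthy) {
            return AppState.READY;
        }
        // YARN says RUNNING but the app is not healthy:
        // either it is still coming up, or it has degraded after being ready.
        return wasReady ? AppState.UNHEALTHY : AppState.STARTING;
    }
}
```

With this sketch, both cases from the list above map to YARN RUNNING but yield `STARTING` or `UNHEALTHY` instead of `READY`.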

          Activity

          jmaron Jonathan Maron added a comment -

          What do you see as the mechanism for relaying this state? API, REST, JMX?

          sumitmohanty Sumit Mohanty added a comment -

          Slider should also provide a state that distinguishes a frozen application from a destroyed application. Both are "FAILED" from YARN's perspective.

          sumitmohanty Sumit Mohanty added a comment -

          Two different requirements surfaced from discussions:

          • How to know the application is ready: "ready" may mean that the application is ready to accept requests. This could be determined by the app having registered a URL or by a specific component having come up successfully.
          • Using the status command, is it possible to distinguish between stopped and destroyed applications?
          stevel@apache.org Steve Loughran added a comment -

          The Codahale metrics library includes health checks: small, fast function calls that return the health of a system. App health can be provided this way.

          We also need to consider the minimum number of instances of a component needed for an app to consider itself live.
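
As a rough illustration of the minimum-instances idea, the sketch below treats the app as live only when every component has at least its configured minimum number of live instances. It uses only the Java standard library rather than the Codahale API; `MinInstancesCheck`, its constructor argument, and the component names in the usage note are hypothetical.

```java
import java.util.Map;

/** Hypothetical liveness check based on per-component minimum instance counts. */
final class MinInstancesCheck {
    // Minimum number of live instances required per component name (assumed configuration).
    private final Map<String, Integer> minimums;

    MinInstancesCheck(Map<String, Integer> minimums) {
        this.minimums = Map.copyOf(minimums);
    }

    /**
     * @param liveCounts number of live instances currently observed per component name
     * @return true only if every component meets its minimum; a component that is
     *         missing from liveCounts is treated as having zero live instances
     */
    boolean isLive(Map<String, Integer> liveCounts) {
        for (Map.Entry<String, Integer> entry : minimums.entrySet()) {
            int live = liveCounts.getOrDefault(entry.getKey(), 0);
            if (live < entry.getValue()) {
                return false;
            }
        }
        return true;
    }
}
```

For example, with minimums of one "master" and two "worker" instances, a report of one master and one worker would not count as live. A check like this could be wrapped in a health check so that it is small and fast to call.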

          stevel@apache.org Steve Loughran added a comment -

          This can be done if we can determine ways to check the health of deployed applications.

          1. A simple URI pattern will suffice for many components.
          2. An even simpler "check that the port is open" probe works for IPC, etc.

          The package org.apache.slider.server.servicemonitor contains the service monitors from the Hadoop 1.0 HA daemons. These can monitor service health through a bootstrap process (no port or URL probe may fail once the service has started) and run in a separate thread, so that hung services are detected through timeouts.

          The problem then becomes one of "how to set this up".
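
A minimal sketch of the two ideas above, using only the Java standard library: a "port is open" probe with a connect timeout, and a tracker that tolerates probe failures during bootstrap but treats any failure after the first success as fatal. This is not the API of org.apache.slider.server.servicemonitor; `PortProbe`, `BootstrapTracker`, and the reading of the bootstrap rule are assumptions made for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical probe: succeeds if a TCP connection can be opened within the timeout. */
final class PortProbe {
    private PortProbe() {}

    static boolean isOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers connection refused as well as connect timeouts.
            return false;
        }
    }
}

/** Hypothetical tracker for the bootstrap rule: failures are tolerated only before first success. */
final class BootstrapTracker {
    enum Status { BOOTSTRAPPING, LIVE, FAILED }

    private final long bootstrapDeadlineMillis;
    private boolean started;

    BootstrapTracker(long startMillis, long bootstrapTimeoutMillis) {
        this.bootstrapDeadlineMillis = startMillis + bootstrapTimeoutMillis;
    }

    /** Records one probe result observed at nowMillis and returns the resulting status. */
    Status report(boolean probeOk, long nowMillis) {
        if (probeOk) {
            started = true;
            return Status.LIVE;
        }
        if (started) {
            return Status.FAILED; // a probe failed after the service had started
        }
        return nowMillis > bootstrapDeadlineMillis ? Status.FAILED : Status.BOOTSTRAPPING;
    }
}
```

A monitor thread could call `PortProbe.isOpen` periodically and feed each result to `BootstrapTracker.report`; the connect timeout keeps a hung service from blocking the monitor indefinitely.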


            People

            • Assignee: Sumit Mohanty (sumitmohanty)
            • Reporter: Sumit Mohanty (sumitmohanty)