  1. Slider
  2. SLIDER-479

Provide a Slider command to kill all stranded containers that keep running after a stop command

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Slider 2.0.0
    • Component/s: None
    • Labels: None

    Description

      A container can continue to run even after a slider stop command has been issued. One such scenario is when the NodeManager (NM) on a node that does not host the Slider-AM is lost (for some intermittent reason) and the slider stop command is then issued. YARN is unable to clean up the stranded agent (and the application processes) on that node. Even if the NM is brought back up later, YARN does not kill these containers.

      In a large cluster with several applications deployed and managed by Slider, there could easily be many such stranded containers.

      The Slider client could expose a "stop-all" command, or perhaps a "stop --clean" option (or anything appropriate for this task), to do the cleanup. It could bring up the Slider-AM in, say, a clean mode that does not start any application but simply registers with ZooKeeper (ZK) and waits for the stranded agents to heartbeat into it. Each of these agents should then receive a terminate command from the AM, perform the necessary cleanup, and shut down.
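
The clean-mode behaviour proposed above could be sketched roughly as follows. This is only an illustration of the idea, not Slider code: `CleanModeAM`, `Command`, `on_heartbeat`, and `summary` are hypothetical names, and ZooKeeper registration and the real agent protocol are left out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Command(Enum):
    """Hypothetical commands an AM could return in a heartbeat response."""
    TERMINATE = "TERMINATE"


@dataclass
class CleanModeAM:
    """Hypothetical AM in clean mode: it starts no application components
    and answers every agent heartbeat with a terminate command."""
    terminate_sent: set = field(default_factory=set)

    def on_heartbeat(self, container_id: str) -> Command:
        # The application is already stopped, so any agent that still
        # heartbeats in is treated as stranded.
        self.terminate_sent.add(container_id)
        return Command.TERMINATE

    def summary(self) -> str:
        # A set is used so repeated heartbeats from one agent count once.
        return f"Sent terminate to {len(self.terminate_sent)} stranded container(s)"


am = CleanModeAM()
am.on_heartbeat("container_01")
am.on_heartbeat("container_02")
am.on_heartbeat("container_01")  # repeated heartbeat from the same agent
print(am.summary())
```

Tracking container IDs in a set keeps the final count stable even if an agent heartbeats several times before it shuts down.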

      This new command can be issued only after an application has been stopped. When invoked while the application is running, it should either be ignored or fail, reporting the reason to the user. The command can also report a summary of how many stranded containers it cleaned up.
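
The precondition and the summary described above might look something like this. Again a sketch under assumptions: `AppState`, `CleanupRejected`, and `stop_clean` are hypothetical names, and the list of stranded container IDs is assumed to come from whatever mechanism collects them.

```python
from enum import Enum


class AppState(Enum):
    """Hypothetical application states, reduced to the two that matter here."""
    RUNNING = "RUNNING"
    STOPPED = "STOPPED"


class CleanupRejected(Exception):
    """Raised when cleanup is requested for an application that is not stopped."""


def stop_clean(app_name, state, stranded_container_ids):
    """Refuse to clean a running application; otherwise report what was cleaned."""
    if state is not AppState.STOPPED:
        # Fail with an informative message instead of touching live containers.
        raise CleanupRejected(
            f"Application '{app_name}' is {state.value}; stop it before cleaning up"
        )
    cleaned = sorted(set(stranded_container_ids))  # de-duplicate the IDs
    return f"{app_name}: cleaned up {len(cleaned)} stranded container(s)"


print(stop_clean("hbase1", AppState.STOPPED, ["container_03", "container_07"]))
```

Raising an error is one of the two options the description allows; silently ignoring the request would be the other, at the cost of less feedback to the operator.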

            People

              Assignee: Unassigned
              Reporter: Gour Saha (gsaha)
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated: