Slider / SLIDER-438

Slider agent continues to run in the container on a node where NM dies

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Slider 0.60
    • Component/s: agent, agent-provider
    • Labels: None
    • Sprint: Slider October #2

    Description

      Steps to reproduce:

      • Set up a 3-node cluster (in non-HA mode)
      • Run slider create for the HBase app-package (with HMaster and HRegionServer components only, to keep things simple)
      • Assume that the HRegionServer came up on a different node from the HMaster and the Slider AM (if not, repeating destroy and create a couple of times will get you to this setup)
      • Kill the NM on the node where the HRegionServer is running
      • Restart the NM within 10 minutes (the default interval after which the RM marks the node as LOST, configurable using yarn.nm.liveness-monitor.expiry-interval-ms)
      • At this point the Slider AM received the container-lost event from the RM, marked the container as lost, and requested a new one from the RM. A new HRegionServer container came up (on the same host where the old one was running). From then on, both HRegionServer containers kept running alongside each other, and both kept heartbeating successfully to the AM.

      Expected:

      • Given that the first HRegionServer instance was still heartbeating to the AM, the AM should be able to send it a kill signal and bring the agent/container down.
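
The expected behaviour can be illustrated with a minimal sketch. It assumes the AM keeps a set of container IDs it still considers live and answers every agent heartbeat with a command. The class, enum, and method names below are hypothetical and are not taken from the Slider code base; the sketch only shows the idea that a heartbeat from a container already marked lost gets a terminate response.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; these names are not Slider APIs.
final class HeartbeatTracker {
    // Command returned to the agent in the heartbeat response.
    enum Command { CONTINUE, TERMINATE }

    // Containers the AM currently considers live.
    private final Set<String> live = new HashSet<>();

    // Called when the AM starts tracking a newly allocated container.
    void register(String containerId) {
        live.add(containerId);
    }

    // Called when the RM reports the container as lost.
    void markLost(String containerId) {
        live.remove(containerId);
    }

    // Called on every agent heartbeat. An agent whose container is no
    // longer in the live set is told to shut itself down.
    Command onHeartbeat(String containerId) {
        return live.contains(containerId) ? Command.CONTINUE : Command.TERMINATE;
    }
}
```

In the scenario above, the old HRegionServer agent would receive TERMINATE on its next heartbeat after the container-lost event, while the replacement container would keep receiving CONTINUE.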

            People

              Assignee: Gour Saha (gsaha)
              Reporter: Gour Saha (gsaha)