Myriad / MYRIAD-133

Multiple flexed-up NMs try to run on the same node at the same time.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Myriad 0.1.0
    • Component/s: Scheduler
    • Labels: None

    Description

      On a 3-node cluster running the latest build with the NM + Executor merge, I am
      seeing an issue when flexing up multiple NM instances: several NMs try to start
      on the same node at the same time.

      Here are the tasks already running under Myriad (before the multiple-NM
      flex up):

      [root@qa101-137 ~]# curl -s http://testrm.marathon.mesos:8192/api/state

      {"pendingTasks":[], "stagingTasks":[], "activeTasks":[ "nm.medium.a8a36268-e365-4fd2-a87c-4c02ac2aeb89", "nm.small.30e0ce9c-f9da-49de-b927-ab8a58be6d52"], "killableTasks":[]}
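For triage, the state payload above can be summarized per NM profile. This is a minimal sketch, not part of Myriad: the helper name `count_by_profile` is hypothetical, and it assumes task ids always follow the `nm.<profile>.<uuid>` shape seen in this report.

```python
import json
from collections import Counter


def count_by_profile(state):
    """Return {state_list_name: Counter(profile -> count)} for a /api/state payload.

    Assumes task ids look like "nm.<profile>.<uuid>", as in this report.
    """
    summary = {}
    for list_name, task_ids in state.items():
        summary[list_name] = Counter(task_id.split(".")[1] for task_id in task_ids)
    return summary


# Payload copied from the curl output in this report.
raw = (
    '{"pendingTasks":[], "stagingTasks":[], "activeTasks":[ '
    '"nm.medium.a8a36268-e365-4fd2-a87c-4c02ac2aeb89", '
    '"nm.small.30e0ce9c-f9da-49de-b927-ab8a58be6d52"], "killableTasks":[]}'
)
summary = count_by_profile(json.loads(raw))
for list_name, counts in sorted(summary.items()):
    print(list_name, dict(counts))
```

Before the flex up this shows one medium and one small NM active and nothing pending, staging, or killable.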

      Then I tried flexing up 4 instances of the zero-profile NM. Note that only 1
      node is without an NM; the other 2 nodes are already running NMs (see above).

      Here is the task status from Myriad just after the flex up, and again once all
      NMs were in the active state.

      [root@qa101-137 ~]# curl -H "Content-Type: application/json" -X PUT -d '{"instances":4, "profile":"zero"}' http://testrm.marathon.mesos:8192/api/cluster/flexup
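The same flex-up call can be scripted when reproducing the issue. The sketch below only builds the request with the standard library and does not send it; `build_flexup_request` is a hypothetical helper, while the endpoint, method, header, and body are taken from the curl command above.

```python
import json
import urllib.request


def build_flexup_request(base_url, instances, profile):
    """Build (but do not send) a PUT request for Myriad's flexup endpoint."""
    body = json.dumps({"instances": instances, "profile": profile}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/cluster/flexup",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_flexup_request("http://testrm.marathon.mesos:8192", 4, "zero")
print(req.get_method(), req.full_url)
# To actually send it against a live cluster: urllib.request.urlopen(req)
```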

      [root@qa101-137 ~]# curl -s http://testrm.marathon.mesos:8192/api/state | python -mjson.tool
      {
          "activeTasks": [
              "nm.medium.a8a36268-e365-4fd2-a87c-4c02ac2aeb89",
              "nm.small.30e0ce9c-f9da-49de-b927-ab8a58be6d52"
          ],
          "killableTasks": [],
          "pendingTasks": [
              "nm.zero.cd35db39-30f0-4da5-aa07-67c22cfe40ee",
              "nm.zero.ad7d597c-27f8-4e2c-8108-ae675990fdd9",
              "nm.zero.5110931a-279e-4f95-b4e6-5d1167d45993"
          ],
          "stagingTasks": [
              "nm.zero.a5e73358-351f-4938-ba3d-9dc759b514e0"
          ]
      }

      [root@qa101-137 ~]# curl -s http://testrm.marathon.mesos:8192/api/state | python -mjson.tool
      {
          "activeTasks": [
              "nm.zero.a5e73358-351f-4938-ba3d-9dc759b514e0",
              "nm.medium.a8a36268-e365-4fd2-a87c-4c02ac2aeb89",
              "nm.small.30e0ce9c-f9da-49de-b927-ab8a58be6d52",
              "nm.zero.cd35db39-30f0-4da5-aa07-67c22cfe40ee",
              "nm.zero.ad7d597c-27f8-4e2c-8108-ae675990fdd9",
              "nm.zero.5110931a-279e-4f95-b4e6-5d1167d45993"
          ],
          "killableTasks": [],
          "pendingTasks": [],
          "stagingTasks": []
      }

      On Mesos, all 4 NMs try to start on a single node. They were all in the RUNNING
      state at some point, and then moved to the LOST state once the NMs had settled
      down. Myriad also moved the unsuccessful tasks from active back to pending
      later on.

      [root@qa101-137 ~]# curl -s http://testrm.marathon.mesos:8192/api/state | python -mjson.tool
      {
          "activeTasks": [
              "nm.zero.a5e73358-351f-4938-ba3d-9dc759b514e0",
              "nm.medium.a8a36268-e365-4fd2-a87c-4c02ac2aeb89",
              "nm.small.30e0ce9c-f9da-49de-b927-ab8a58be6d52"
          ],
          "killableTasks": [],
          "pendingTasks": [
              "nm.zero.cd35db39-30f0-4da5-aa07-67c22cfe40ee",
              "nm.zero.ad7d597c-27f8-4e2c-8108-ae675990fdd9",
              "nm.zero.5110931a-279e-4f95-b4e6-5d1167d45993"
          ],
          "stagingTasks": []
      }
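To make the regression visible at a glance, two state snapshots can be diffed to list the tasks that changed lists. This is an illustrative sketch only: `moved_tasks` is a hypothetical helper, and the snapshots are copied from the "all active" and final outputs in this report.

```python
def moved_tasks(before, after):
    """Map task id -> (old_list, new_list) for tasks that changed state lists."""
    def index(snapshot):
        return {task: name for name, tasks in snapshot.items() for task in tasks}

    old, new = index(before), index(after)
    return {t: (old[t], new[t]) for t in old if t in new and old[t] != new[t]}


ZERO = [
    "nm.zero.a5e73358-351f-4938-ba3d-9dc759b514e0",
    "nm.zero.cd35db39-30f0-4da5-aa07-67c22cfe40ee",
    "nm.zero.ad7d597c-27f8-4e2c-8108-ae675990fdd9",
    "nm.zero.5110931a-279e-4f95-b4e6-5d1167d45993",
]
EXISTING = [
    "nm.medium.a8a36268-e365-4fd2-a87c-4c02ac2aeb89",
    "nm.small.30e0ce9c-f9da-49de-b927-ab8a58be6d52",
]

# Snapshot taken when every NM was reported active.
all_active = {"activeTasks": EXISTING + ZERO, "killableTasks": [],
              "pendingTasks": [], "stagingTasks": []}
# Final snapshot, after the unsuccessful NMs went back to pending.
final = {"activeTasks": EXISTING + ZERO[:1], "killableTasks": [],
         "pendingTasks": ZERO[1:], "stagingTasks": []}

moved = moved_tasks(all_active, final)
for task, (src, dst) in sorted(moved.items()):
    print(f"{task}: {src} -> {dst}")
```

With the data from this report, only the three zero-profile NMs that did not survive on the shared node are listed, each moving from activeTasks to pendingTasks.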
      Let me know if you need any additional details regarding the issue.

          People

            sdaingade Swapnil Daingade
            sarjeet Sarjeet Singh