Apache YuniKorn / YUNIKORN-2141

Should not preempt placeholders which have been released


Details

    Description

      The details of the bug:

      • The real pod is created and waiting for scheduling after the placeholders are bound:
      {"stream":"stdout","log":"2023-11-08T15:16:14.912Z\tINFO\tcache/task_state.go:380\tTask state transition\t{\"app\": \"spark-28105bdfe17b494887c0c443f8a3ab0f\", \"task\": \"8837de6e-d888-4549-9baf-254c8a807421\", \"taskAlias\": \"dex-app-q5nslqd5/ogautaealleventsdynamicu2klogfm2-50-eb8bde8baf814091-driver\", \"source\": \"New\", \"destination\": \"Pending\", \"event\": \"InitTask\"}"}
      {"stream":"stdout","log":"2023-11-08T15:16:14.912Z\tINFO\tcache/task_state.go:380\tTask state transition\t{\"app\": \"spark-28105bdfe17b494887c0c443f8a3ab0f\", \"task\": \"8837de6e-d888-4549-9baf-254c8a807421\", \"taskAlias\": \"dex-app-q5nslqd5/ogautaealleventsdynamicu2klogfm2-50-eb8bde8baf814091-driver\", \"source\": \"Pending\", \"destination\": \"Scheduling\", \"event\": \"SubmitTask\"}"}
      • The scheduler processes the placeholder replacement and sends a release allocation request to the shim side:
      {"stream":"stdout","log":"2023-11-08T15:16:14.912Z\tINFO\tscheduler/partition.go:828\tscheduler replace placeholder processed\t{\"appID\": \"spark-28105bdfe17b494887c0c443f8a3ab0f\", \"allocationKey\": \"8837de6e-d888-4549-9baf-254c8a807421\", \"uuid\": \"9508439d-60a2-404e-9c84-bd2c6783b5c7\", \"placeholder released uuid\": \"cc243ba1-7054-4b07-8344-6afb1424b1e0\"}"}
      {"stream":"stdout","log":"2023-11-08T15:16:14.913Z\tINFO\tcache/application.go:637\ttry to release pod from application\t{\"appID\": \"spark-28105bdfe17b494887c0c443f8a3ab0f\", \"allocationUUID\": \"cc243ba1-7054-4b07-8344-6afb1424b1e0\", \"terminationType\": \"PLACEHOLDER_REPLACED\"}"}
      • At the same time, the preemption code tries to preempt the allocation whose release has already been sent:
      {"stream":"stdout","log":"2023-11-08T15:16:20.870Z\tINFO\tobjects/preemption.go:563\tPreempting task\t{\"applicationID\": \"spark-28105bdfe17b494887c0c443f8a3ab0f\", \"allocationKey\": \"e6e91651-7152-42f5-8504-355590fa0079\", \"nodeID\": \"ip-10-157-240-201.ec2.internal\", \"resources\": \"map[memory:3430940672 pods:1 vcore:2100]\"}"}
      {"stream":"stdout","log":"2023-11-08T15:16:20.871Z\tINFO\tcache/application.go:637\ttry to release pod from application\t{\"appID\": \"spark-28105bdfe17b494887c0c443f8a3ab0f\", \"allocationUUID\": \"cc243ba1-7054-4b07-8344-6afb1424b1e0\", \"terminationType\": \"PREEMPTED_BY_SCHEDULER\"}"}
      • The pod is deleted, which triggers CompleteTask with terminationType PREEMPTED_BY_SCHEDULER:
      {"stream":"stdout","log":"2023-11-08T15:16:45.489Z\tINFO\tgeneral/general.go:204\tdelete pod\t{\"appType\": \"general\", \"namespace\": \"dex-app-q5nslqd5\", \"podName\": \"tg-spark-driver-spark-28105bdfe17b494887c0c4-0\", \"podUID\": \"e6e91651-7152-42f5-8504-355590fa0079\"}"}
      {"stream":"stdout","log":"2023-11-08T15:16:45.489Z\tINFO\tcache/task_state.go:380\tTask state transition\t{\"app\": \"spark-28105bdfe17b494887c0c443f8a3ab0f\", \"task\": \"e6e91651-7152-42f5-8504-355590fa0079\", \"taskAlias\": \"dex-app-q5nslqd5/tg-spark-driver-spark-28105bdfe17b494887c0c4-0\", \"source\": \"Bound\", \"destination\": \"Completed\", \"event\": \"CompleteTask\"}"}
      {"stream":"stdout","log":"2023-11-08T15:16:45.489Z\tINFO\tscheduler/partition.go:1245\tremoving allocation from application\t{\"appID\": \"spark-28105bdfe17b494887c0c443f8a3ab0f\", \"allocationId\": \"cc243ba1-7054-4b07-8344-6afb1424b1e0\", \"terminationType\": \"PREEMPTED_BY_SCHEDULER\"}"}
      • The real pod stays Pending forever, because the core side never receives the release response for PLACEHOLDER_REPLACED; with PREEMPTED_BY_SCHEDULER the core takes the RemoveAllocation branch instead of ReplaceAllocation, as the excerpt below and the sketch after it show:
      // if we have an uuid the termination type is important
      if release.TerminationType == si.TerminationType_PLACEHOLDER_REPLACED {
          log.Logger().Info("replacing placeholder allocation",
             zap.String("appID", appID),
             zap.String("allocationId", uuid))
          if alloc := app.ReplaceAllocation(uuid); alloc != nil {
             released = append(released, alloc)
          }
      } else {
          log.Logger().Info("removing allocation from application",
             zap.String("appID", appID),
             zap.String("allocationId", uuid),
             zap.String("terminationType", release.TerminationType.String()))
          if alloc := app.RemoveAllocation(uuid); alloc != nil {
             released = append(released, alloc)
          }
      } 
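
      A possible fix direction, matching the summary of this ticket, is to make the preemption victim selection skip allocations whose release is already in flight, e.g. a placeholder whose PLACEHOLDER_REPLACED release has already been sent to the shim (presumably the code around the "Preempting task" log from objects/preemption.go above). The snippet below is only a minimal, self-contained sketch of that guard, not the YuniKorn implementation: the Allocation type, its released flag, IsReleased() and filterVictims() are illustrative assumptions.

      package main

      import "fmt"

      // Allocation is a stand-in for the scheduler's allocation object; the
      // released flag models "a release for this allocation has already been
      // sent to the shim", e.g. as part of a placeholder replacement.
      type Allocation struct {
          UUID          string
          IsPlaceholder bool
          released      bool
      }

      // IsReleased reports whether a release is already in flight for this allocation.
      func (a *Allocation) IsReleased() bool { return a.released }

      // filterVictims drops candidates that must not be preempted again: if the
      // allocation is already being released (for example a placeholder that is
      // being replaced), preempting it lets the later PREEMPTED_BY_SCHEDULER
      // removal override the pending replacement and the real pod stays Pending.
      func filterVictims(candidates []*Allocation) []*Allocation {
          victims := make([]*Allocation, 0, len(candidates))
          for _, alloc := range candidates {
              if alloc.IsReleased() {
                  continue // release already sent, do not preempt this one
              }
              victims = append(victims, alloc)
          }
          return victims
      }

      func main() {
          candidates := []*Allocation{
              // placeholder from the logs above, replacement release already sent
              {UUID: "cc243ba1-7054-4b07-8344-6afb1424b1e0", IsPlaceholder: true, released: true},
              // hypothetical second allocation that is still a valid victim
              {UUID: "11111111-2222-3333-4444-555555555555"},
          }
          for _, v := range filterVictims(candidates) {
              fmt.Println("may preempt:", v.UUID)
          }
      }

      Whatever the exact shape of the real fix, the key point is that an allocation already marked for replacement must not be handed to the preemptor, so the shim still gets the PLACEHOLDER_REPLACED response and the real pod can be scheduled.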

            People

              Assignee: Qi Zhu (zhuqi)
              Reporter: Qi Zhu (zhuqi)
              Votes: 0
              Watchers: 4
