  1. Apache YuniKorn
  2. YUNIKORN-1187 [Umbrella] Recovery stabilization
  3. YUNIKORN-1197

Placeholders are immediately replaced during recovery


Details

    Description

      When YuniKorn (YK) is restarted, some running placeholders are replaced immediately during recovery, even though the placeholder timeout has not yet expired.

      Example:

      2022-04-27T11:43:47.145Z	INFO	cache/context_recovery.go:182	node state	{"nodeName": "minikube", "nodeState": "Healthy"}
      2022-04-27T11:43:47.145Z	INFO	cache/context_recovery.go:196	nodes recovery is successful	{"recoveredNodes": 1}
      2022-04-27T11:43:47.145Z	INFO	shim/scheduler.go:226	scheduler recovery succeed
      2022-04-27T11:43:47.145Z	INFO	cache/nodes.go:238	scheduler node event 	{"name": "minikube", "current state ": "New", "transition to ": "RecoverNode"}
      2022-04-27T11:43:47.145Z	INFO	shim/scheduler.go:356	No outstanding apps found for a while	{"timeout": "2m0s"}
      2022-04-27T11:43:47.145Z	INFO	cache/application.go:557	Skip the reservation stage	{"appID": "batch-sleep-job"}
      2022-04-27T11:43:47.145Z	INFO	cache/context.go:318	trigger scheduler configuration reloading
      2022-04-27T11:43:48.148Z	INFO	objects/application.go:585	Ask added successfully to application	{"appID": "batch-sleep-job", "ask": "ce3558cd-2a02-47d8-9bb7-93b2aadf9cc8", "placeholder": false, "pendingDelta": "map[memory:10000000 vcore:10]"}
      2022-04-27T11:43:48.148Z	INFO	objects/application.go:585	Ask added successfully to application	{"appID": "batch-sleep-job", "ask": "c88d0bba-ef94-4728-ad54-da30f72646ee", "placeholder": false, "pendingDelta": "map[memory:10000000 vcore:10]"}
      2022-04-27T11:43:48.148Z	INFO	objects/application.go:585	Ask added successfully to application	{"appID": "batch-sleep-job", "ask": "412d750d-f8c2-4b9c-a4cf-c7077c5384e1", "placeholder": false, "pendingDelta": "map[memory:10000000 vcore:10]"}
      2022-04-27T11:43:48.148Z	INFO	objects/application.go:585	Ask added successfully to application	{"appID": "batch-sleep-job", "ask": "54607708-a8f3-4ff3-b73c-210111a54625", "placeholder": false, "pendingDelta": "map[memory:10000000 vcore:10]"}
      2022-04-27T11:43:48.148Z	INFO	objects/application.go:585	Ask added successfully to application	{"appID": "batch-sleep-job", "ask": "f080aad1-6b08-4d83-8802-8dbf853a89cd", "placeholder": false, "pendingDelta": "map[memory:10000000 vcore:10]"}
      2022-04-27T11:43:48.156Z	INFO	scheduler/partition.go:863	scheduler replace placeholder processed	{"appID": "batch-sleep-job", "allocationKey": "ce3558cd-2a02-47d8-9bb7-93b2aadf9cc8", "UUID": "5f0d5e0d-0668-4297-82ba-c8ebb585b0f7", "placeholder released UUID": "312d7df9-000c-4035-9170-9ea96ef9e718"}
      2022-04-27T11:43:48.156Z	INFO	scheduler/partition.go:863	scheduler replace placeholder processed	{"appID": "batch-sleep-job", "allocationKey": "c88d0bba-ef94-4728-ad54-da30f72646ee", "UUID": "a80035cc-9751-4dc2-9a36-9a649ae50922", "placeholder released UUID": "a2c072c7-3814-4464-bcd8-64f3e3b79b4e"}
      2022-04-27T11:43:48.156Z	INFO	scheduler/partition.go:863	scheduler replace placeholder processed	{"appID": "batch-sleep-job", "allocationKey": "412d750d-f8c2-4b9c-a4cf-c7077c5384e1", "UUID": "126f996f-ca79-4895-a593-ffa51a6fc40e", "placeholder released UUID": "a48d2a0a-c9cc-446b-8f33-bf7952e5771c"}
      2022-04-27T11:43:48.156Z	INFO	scheduler/partition.go:863	scheduler replace placeholder processed	{"appID": "batch-sleep-job", "allocationKey": "54607708-a8f3-4ff3-b73c-210111a54625", "UUID": "d82c103e-85ce-4375-9448-75d251549326", "placeholder released UUID": "84e6a8bc-42ab-45da-9af6-c4067b2a3561"}
      2022-04-27T11:43:48.156Z	INFO	scheduler/partition.go:863	scheduler replace placeholder processed	{"appID": "batch-sleep-job", "allocationKey": "f080aad1-6b08-4d83-8802-8dbf853a89cd", "UUID": "2441b758-4c12-44d8-ab70-6c6b3fb100de", "placeholder released UUID": "ec4f534c-628a-4dc3-87d1-73f782da8c46"}
      2022-04-27T11:43:48.156Z	INFO	cache/application.go:675	try to release pod from application	{"appID": "batch-sleep-job", "allocationUUID": "312d7df9-000c-4035-9170-9ea96ef9e718", "terminationType": "PLACEHOLDER_REPLACED"}
      2022-04-27T11:43:48.168Z	INFO	cache/application.go:675	try to release pod from application	{"appID": "batch-sleep-job", "allocationUUID": "a2c072c7-3814-4464-bcd8-64f3e3b79b4e", "terminationType": "PLACEHOLDER_REPLACED"}
      2022-04-27T11:43:48.174Z	INFO	cache/application.go:675	try to release pod from application	{"appID": "batch-sleep-job", "allocationUUID": "a48d2a0a-c9cc-446b-8f33-bf7952e5771c", "terminationType": "PLACEHOLDER_REPLACED"}
      2022-04-27T11:43:48.180Z	INFO	cache/application.go:675	try to release pod from application	{"appID": "batch-sleep-job", "allocationUUID": "84e6a8bc-42ab-45da-9af6-c4067b2a3561", "terminationType": "PLACEHOLDER_REPLACED"}
      2022-04-27T11:43:48.199Z	INFO	cache/application.go:675	try to release pod from application	{"appID": "batch-sleep-job", "allocationUUID": "ec4f534c-628a-4dc3-87d1-73f782da8c46", "terminationType": "PLACEHOLDER_REPLACED"}
      2022-04-27T11:43:49.671Z	INFO	general/general.go:285	task completes	{"appType": "general", "namespace": "default", "podName": "tg-groupa-batch-sleep-job-3", "podUID": "84e6a8bc-42ab-45da-9af6-c4067b2a3561", "podStatus": "Failed"}
      


          People

            pbacsko Peter Bacsko
            pbacsko Peter Bacsko