Details
Type: Sub-task
Status: Closed
Priority: Minor
Resolution: Fixed
Description
After YUNIKORN-2233, the scheduler core can be stopped. This causes an issue inside the MockScheduler:
2023-12-21T17:58:49.203+0100 INFO core.scheduler.ugm ugm/manager.go:136 Removing user from manager {"user": "testuser"}
...
2023-12-21T17:58:59.209+0100 INFO core.entrypoint entrypoint/service_context.go:40 ServiceContext stop all services
...
2023-12-21T17:58:59.211+0100 INFO core.scheduler.partition scheduler/partition_manager.go:144 marking all queues for removal {"partitionName": "[rm:123]default"}
2023-12-21T17:58:59.211+0100 INFO core.scheduler.queue objects/queue.go:952 marking managed queue for deletion {"queue": "root"}
2023-12-21T17:58:59.212+0100 INFO core.scheduler.fsm objects/object_state.go:81 object transition {"object": "root", "source": "Active", "destination": "Draining", "event": "Remove"}
2023-12-21T17:58:59.212+0100 INFO core.scheduler.queue objects/queue.go:952 marking managed queue for deletion {"queue": "root.singleleaf"}
2023-12-21T17:58:59.212+0100 INFO core.scheduler.fsm objects/object_state.go:81 object transition {"object": "root.singleleaf", "source": "Active", "destination": "Draining", "event": "Remove"}
2023-12-21T17:58:59.212+0100 INFO core.scheduler.partition scheduler/partition_manager.go:150 removing all applications from partition {"numOfApps": 1, "partitionName": "[rm:123]default"}
2023-12-21T17:58:59.212+0100 INFO core.scheduler.application objects/application.go:608 ask removed successfully from application {"appID": "app-1", "ask": "", "pendingDelta": "map[memory:0 vcore:0]"}
2023-12-21T17:58:59.212+0100 INFO core.scheduler.queue objects/queue.go:837 Application completed and removed from queue {"queueName": "root.singleleaf", "applicationID": "app-1"}
2023-12-21T17:59:32.848+0100 ERROR core.scheduler.ugm ugm/manager.go:118 user tracker must be available in userTrackers map {"user": "testuser"}
github.com/apache/yunikorn-core/pkg/scheduler/ugm.(*Manager).DecreaseTrackedResource
    /home/bacskop/repos/yunikorn-core/pkg/scheduler/ugm/manager.go:118
github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).decUserResourceUsage
    /home/bacskop/repos/yunikorn-core/pkg/scheduler/objects/application.go:1654
github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).RemoveAllAllocations
    /home/bacskop/repos/yunikorn-core/pkg/scheduler/objects/application.go:1843
github.com/apache/yunikorn-core/pkg/scheduler.(*PartitionContext).removeApplication
    /home/bacskop/repos/yunikorn-core/pkg/scheduler/partition.go:388
github.com/apache/yunikorn-core/pkg/scheduler.(*partitionManager).remove
    /home/bacskop/repos/yunikorn-core/pkg/scheduler/partition_manager.go:156
github.com/apache/yunikorn-core/pkg/scheduler.(*partitionManager).Stop
    /home/bacskop/repos/yunikorn-core/pkg/scheduler/partition_manager.go:97
github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).Stop
    /home/bacskop/repos/yunikorn-core/pkg/scheduler/context.go:991
github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).Stop
    /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:217
github.com/apache/yunikorn-core/pkg/entrypoint.(*ServiceContext).StopAll
    /home/bacskop/repos/yunikorn-core/pkg/entrypoint/service_context.go:50
github.com/apache/yunikorn-core/pkg/scheduler/tests.(*mockScheduler).Stop
    /home/bacskop/repos/yunikorn-core/pkg/scheduler/tests/mockscheduler_test.go:91
github.com/apache/yunikorn-core/pkg/scheduler/tests.TestApplicationHistoryTracking
    /home/bacskop/repos/yunikorn-core/pkg/scheduler/tests/application_tracking_test.go:172
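The problem is that the user tracker object no longer exists when PartitionContext.removeApplication() is called: the user was already removed from the manager (see the first log line) before the scheduler was stopped. At this point the app is also in the Completed state, so it is not necessary to decrement any resource usage.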
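One possible shape of a fix, as a minimal standalone sketch rather than the actual yunikorn-core change: skip the usage decrement when the application is already completed. All names here (userManager, application, removeAllAllocations, the integer resource count) are hypothetical simplifications and do not match the real types in pkg/scheduler/ugm or pkg/scheduler/objects.

```go
package main

import (
	"fmt"
	"sync"
)

// appState is a simplified stand-in for the application state machine.
type appState int

const (
	running appState = iota
	completed
)

// userManager is a hypothetical, reduced model of a per-user usage tracker.
// A tracker is dropped as soon as the user's usage reaches zero.
type userManager struct {
	mu       sync.Mutex
	trackers map[string]int // user -> tracked resource units
}

func newUserManager() *userManager {
	return &userManager{trackers: make(map[string]int)}
}

// increase adds usage for a user, creating the tracker if needed.
func (m *userManager) increase(user string, amount int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.trackers[user] += amount
}

// decrease lowers the usage for a user. It returns false when no tracker
// exists, which corresponds to the ERROR line in the log above.
func (m *userManager) decrease(user string, amount int) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	usage, ok := m.trackers[user]
	if !ok {
		return false
	}
	usage -= amount
	if usage <= 0 {
		delete(m.trackers, user)
		return true
	}
	m.trackers[user] = usage
	return true
}

// application is a hypothetical, reduced model of a scheduler application.
type application struct {
	user      string
	state     appState
	allocated int
}

// removeAllAllocations releases everything the application holds. A completed
// application has already given back its usage, so the decrement is skipped
// and the (possibly removed) tracker is never touched.
func (a *application) removeAllAllocations(m *userManager) bool {
	released := a.allocated
	a.allocated = 0
	if a.state == completed {
		return false
	}
	return m.decrease(a.user, released)
}

func main() {
	m := newUserManager()
	m.increase("testuser", 2)
	app := &application{user: "testuser", state: running, allocated: 2}

	// The application finishes: usage is released and the tracker is removed.
	m.decrease(app.user, app.allocated)
	app.allocated = 0
	app.state = completed

	// Shutdown path: removing the completed application does not decrement.
	fmt.Println("decremented on shutdown:", app.removeAllAllocations(m))
}
```

An alternative would be to make the decrement tolerate a missing tracker, but that could hide real accounting bugs for applications that are still running, which is why the sketch checks the application state instead.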