Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
Description
Running some performance tests locally triggered data races:
================== WARNING: DATA RACE Read at 0x00c008351538 by goroutine 50598: k8s.io/api/core/v1.(*Pod).DeepCopyInto() /home/bacskop/go/pkg/mod/k8s.io/api@v0.27.3/core/v1/zz_generated.deepcopy.go:3369 +0x44 k8s.io/api/core/v1.(*Pod).DeepCopy() /home/bacskop/go/pkg/mod/k8s.io/api@v0.27.3/core/v1/zz_generated.deepcopy.go:3383 +0x7d4 github.com/apache/yunikorn-k8shim/pkg/cache.(*Task).postTaskAllocated.func1() /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/task.go:358 +0x760 Previous write at 0x00c008351538 by goroutine 66: k8s.io/kubernetes/pkg/api/v1/pod.UpdatePodCondition() /home/bacskop/go/pkg/mod/k8s.io/kubernetes@v1.27.3/pkg/api/v1/pod/util.go:380 +0x63d github.com/apache/yunikorn-k8shim/pkg/cache.(*Context).updatePodCondition() /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/context.go:1023 +0x65d github.com/apache/yunikorn-k8shim/pkg/cache.(*Context).HandleContainerStateUpdate() /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/context.go:1063 +0x30e github.com/apache/yunikorn-k8shim/pkg/callback.(*AsyncRMCallback).UpdateContainerSchedulingState() /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/callback/scheduler_callback.go:213 +0x45 github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).inspectOutstandingRequests() /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:169 +0x581 github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).inspectOutstandingRequests() /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:169 +0x581 github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).inspectOutstandingRequests() /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:169 +0x581 github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).internalInspectOutstandingRequests() /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:84 +0x38 github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService.func3() /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:67 +0x39 Goroutine 50598 (running) created at: github.com/apache/yunikorn-k8shim/pkg/cache.(*Task).postTaskAllocated() /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/task.go:338 +0x8e github.com/apache/yunikorn-k8shim/pkg/cache.newTaskState.func3() /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/task_state.go:394 +0x73 github.com/looplab/fsm.(*FSM).enterStateCallbacks() /home/bacskop/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:470 +0xd1 github.com/looplab/fsm.(*FSM).Event.func2.1() /home/bacskop/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:363 +0x1fe github.com/looplab/fsm.transitionerStruct.transition() /home/bacskop/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:422 +0x92 github.com/looplab/fsm.(*transitionerStruct).transition() <autogenerated>:1 +0x29 github.com/looplab/fsm.(*FSM).doTransition() /home/bacskop/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:407 +0xa72 github.com/looplab/fsm.(*FSM).Event() /home/bacskop/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:390 +0xa48 github.com/apache/yunikorn-k8shim/pkg/cache.(*Task).handle() /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/task.go:122 +0x26d github.com/apache/yunikorn-k8shim/pkg/cache.(*Context).TaskEventHandler.func1() /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/context.go:1113 +0x104 github.com/apache/yunikorn-k8shim/pkg/dispatcher.Start.func1.2() /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/dispatcher/dispatcher.go:195 +0x58 Goroutine 66 (running) created at: github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService() /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:67 +0x4c6 github.com/apache/yunikorn-core/pkg/entrypoint.startAllServicesWithParameters() /home/bacskop/repos/yunikorn-core/pkg/entrypoint/entrypoint.go:87 +0x37a github.com/apache/yunikorn-core/pkg/entrypoint.StartAllServices() /home/bacskop/repos/yunikorn-core/pkg/entrypoint/entrypoint.go:44 +0x64 github.com/apache/yunikorn-core/pkg/entrypoint.StartAllServicesWithLogger() /home/bacskop/repos/yunikorn-core/pkg/entrypoint/entrypoint.go:55 +0x3b main.main() /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cmd/shim/main.go:51 +0x775 ==================
The problem is that inside Context.updatePodCondition(), we modify the PodStatus field of the pod object, which we use as a source of copies.
It might be better to store the condition separately and protect it with the task lock.
Attachments
Issue Links
- links to