Uploaded image for project: 'Apache YuniKorn'
  1. Apache YuniKorn
  2. YUNIKORN-1874

Data race: unlocked access in Context.updatePodCondition()

    XMLWordPrintableJSON

Details

    Description

      Running some performance tests locally triggered data races:

      ==================
      WARNING: DATA RACE
      Read at 0x00c008351538 by goroutine 50598:
        k8s.io/api/core/v1.(*Pod).DeepCopyInto()
            /home/bacskop/go/pkg/mod/k8s.io/api@v0.27.3/core/v1/zz_generated.deepcopy.go:3369 +0x44
        k8s.io/api/core/v1.(*Pod).DeepCopy()
            /home/bacskop/go/pkg/mod/k8s.io/api@v0.27.3/core/v1/zz_generated.deepcopy.go:3383 +0x7d4
        github.com/apache/yunikorn-k8shim/pkg/cache.(*Task).postTaskAllocated.func1()
            /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/task.go:358 +0x760
      
      Previous write at 0x00c008351538 by goroutine 66:
        k8s.io/kubernetes/pkg/api/v1/pod.UpdatePodCondition()
            /home/bacskop/go/pkg/mod/k8s.io/kubernetes@v1.27.3/pkg/api/v1/pod/util.go:380 +0x63d
        github.com/apache/yunikorn-k8shim/pkg/cache.(*Context).updatePodCondition()
            /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/context.go:1023 +0x65d
        github.com/apache/yunikorn-k8shim/pkg/cache.(*Context).HandleContainerStateUpdate()
            /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/context.go:1063 +0x30e
        github.com/apache/yunikorn-k8shim/pkg/callback.(*AsyncRMCallback).UpdateContainerSchedulingState()
            /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/callback/scheduler_callback.go:213 +0x45
        github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).inspectOutstandingRequests()
            /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:169 +0x581
        github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).inspectOutstandingRequests()
            /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:169 +0x581
        github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).inspectOutstandingRequests()
            /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:169 +0x581
        github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).internalInspectOutstandingRequests()
            /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:84 +0x38
        github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService.func3()
            /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:67 +0x39
      
      Goroutine 50598 (running) created at:
        github.com/apache/yunikorn-k8shim/pkg/cache.(*Task).postTaskAllocated()
            /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/task.go:338 +0x8e
        github.com/apache/yunikorn-k8shim/pkg/cache.newTaskState.func3()
            /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/task_state.go:394 +0x73
        github.com/looplab/fsm.(*FSM).enterStateCallbacks()
            /home/bacskop/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:470 +0xd1
        github.com/looplab/fsm.(*FSM).Event.func2.1()
            /home/bacskop/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:363 +0x1fe
        github.com/looplab/fsm.transitionerStruct.transition()
            /home/bacskop/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:422 +0x92
        github.com/looplab/fsm.(*transitionerStruct).transition()
            <autogenerated>:1 +0x29
        github.com/looplab/fsm.(*FSM).doTransition()
            /home/bacskop/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:407 +0xa72
        github.com/looplab/fsm.(*FSM).Event()
            /home/bacskop/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:390 +0xa48
        github.com/apache/yunikorn-k8shim/pkg/cache.(*Task).handle()
            /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/task.go:122 +0x26d
        github.com/apache/yunikorn-k8shim/pkg/cache.(*Context).TaskEventHandler.func1()
            /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cache/context.go:1113 +0x104
        github.com/apache/yunikorn-k8shim/pkg/dispatcher.Start.func1.2()
            /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/dispatcher/dispatcher.go:195 +0x58
      
      Goroutine 66 (running) created at:
        github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService()
            /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:67 +0x4c6
        github.com/apache/yunikorn-core/pkg/entrypoint.startAllServicesWithParameters()
            /home/bacskop/repos/yunikorn-core/pkg/entrypoint/entrypoint.go:87 +0x37a
        github.com/apache/yunikorn-core/pkg/entrypoint.StartAllServices()
            /home/bacskop/repos/yunikorn-core/pkg/entrypoint/entrypoint.go:44 +0x64
        github.com/apache/yunikorn-core/pkg/entrypoint.StartAllServicesWithLogger()
            /home/bacskop/repos/yunikorn-core/pkg/entrypoint/entrypoint.go:55 +0x3b
        main.main()
            /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/cmd/shim/main.go:51 +0x775
      ==================
      
       

      The problem is that inside Context.updatePodCondition(), we modify the PodStatus field of the pod object, which we use as a source of copies.

      It might be better to store the condition separately and protect it with the task lock.

      Attachments

        Issue Links

          Activity

            People

              pbacsko Peter Bacsko
              pbacsko Peter Bacsko
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: