Details
- Type: Sub-task
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
- None
Description
During testing, I observed that the scheduler occasionally panics with a nil pointer dereference like the one below:
4-261f-4448-bc0f-5ea14d23f9e8"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x190cf5d]

goroutine 114 [running]:
github.com/apache/incubator-yunikorn-core/pkg/scheduler/objects.(*Application).ReplaceAllocation(0xc004250000, 0xc0038e01b0, 0x24, 0x0)
	/Users/wyang/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20210126213806-78bf4f684709/pkg/scheduler/objects/application.go:1026 +0xcd
github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*PartitionContext).removeAllocation(0xc0026de600, 0xc0003c0a10, 0x0, 0x0, 0x0, 0x0)
	/Users/wyang/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20210126213806-78bf4f684709/pkg/scheduler/partition.go:1137 +0x14b5
github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*ClusterContext).processAllocationReleases(0xc0001400f0, 0xc0066400c0, 0x1, 0x1, 0x7ffeefbff80f, 0x9)
	/Users/wyang/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20210126213806-78bf4f684709/pkg/scheduler/context.go:683 +0x150
github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*ClusterContext).processAllocations(0xc0001400f0, 0xc006730000)
	/Users/wyang/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20210126213806-78bf4f684709/pkg/scheduler/context.go:606 +0x185
github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*ClusterContext).processRMUpdateEvent(0xc0001400f0, 0xc0066ee0b8)
	/Users/wyang/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20210126213806-78bf4f684709/pkg/scheduler/context.go:213 +0x77
github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).handleRMEvent(0xc00000e3c0)
	/Users/wyang/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20210126213806-78bf4f684709/pkg/scheduler/scheduler.go:112 +0x416
created by github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).StartService
	/Users/wyang/go/pkg/mod/github.com/apache/incubator-yunikorn-core@v0.0.0-20210126213806-78bf4f684709/pkg/scheduler/scheduler.go:54 +0xa2
make: *** [run] Error 2
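The trace shows the panic surfacing in ReplaceAllocation while the core processes an allocation release. The sketch below is NOT YuniKorn code; it only illustrates the general failure pattern, assuming a hypothetical map-backed store (`allocationStore`, `allocation`, `release`, and `tryRelease` are illustrative names). Releasing the same UUID twice makes the second lookup return nil, and reading a field through that nil pointer raises the same kind of runtime panic, which the demo recovers so it can print it.

```go
package main

import "fmt"

// allocation is a hypothetical stand-in for a scheduler allocation record;
// it is NOT YuniKorn's real type.
type allocation struct {
	uuid string
	node string
}

// allocationStore is a hypothetical map-backed store keyed by allocation UUID.
type allocationStore struct {
	allocs map[string]*allocation
}

// release removes the allocation and reads a field from it without a nil check.
// On a second call for the same UUID the map lookup yields nil and the field
// access panics with "invalid memory address or nil pointer dereference".
func (s *allocationStore) release(uuid string) string {
	alloc := s.allocs[uuid] // nil if the UUID was already removed
	delete(s.allocs, uuid)
	return alloc.node // nil dereference on the second release
}

// tryRelease calls release and converts a panic into an error for the demo.
func tryRelease(s *allocationStore, uuid string) (node string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("release %s: %v", uuid, r)
		}
	}()
	return s.release(uuid), nil
}

func main() {
	s := &allocationStore{allocs: map[string]*allocation{
		"ph-1": {uuid: "ph-1", node: "node-a"},
	}}
	fmt.Println(tryRelease(s, "ph-1")) // first release succeeds
	fmt.Println(tryRelease(s, "ph-1")) // duplicate release hits the nil pointer
}
```

Without the `recover` in `tryRelease`, the second call would crash the process exactly as the scheduler did.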
The root cause: when the shim deletes a placeholder, it can sometimes trigger two events:
- Pod Update
- Pod Delete
The shim sends a release request to the core both when the pod is updated to the TERMINATED state and when the pod is DELETED. The first request already removes the allocation, so when the second release request arrives, the core hits the nil pointer dereference. We need to avoid sending a second release request if the pod has already been released.
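One way to express "send the release only once" on the shim side is a small deduplicating wrapper around the send path. This is a minimal sketch under stated assumptions, not the actual YuniKorn shim fix: `releaseDeduper`, `releaseRequest`, and the `send` callback are hypothetical, and the real shim talks to the core through the scheduler-interface messages rather than a plain Go struct.

```go
package main

import (
	"fmt"
	"sync"
)

// releaseRequest is a hypothetical stand-in for the release message the shim
// sends to the core; it is NOT the real scheduler-interface type.
type releaseRequest struct {
	allocationUUID string
	reason         string
}

// releaseDeduper remembers which allocations already had a release sent, so a
// pod update (TERMINATED) followed by a pod delete yields only one request.
type releaseDeduper struct {
	mu       sync.Mutex
	released map[string]struct{}
	send     func(releaseRequest) // hypothetical hook; must be safe for concurrent use
}

func newReleaseDeduper(send func(releaseRequest)) *releaseDeduper {
	return &releaseDeduper{released: make(map[string]struct{}), send: send}
}

// Release forwards the request only the first time a given UUID is seen and
// reports whether a request was actually sent.
func (d *releaseDeduper) Release(uuid, reason string) bool {
	d.mu.Lock()
	if _, done := d.released[uuid]; done {
		d.mu.Unlock()
		return false
	}
	d.released[uuid] = struct{}{}
	d.mu.Unlock()
	d.send(releaseRequest{allocationUUID: uuid, reason: reason})
	return true
}

func main() {
	var sent []releaseRequest
	d := newReleaseDeduper(func(r releaseRequest) { sent = append(sent, r) })

	d.Release("ph-1", "pod updated to TERMINATED")
	d.Release("ph-1", "pod deleted") // suppressed: already released
	fmt.Println(len(sent), sent[0].reason)
}
```

The check-and-mark happens under one lock so that update and delete handlers racing on different goroutines cannot both pass the check. As written, the `released` set only grows; a real implementation would presumably drop entries once the core confirms the release or the pod is gone.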
Attachments
Issue Links
- is caused by
  - YUNIKORN-229 shim sends the same remove request twice for a remove allocation (Closed)
- links to