Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- None
Description
After YUNIKORN-1107, the health checker runs as a background thread at a 30s interval. We observed a few scheduler restarts in the past week that seem to be caused by this thread, because it accesses the partition context unsafely, without holding the proper read lock. I have uploaded a patch that reproduces this locally, and a file with the stack trace captured when the crash happens.
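To illustrate the class of problem, here is a minimal Go sketch of a periodic background checker that reads shared state under a read lock while another goroutine writes to it. This is not the YuniKorn code or the attached patch: `partitionContext`, `addNode`, `totalAllocated` and `runHealthCheck` are hypothetical names, and the millisecond interval is only there to make the example finish quickly. In Go, an unsynchronized map read that overlaps a write can abort the whole process with a fatal error that cannot be recovered, which is one way a missing read lock could show up as a restart.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// partitionContext is a hypothetical stand-in for shared scheduler state.
// The embedded RWMutex guards every access to nodes.
type partitionContext struct {
	sync.RWMutex
	nodes map[string]int // node ID -> allocated units (illustrative only)
}

// addNode mutates the shared map, so it takes the write lock.
func (pc *partitionContext) addNode(id string, allocated int) {
	pc.Lock()
	defer pc.Unlock()
	pc.nodes[id] = allocated
}

// totalAllocated is the kind of read a health check might perform.
// It holds the read lock for the whole iteration; dropping the
// RLock/RUnlock pair would make this a data race with addNode.
func (pc *partitionContext) totalAllocated() int {
	pc.RLock()
	defer pc.RUnlock()
	total := 0
	for _, v := range pc.nodes {
		total += v
	}
	return total
}

// runHealthCheck calls totalAllocated on every tick until stop is closed,
// then reports how many checks it completed.
func runHealthCheck(pc *partitionContext, interval time.Duration, stop <-chan struct{}, done chan<- int) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	checks := 0
	for {
		select {
		case <-ticker.C:
			_ = pc.totalAllocated()
			checks++
		case <-stop:
			done <- checks
			return
		}
	}
}

func main() {
	pc := &partitionContext{nodes: make(map[string]int)}
	stop := make(chan struct{})
	done := make(chan int, 1)

	// Background checker, analogous to a health check on a timer.
	go runHealthCheck(pc, time.Millisecond, stop, done)

	// Concurrent writer, analogous to the scheduler updating its state.
	for i := 0; i < 1000; i++ {
		pc.addNode(fmt.Sprintf("node-%d", i), 1)
	}

	close(stop)
	<-done
	fmt.Println(pc.totalAllocated()) // prints 1000
}
```

Running a variant of this without the read lock under `go run -race` should report the race even when the process does not happen to crash.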
Attachments