Details
Type: Improvement
Status: Closed
Priority: Critical
Resolution: Fixed
1.10.0
Description
Working on FLINK-14742 revealed that the test instability was caused by the taskSlotTable of the TaskManager under test being modified from multiple threads, namely the test thread and the main thread of the rpcEndpoint. This data structure is not thread-safe, so this should not happen.
This anti-pattern seems to be repeated in multiple tests, including most of the tests in TaskExecutorSubmissionTest (look for calls to TaskSlotTable.allocateSlot()). There we seem to call taskSlotTable.allocateSlot() directly from the test thread and then tmGateway.submitTask(), which accesses the slot table from within the main thread of the RPC endpoint.
This JIRA is just to investigate whether this is also a problem in those tests.
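To make the access pattern concrete, here is a minimal, self-contained sketch that does not use any Flink classes. `SimpleSlotTable` is a hypothetical stand-in for a non-thread-safe slot table, and a single-threaded `ExecutorService` stands in for the main thread of the RPC endpoint. Both are assumptions for illustration only, not the actual TaskExecutor code. The table rejects calls from any thread other than its registered main thread, so the direct call from the test thread fails, while the same call scheduled onto the main-thread executor succeeds.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class MainThreadConfinementSketch {

    /** Hypothetical, deliberately non-thread-safe slot table (not Flink's TaskSlotTable). */
    static final class SimpleSlotTable {
        private final Map<Integer, String> slots = new HashMap<>();
        private volatile Thread mainThread;

        /** Registers the only thread that is allowed to touch the table. */
        void start(Thread mainThread) {
            this.mainThread = mainThread;
        }

        boolean allocateSlot(int index, String jobId) {
            assertRunsInMainThread();
            return slots.putIfAbsent(index, jobId) == null;
        }

        boolean isAllocated(int index, String jobId) {
            assertRunsInMainThread();
            return jobId.equals(slots.get(index));
        }

        private void assertRunsInMainThread() {
            if (Thread.currentThread() != mainThread) {
                throw new IllegalStateException("slot table accessed outside the main thread");
            }
        }
    }

    static List<String> runScenario() throws Exception {
        List<String> log = new ArrayList<>();
        // Stand-in for the RPC endpoint's main thread (assumption for this sketch).
        ExecutorService mainThreadExecutor = Executors.newSingleThreadExecutor();
        try {
            SimpleSlotTable table = new SimpleSlotTable();
            CompletableFuture
                    .runAsync(() -> table.start(Thread.currentThread()), mainThreadExecutor)
                    .get();

            // Anti-pattern: the test thread mutates the table directly.
            try {
                table.allocateSlot(0, "job-1");
                log.add("direct call accepted");
            } catch (IllegalStateException e) {
                log.add("direct call rejected");
            }

            // Safer pattern: schedule the mutation onto the main thread and wait for it.
            boolean allocated = CompletableFuture
                    .supplyAsync(() -> table.allocateSlot(0, "job-1"), mainThreadExecutor)
                    .get();
            log.add("main-thread allocation: " + allocated);

            // Later accesses (the analogue of submitTask) run on the same thread.
            boolean visible = CompletableFuture
                    .supplyAsync(() -> table.isAllocated(0, "job-1"), mainThreadExecutor)
                    .get();
            log.add("main-thread lookup: " + visible);
        } finally {
            mainThreadExecutor.shutdownNow();
        }
        return log;
    }

    public static void main(String[] args) throws Exception {
        runScenario().forEach(System.out::println);
    }
}
```

In the affected tests, the analogous change would be to perform the slot allocation through the endpoint's main thread rather than calling the table directly from the test thread. Whether that is needed in each test is what this ticket is meant to find out.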
cc trohrmann, chesnay, yangwang166