Description
The test to reproduce:
IgniteCacheLockPartitionOnAffinityRunWithCollisionSpiTest.testJobFinishing
Root cause
GridJobExecuteResponse isn't sent from the target node because GridJobWorker instances get mixed up in the CollisionContext.
Suggestion
The method GridJobProcessor.CollisionJobContext.cancel() uses passiveJobs.remove(jobWorker.getJobId(), jobWorker).
passiveJobs is a ConcurrentHashMap, and GridJobWorker.equals() is implemented as a comparison of jobId.
So, when two threads try to cancel two workers with the same jobId, the following can happen:
- thread0: removes jobWorker0 and cancels it.
- thread0: puts jobWorker1 (because jobWorker0 has already been removed).
- thread1: holds a reference to jobWorker0 and tries to cancel it.
- thread1: removes jobWorker1 instead of jobWorker0 (because only jobId is used to identify the worker).
- thread1: doesn't send the ExecuteResponse because jobWorker0 has already been canceled.
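The interleaving above can be replayed sequentially in a small sketch. The Worker class here is a hypothetical stand-in for GridJobWorker (it is not the Ignite class) and assumes only what is described above: equals() compares jobId. ConcurrentHashMap.remove(key, value) removes the entry when the mapped value equals the given one, so a stale reference to jobWorker0 is enough to evict jobWorker1.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class JobIdEqualsDemo {
    /** Hypothetical stand-in for GridJobWorker: equality is based on jobId only. */
    static final class Worker {
        final String jobId;
        final String name;

        Worker(String jobId, String name) {
            this.jobId = jobId;
            this.name = name;
        }

        @Override public boolean equals(Object o) {
            return o instanceof Worker && jobId.equals(((Worker) o).jobId);
        }

        @Override public int hashCode() {
            return jobId.hashCode();
        }
    }

    /** Replays the steps in one thread; returns true if the stale remove evicted the new worker. */
    static boolean staleRemoveEvictsNewWorker() {
        ConcurrentMap<String, Worker> passiveJobs = new ConcurrentHashMap<>();
        Worker jobWorker0 = new Worker("job-1", "jobWorker0");
        Worker jobWorker1 = new Worker("job-1", "jobWorker1");
        passiveJobs.put(jobWorker0.jobId, jobWorker0);

        // "thread0": removes jobWorker0, then jobWorker1 is put under the same jobId.
        passiveJobs.remove(jobWorker0.jobId, jobWorker0);
        passiveJobs.put(jobWorker1.jobId, jobWorker1);

        // "thread1": still holds jobWorker0 and tries to remove it.
        // The map compares values with equals(), so jobWorker1 is removed instead.
        boolean removed = passiveJobs.remove(jobWorker0.jobId, jobWorker0);

        return removed && passiveJobs.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(staleRemoveEvictsNewWorker()); // prints "true"
    }
}
```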
Proposal
Try using the default (identity-based) Object.equals() for GridJobWorker.
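A minimal sketch of the intended effect, again with a hypothetical Worker class standing in for GridJobWorker: when equals() is not overridden, ConcurrentHashMap.remove(key, value) only succeeds for the exact instance stored in the map, so a stale reference can no longer evict a newer worker with the same jobId. Whether other code in Ignite relies on the jobId-based equality is not checked here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class IdentityEqualsDemo {
    /** Hypothetical stand-in for GridJobWorker with the default Object.equals(). */
    static final class Worker {
        final String jobId;

        Worker(String jobId) {
            this.jobId = jobId;
        }
        // equals() and hashCode() are intentionally not overridden.
    }

    /** Returns true if a remove with a stale worker reference leaves the new worker in place. */
    static boolean staleRemoveIsRejected() {
        ConcurrentMap<String, Worker> passiveJobs = new ConcurrentHashMap<>();
        Worker jobWorker0 = new Worker("job-1");
        Worker jobWorker1 = new Worker("job-1");
        passiveJobs.put(jobWorker0.jobId, jobWorker0);

        // jobWorker0 is removed and jobWorker1 is put under the same jobId.
        passiveJobs.remove(jobWorker0.jobId, jobWorker0);
        passiveJobs.put(jobWorker1.jobId, jobWorker1);

        // A second remove with the stale jobWorker0 reference no longer matches,
        // because jobWorker0.equals(jobWorker1) is now an identity check.
        boolean removed = passiveJobs.remove(jobWorker0.jobId, jobWorker0);

        return !removed && passiveJobs.get("job-1") == jobWorker1;
    }

    public static void main(String[] args) {
        System.out.println(staleRemoveIsRejected()); // prints "true"
    }
}
```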
Issue Links
- is related to IGNITE-2310 Lock cache partition for affinityRun/affinityCall execution (Closed)