Ignite / IGNITE-3558

Affinity task hangs when Collision SPI produces a lot of job rejections & Failover SPI produces many attempts

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3
    • Component/s: compute
    • Labels: None

    Description

      The test to reproduce:
      IgniteCacheLockPartitionOnAffinityRunWithCollisionSpiTest.testJobFinishing

      Root cause
      GridJobExecuteResponse isn't sent from the target node because GridJobWorker instances get confused with one another in the CollisionContext.

      Suggestion
      The method GridJobProcessor.CollisionJobContext.cancel()
      uses passiveJobs.remove(jobWorker.getJobId(), jobWorker).
      passiveJobs is a ConcurrentHashMap, and GridJobWorker.equals() is implemented as a comparison of jobId.

      So, when two threads try to cancel two workers with the same jobId, the following can happen:

      • thread0: removes jobWorker0 and cancels jobWorker0.
      • thread0: puts jobWorker1 (because jobWorker0 has already been removed).
      • thread1: still holds a reference to jobWorker0 and tries to cancel it.
      • thread1: removes jobWorker1 instead of jobWorker0 (because only jobId is used to identify the worker).
      • thread1: doesn't send ExecuteResponse because jobWorker0 has already been canceled.
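
      The interleaving above can be replayed on a single thread as a minimal sketch. The Worker class and the staleRemoveEvictsReplacement() helper are hypothetical stand-ins, not Ignite code; the only assumption carried over from the description is that equals() compares jobId and that the map is a ConcurrentHashMap keyed by jobId.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class JobIdEqualsDemo {
    /** Hypothetical stand-in for GridJobWorker: equality is defined by jobId only. */
    static final class Worker {
        final String jobId;
        final String name;

        Worker(String jobId, String name) {
            this.jobId = jobId;
            this.name = name;
        }

        @Override public boolean equals(Object o) {
            return o instanceof Worker && Objects.equals(jobId, ((Worker) o).jobId);
        }

        @Override public int hashCode() {
            return Objects.hashCode(jobId);
        }
    }

    /** Replays the steps sequentially; returns true if the stale remove evicted the replacement worker. */
    static boolean staleRemoveEvictsReplacement() {
        ConcurrentHashMap<String, Worker> passiveJobs = new ConcurrentHashMap<>();
        Worker worker0 = new Worker("job-1", "worker0");
        Worker worker1 = new Worker("job-1", "worker1");
        passiveJobs.put(worker0.jobId, worker0);

        // "thread0": removes worker0, then a new worker with the same jobId is put.
        passiveJobs.remove(worker0.jobId, worker0);
        passiveJobs.put(worker1.jobId, worker1);

        // "thread1": still holds worker0 and calls the two-argument remove.
        // The value check uses equals(), so worker1 matches and is removed.
        boolean removed = passiveJobs.remove(worker0.jobId, worker0);

        return removed && !passiveJobs.containsKey("job-1");
    }

    public static void main(String[] args) {
        System.out.println("stale remove evicted replacement: " + staleRemoveEvictsReplacement());
    }
}
```

      In this sketch the second remove succeeds even though worker0 is no longer in the map, which mirrors the step where thread1 removes jobWorker1 instead of jobWorker0.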

      Proposal
      Try to use the system default equals() (reference identity) for GridJobWorker.
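
      A minimal sketch of why the default equals() would help, again with a hypothetical Worker class rather than the real GridJobWorker: when equals() is not overridden, ConcurrentHashMap.remove(key, value) only succeeds for the exact instance that is currently mapped.

```java
import java.util.concurrent.ConcurrentHashMap;

public class IdentityEqualsDemo {
    /** Hypothetical stand-in for GridJobWorker that keeps Object's identity-based equals(). */
    static final class Worker {
        final String jobId;

        Worker(String jobId) {
            this.jobId = jobId;
        }
    }

    /** Returns true if a stale remove leaves the replacement worker in the map. */
    static boolean staleRemoveIsIgnored() {
        ConcurrentHashMap<String, Worker> passiveJobs = new ConcurrentHashMap<>();
        Worker worker0 = new Worker("job-1");
        Worker worker1 = new Worker("job-1");

        // worker0 is registered, removed, and replaced by worker1 under the same jobId.
        passiveJobs.put(worker0.jobId, worker0);
        passiveJobs.remove(worker0.jobId, worker0);
        passiveJobs.put(worker1.jobId, worker1);

        // A stale reference to worker0 no longer matches worker1, so nothing is removed.
        boolean removed = passiveJobs.remove(worker0.jobId, worker0);

        return !removed && passiveJobs.get("job-1") == worker1;
    }

    public static void main(String[] args) {
        System.out.println("stale remove ignored: " + staleRemoveIsIgnored());
    }
}
```

      With identity equality the caller can also use the boolean result of remove() to tell whether it actually owned the mapped worker.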

            People

              tledkov-gridgain Taras Ledkov

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h