  1. Sling
  2. SLING-12078

Suspected race condition between TOPOLOGY_INIT and JobManager.addJob

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Event 4.3.12
    • Fix Version/s: None
    • Component/s: Event
    • Labels: None

    Description

      There are two regular cases in which a job is stored as part of JobManager.addJob():

      • When a topology is defined, the job is stored directly in the assigned subtree of the appropriate target slingId. This is by far the most frequent case.
      • If no topology is defined yet (no TOPOLOGY_INIT received), the job is put into the unassigned subtree. Later, upon receiving TOPOLOGY_INIT, CheckTopologyTask.fullRun() finds such unassigned jobs and moves them to the corresponding assigned subtree.
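
The two storage paths above can be illustrated with a minimal, single-threaded model. This is only a sketch under stated assumptions: the class JobStorageSketch, its fields, and its methods are hypothetical stand-ins for the assigned and unassigned subtrees, not the actual Sling Event implementation or its API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, single-threaded model of the two storage paths; not Sling's actual code. */
public class JobStorageSketch {
    /** Target slingId; stays null until TOPOLOGY_INIT has been handled. */
    private String targetSlingId;
    /** Stand-in for the assigned subtree: slingId -> job ids. */
    private final Map<String, List<String>> assigned = new HashMap<>();
    /** Stand-in for the unassigned subtree. */
    private final List<String> unassigned = new ArrayList<>();

    /** Stores the job under the target slingId if known, otherwise as unassigned. */
    public void addJob(String jobId) {
        if (targetSlingId != null) {
            assigned.computeIfAbsent(targetSlingId, k -> new ArrayList<>()).add(jobId);
        } else {
            unassigned.add(jobId);
        }
    }

    /** Models TOPOLOGY_INIT handling: the target becomes known, then a full run follows. */
    public void onTopologyInit(String slingId) {
        targetSlingId = slingId;
        fullRun();
    }

    /** Models the full run: moves every unassigned job to the assigned subtree. */
    private void fullRun() {
        assigned.computeIfAbsent(targetSlingId, k -> new ArrayList<>()).addAll(unassigned);
        unassigned.clear();
    }

    public List<String> assignedTo(String slingId) {
        return assigned.getOrDefault(slingId, new ArrayList<>());
    }

    public List<String> unassigned() {
        return unassigned;
    }
}
```

In this model, a job added before onTopologyInit() lands in unassigned and is moved by the full run, while a job added afterwards goes straight to the assigned list. Because the model is single-threaded, it shows only the two regular cases and not the concurrent interleaving.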

      There is a suspected race condition (test case to be provided) between the thread executing JobManager.addJob() and the thread handling TOPOLOGY_INIT:

      • JobManager.addJob() determines the target slingId, which is not yet defined because TOPOLOGY_INIT is being handled concurrently.
      • CheckTopologyTask.fullRun(), as part of the TOPOLOGY_INIT handling, does not yet find the new job in unassigned because the job is being stored concurrently.
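
The suspected interleaving can be made deterministic in a small model by forcing the order of the two threads with latches. This is a sketch of the scenario as described here, not the promised test case against Sling itself: TopologyInitRaceSketch, its latches, and its methods are hypothetical names, and the real code paths may differ.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

/** Hypothetical model that forces the suspected interleaving; not Sling's actual code. */
public class TopologyInitRaceSketch {
    private volatile String targetSlingId; // null until a topology event was handled
    private final List<String> unassigned = new CopyOnWriteArrayList<>();
    private final List<String> assigned = new CopyOnWriteArrayList<>();

    // Hooks used only to force the problematic ordering between the two threads.
    private final CountDownLatch targetRead = new CountDownLatch(1);
    private final CountDownLatch fullRunDone = new CountDownLatch(1);

    /** Models addJob: read the target first, store the job later. */
    public void addJob(String jobId) {
        String target = targetSlingId;   // step 1: target is still undefined
        targetRead.countDown();
        awaitQuietly(fullRunDone);       // step 2 happens on the other thread meanwhile
        if (target != null) {
            assigned.add(jobId);
        } else {
            unassigned.add(jobId);       // step 3: stored after the full run has passed
        }
    }

    /** Models topology event handling including the full run over unassigned jobs. */
    public void onTopologyEvent(String slingId) {
        targetSlingId = slingId;
        for (String jobId : unassigned) { // iterates over a snapshot of the list
            assigned.add(jobId);
            unassigned.remove(jobId);
        }
    }

    public int unassignedCount() { return unassigned.size(); }

    public int assignedCount() { return assigned.size(); }

    /** Runs the forced interleaving and returns the resulting state. */
    public static TopologyInitRaceSketch reproduce() {
        TopologyInitRaceSketch s = new TopologyInitRaceSketch();
        Thread adder = new Thread(() -> s.addJob("job-1"));
        adder.start();
        awaitQuietly(s.targetRead);      // the adder has read the undefined target
        s.onTopologyEvent("sling-1");    // the full run finds no unassigned job yet
        s.fullRunDone.countDown();       // now the adder stores the job
        try {
            adder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return s;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

After reproduce(), the model holds one job in unassigned and none in assigned. A further call to onTopologyEvent() on the returned instance moves the job, which mirrors the delay until the next TopologyEvent.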

      The result is a job in the unassigned subtree that waits until the next TopologyEvent, which invokes CheckTopologyTask.fullRun(), which in turn finds the unassigned job and assigns it accordingly. So the job is never lost, but it can be substantially delayed. (The frequency of TopologyEvents depends on actual cluster/property changes happening in the topology and can thus vary.)

      Tasks:

      • provide a test case to reproduce
      • fix the race condition
      • undo this and this commit

          People

            Assignee: Unassigned
            Reporter: Stefan Egli (stefanegli)