  1. Apache YuniKorn
  2. YUNIKORN-551

node removal races for lock during scheduling


Details

    Description

      A more complicated version of the deadlock described in YUNIKORN-481.

      In this case the scheduler races with node removal, which in turn removes allocations from the application. The locks taken are all short-term locks, but a deadlock can occur when the application being scheduled also has an allocation on the node being removed.

      Scheduling requires the application, which is already write locked, to request a read lock on the partition to get all known nodes. Node removal takes the partition write lock to remove the node from the partition's internal list and keeps holding that write lock while removing the allocations, which in turn tries to lock the application. Each side then waits for the lock the other one holds.

      The partition should release the lock immediately after the node is removed from the list, as the rest of the updates do not modify the partition object.
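
      The lock ordering proposed above can be sketched in Go as follows. This is a minimal illustration, not the YuniKorn code: the types `Application`, `Node` and `Partition` and the methods `unlinkNode`, `RemoveNode`, `removeAllocationsOnNode` and `schedule` are hypothetical names chosen for this example. The sketch only shows the partition write lock being held while the node list is modified and released before any application lock is taken.

```go
package main

import "sync"

// Application is a hypothetical stand-in for a scheduled application.
type Application struct {
	sync.RWMutex
	allocations map[string]string // allocation ID -> node ID
}

// Node is a hypothetical node that knows which applications run on it.
type Node struct {
	id   string
	apps []*Application
}

// Partition is a hypothetical partition that owns the list of nodes.
type Partition struct {
	sync.RWMutex
	nodes map[string]*Node
}

// nodeCount takes the partition read lock, as the scheduling path does
// when it asks for all known nodes.
func (p *Partition) nodeCount() int {
	p.RLock()
	defer p.RUnlock()
	return len(p.nodes)
}

// unlinkNode is the only step that modifies the partition, so it is the
// only step that holds the partition write lock.
func (p *Partition) unlinkNode(nodeID string) *Node {
	p.Lock()
	defer p.Unlock()
	node := p.nodes[nodeID]
	delete(p.nodes, nodeID)
	return node
}

// removeAllocationsOnNode takes the application write lock and drops
// every allocation that was placed on the given node.
func (a *Application) removeAllocationsOnNode(nodeID string) int {
	a.Lock()
	defer a.Unlock()
	removed := 0
	for allocID, onNode := range a.allocations {
		if onNode == nodeID {
			delete(a.allocations, allocID)
			removed++
		}
	}
	return removed
}

// RemoveNode unlinks the node first. The partition lock is already
// released when the application locks are requested, so this path never
// holds the partition lock while waiting for an application.
func (p *Partition) RemoveNode(nodeID string) int {
	node := p.unlinkNode(nodeID)
	if node == nil {
		return 0
	}
	removed := 0
	for _, app := range node.apps {
		removed += app.removeAllocationsOnNode(nodeID)
	}
	return removed
}

// schedule mimics the scheduling path: application write lock first,
// then a partition read lock to look at the known nodes.
func (a *Application) schedule(p *Partition) int {
	a.Lock()
	defer a.Unlock()
	return p.nodeCount()
}
```

      With this ordering, a goroutine calling `schedule` and another calling `RemoveNode` for a node that hosts the same application can at most wait for each other briefly. The removal path no longer holds the partition lock while it waits for the application lock, so the circular wait described above cannot form.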

            People

              wilfreds Wilfred Spiegelenburg
              wilfreds Wilfred Spiegelenburg
