Apache YuniKorn / YUNIKORN-2737

Cleanup handleFailApplicationEvent handling

Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: shim - kubernetes
    • Labels: None

    Description

      When the shim handles a failed application in handleFailApplicationEvent(), it calls the placeholder cleanup.
      There are three issues with this:

      • The cleanup takes the manager lock first and then needs the app lock, but the app lock is already held while the event is processed. The cleanup should be placed last so that the manager lock is not held for longer than needed.
      • Failing an application is triggered by the core, which should already do the cleanup, so the cleanup in the shim might be redundant to start with.
      • The failure handling also marks unassigned pods as failed, so the failure handling and the placeholder cleanup overlap. That overlap should be removed: either ignore all placeholders in the failure handling, or only clean up assigned placeholders.
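
      To make the overlap in the last point concrete, here is a minimal sketch in Go of the first option, where the failure handling ignores all placeholders and leaves them to the placeholder cleanup. The `task` type and the `splitOnFailure` function are hypothetical and simplified for illustration; they are not the shim's actual types or API.

```go
package main

import "fmt"

// task is a hypothetical, simplified stand-in for a pod tracked by the shim.
type task struct {
	name        string
	placeholder bool // true for a gang scheduling placeholder pod
	assigned    bool // true once the pod has been assigned to a node
}

// splitOnFailure decides which code path owns each task when the
// application fails, so that no task is handled twice.
// Assumption: this follows the first option from the description, where
// the failure handling ignores every placeholder.
func splitOnFailure(tasks []task) (toFail, toCleanup []string) {
	for _, t := range tasks {
		switch {
		case t.placeholder:
			// All placeholders, assigned or not, go to the placeholder cleanup.
			toCleanup = append(toCleanup, t.name)
		case !t.assigned:
			// Only unassigned real pods are marked as failed.
			toFail = append(toFail, t.name)
		}
	}
	return toFail, toCleanup
}

func main() {
	tasks := []task{
		{name: "ph-0", placeholder: true, assigned: true},
		{name: "ph-1", placeholder: true},
		{name: "real-0", assigned: true},
		{name: "real-1"},
	}
	toFail, toCleanup := splitOnFailure(tasks)
	fmt.Println("mark as failed:", toFail)
	fmt.Println("placeholder cleanup:", toCleanup)
}
```

      The second option would instead send only assigned placeholders to the cleanup and let the failure handling cover unassigned ones. In both cases each pod ends up owned by exactly one of the two code paths.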

          People

            Assignee: Tzu-Hua Lan (blue.tzuhua)
            Reporter: Wilfred Spiegelenburg (wilfreds)
            Votes: 0
            Watchers: 2
