  1. Apache YuniKorn
  2. YUNIKORN-2323

Gang scheduling user experience issues

Details

    Description

      When something goes wrong, users find it difficult to understand what is happening with a gang-scheduled app.

      Issue 1:

      "Driver pod is getting stuck"

      At times, when the driver pod cannot run successfully for some reason, users get the impression that the pod is stuck and the app is hung, not making progress. They wait for some time without a clear picture of what is happening. How do we close this gap quickly and communicate the state through events?

      Issue 2:

      ResumeApplication is fired when all placeholders (ph's) have timed out. Do we need to inform users about this event, since they may otherwise have no clue about this significant change?

      Issue 3: 

      When a gang app's placeholders are in progress (and allocated) and requests for the real asks arrive during a resource crunch, do we need to trigger auto scaling?

            People

              mani Manikandan R