  1. Apache YuniKorn
  2. YUNIKORN-2030

Need to check headroom when trying other nodes for reserved allocations

Details

    Description

      As reported in YUNIKORN-1996, we see many messages like the one below from time to time:

       WARN    objects/application.go:1504     queue update failed unexpectedly        {"error": "allocation (map[memory:37580963840 pods:1 vcore:2000]) puts queue 'root.test-queue' over maximum allocation (map[memory:3300011278336 vcore:390584]), current usage (map[memory:3291983380480 pods:91 vcore:186000])"}

      Restarting YuniKorn stops the messages. This Jira is to investigate why they appear: this is not supposed to happen, because we check whether there is enough resource headroom before calling

       

      func (sa *Application) tryNode(node *Node, ask *AllocationAsk) *Allocation 
      

      which printed the above message, and we only call it when there is enough headroom.

      There may be a bug in the headroom check.
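
      To make the expectation concrete, here is a minimal sketch of a headroom check applied to the numbers from the log message above. The Resources type and the headroom and fitsIn helpers are hypothetical names used for illustration only; they are not YuniKorn's actual API, and treating a resource with no configured maximum (pods here) as unlimited is an assumption.

```go
package main

import "fmt"

// Resources is a hypothetical stand-in for a scheduler resource type:
// a map from resource name to quantity. It is not YuniKorn's real type.
type Resources map[string]int64

// headroom returns max - usage for every resource that has a limit in max.
// Resources without an entry in max are assumed unlimited and are omitted.
func headroom(max, usage Resources) Resources {
	free := Resources{}
	for name, limit := range max {
		free[name] = limit - usage[name]
	}
	return free
}

// fitsIn reports whether ask fits into free. A resource that is absent
// from free is treated as unlimited (an assumption of this sketch).
func fitsIn(free, ask Resources) bool {
	for name, want := range ask {
		if avail, limited := free[name]; limited && want > avail {
			return false
		}
	}
	return true
}

func main() {
	// Values copied from the WARN message in the description.
	max := Resources{"memory": 3300011278336, "vcore": 390584}
	usage := Resources{"memory": 3291983380480, "pods": 91, "vcore": 186000}
	ask := Resources{"memory": 37580963840, "pods": 1, "vcore": 2000}

	free := headroom(max, usage)
	fmt.Println("memory headroom:", free["memory"]) // 8027897856
	fmt.Println("ask fits:", fitsIn(free, ask))     // false
}
```

      With these numbers the ask needs about 37.6 GB of memory while only about 8 GB of headroom remains, so a check of this kind would reject the ask before tryNode is reached. That suggests some code path, possibly the one that tries other nodes for reserved allocations, reaches tryNode without such a check.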

       

            People

              yzhangal Yongjun Zhang