  1. Apache YuniKorn
  2. YUNIKORN-2574

totalPartitionResource should not be mutated with AddTo/SubFrom

Details

    Description

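      There is a potential data race in PartitionContext: the field totalPartitionResource is mutated in place with AddTo/SubFrom, but the method GetTotalPartitionResource() returns the internal pointer without cloning it. A caller can therefore still be reading the resource after the read lock has been released, while another goroutine modifies the same object.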

      func (pc *PartitionContext) GetTotalPartitionResource() *resources.Resource {
      	pc.RLock()
      	defer pc.RUnlock()
      
      	return pc.totalPartitionResource
      }
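
One way to close the gap while keeping in-place mutation is to return a copy from the getter. The following is a minimal sketch, not the YuniKorn code: `Resource`, `Partition`, `GetTotal` and `addNode` are hypothetical stand-ins for resources.Resource and PartitionContext, and the map-based layout is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Resource is a hypothetical stand-in for resources.Resource.
type Resource struct {
	Quantities map[string]int64
}

// Clone returns a deep copy of the resource; a nil receiver yields nil.
func (r *Resource) Clone() *Resource {
	if r == nil {
		return nil
	}
	c := &Resource{Quantities: make(map[string]int64, len(r.Quantities))}
	for k, v := range r.Quantities {
		c.Quantities[k] = v
	}
	return c
}

// Partition is a hypothetical stand-in for PartitionContext.
type Partition struct {
	sync.RWMutex
	total *Resource
}

// GetTotal copies the resource while holding the read lock, so the
// caller never shares the underlying map with a writer.
func (p *Partition) GetTotal() *Resource {
	p.RLock()
	defer p.RUnlock()
	return p.total.Clone()
}

// addNode mutates the total in place under the write lock,
// mimicking an AddTo-style update.
func (p *Partition) addNode(name string, delta int64) {
	p.Lock()
	defer p.Unlock()
	p.total.Quantities[name] += delta
}

func main() {
	p := &Partition{total: &Resource{Quantities: map[string]int64{"memory": 100}}}
	snapshot := p.GetTotal()
	p.addNode("memory", 50)
	// The snapshot is unaffected by the later in-place update.
	fmt.Println(snapshot.Quantities["memory"], p.GetTotal().Quantities["memory"]) // prints "100 150"
}
```

The cost of this variant is one allocation per read, which is why replacing the object on write can be preferable when reads are frequent.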
      

      In general, we should prefer the immutable approach for fields like this: build a new object and swap the pointer under the lock, just like in objects.Queue:

      func (sq *Queue) IncAllocatedResource(alloc *resources.Resource, nodeReported bool) error {
      	// check this queue: failure stops checks if the allocation is not part of a node addition
      	newAllocated := resources.Add(sq.allocatedResource, alloc) // <---- new object
      	// [ ... removed ... ]
      	sq.Lock()
      	defer sq.Unlock()
      	// all OK update this queue
      	sq.allocatedResource = newAllocated
      	sq.updateAllocatedResourceMetrics()
      	return nil
      }
      
      // incPendingResource increments pending resource of this queue and its parents.
      func (sq *Queue) incPendingResource(delta *resources.Resource) {
      	// update the parent
      	if sq.parent != nil {
      		sq.parent.incPendingResource(delta)
      	}
      	// update this queue
      	sq.Lock()
      	defer sq.Unlock()
      	sq.pending = resources.Add(sq.pending, delta) // <---- new object
      	sq.updatePendingResourceMetrics()
      }
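
Applied to the partition total, the same pattern could look roughly like the sketch below. It is an illustration under assumptions, not the actual fix: `Resource`, `Add`, `Partition`, `GetTotal` and `addNodeResource` are hypothetical stand-ins written for this example. Because the writer swaps in a new object instead of modifying the old one, the getter can hand out the pointer without cloning.

```go
package main

import (
	"fmt"
	"sync"
)

// Resource is a hypothetical stand-in for resources.Resource.
type Resource struct {
	Quantities map[string]int64
}

// Add returns a new Resource holding the sum of both inputs.
// Neither input is modified; nil inputs are treated as empty.
func Add(left, right *Resource) *Resource {
	out := &Resource{Quantities: make(map[string]int64)}
	if left != nil {
		for k, v := range left.Quantities {
			out.Quantities[k] += v
		}
	}
	if right != nil {
		for k, v := range right.Quantities {
			out.Quantities[k] += v
		}
	}
	return out
}

// Partition is a hypothetical stand-in for PartitionContext.
type Partition struct {
	sync.RWMutex
	total *Resource
}

// GetTotal returns the current pointer. This is safe only because the
// object it points to is never modified after it has been stored.
func (p *Partition) GetTotal() *Resource {
	p.RLock()
	defer p.RUnlock()
	return p.total
}

// addNodeResource replaces the total with a freshly built object
// instead of mutating the existing one.
func (p *Partition) addNodeResource(delta *Resource) {
	p.Lock()
	defer p.Unlock()
	p.total = Add(p.total, delta)
}

func main() {
	p := &Partition{total: &Resource{Quantities: map[string]int64{"vcore": 4}}}
	before := p.GetTotal()
	p.addNodeResource(&Resource{Quantities: map[string]int64{"vcore": 2}})
	// The old pointer still sees the old value; new readers see the sum.
	fmt.Println(before.Quantities["vcore"], p.GetTotal().Quantities["vcore"]) // prints "4 6"
}
```

The approach relies on callers treating the returned resource as read-only; a caller that writes to it would reintroduce the race.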
      

            People

              pbacsko Peter Bacsko