Daniel Templeton - good questions.
Reasons why getResourceUsage() needs change:
- Subtracting resources queued for preemption before returning current allocation is plain incorrect, both to users who see this information through UI and other calculations in the scheduler. It is possible that the application gives up another container, and the current container queued for preemption is never preempted. Or, more resources are added to the cluster. Basically, I just feel we shouldn't count our chicken before they hatch.
- Acquiring the lock is an unnecessary overhead all calls to getResourceUsage incur. This gets called, recursively, so frequently that this can easily add up. Remember the frequency of calls is the reason we added an if-check in the first place.
I did manually check exhaustively all places getResourceUsage is called. In fact, my first patch (have not posted) was to have two methods getResourceUsage and getEffectiveResourceUsage. While working on that version, I carefully considered every call and realized there is only one place calling getEffectiveResourceUsage. That is when it dawned on me that I could just change that one caller. That said, I do appreciate another pair of eyes validate my claim.