When too many applications are running, we found that clients could not submit applications and that the callqueuelength of port 8032 was high. I captured a jstack of the ResourceManager while the callqueuelength was high. It showed that the "IPC Server handler xxx on 8032" threads were waiting for the object lock of FairScheduler, and that nodeUpdate was holding that lock. The long processing time under the lock may be why clients cannot submit applications.
Here I do not address the problem that clients cannot submit applications; I only evaluate the performance of the FairScheduler. Too many methods need the scheduler's object lock, so the granularity of the lock is too coarse. For example, nodeUpdate and getAppWeight both need the same object lock, which is unreasonable and inefficient. I recommend replacing the current lock with finer-grained locks.
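
To illustrate the idea, here is a minimal sketch of what finer-grained locking could look like. This is not the actual FairScheduler code: the class name SchedulerLockSketch, its fields, and the helper methods getFreeMemory and setAppWeight are hypothetical. Only the method names nodeUpdate and getAppWeight come from the scheduler. The sketch assumes node state can be guarded by a read/write lock and that application weights can live in a concurrent map, so reading a weight never waits for a node update.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a scheduler; not the real FairScheduler.
final class SchedulerLockSketch {
    // Guards only the node state, instead of synchronizing on the whole scheduler object.
    private final ReentrantReadWriteLock nodeLock = new ReentrantReadWriteLock();
    private final Map<String, Integer> nodeFreeMemoryMb = new HashMap<>();

    // Application weights are kept in a concurrent map, so no scheduler lock is needed to read them.
    private final Map<String, Double> appWeights = new ConcurrentHashMap<>();

    // Writers take the write lock; this blocks other node readers and writers only.
    void nodeUpdate(String nodeId, int freeMemoryMb) {
        nodeLock.writeLock().lock();
        try {
            nodeFreeMemoryMb.put(nodeId, freeMemoryMb);
        } finally {
            nodeLock.writeLock().unlock();
        }
    }

    // Readers share the read lock, so concurrent reads do not block each other.
    int getFreeMemory(String nodeId) {
        nodeLock.readLock().lock();
        try {
            return nodeFreeMemoryMb.getOrDefault(nodeId, 0);
        } finally {
            nodeLock.readLock().unlock();
        }
    }

    void setAppWeight(String appId, double weight) {
        appWeights.put(appId, weight);
    }

    // Does not touch nodeLock, so it never waits behind nodeUpdate.
    double getAppWeight(String appId) {
        return appWeights.getOrDefault(appId, 1.0);
    }
}
```

With this split, a long-running nodeUpdate delays only other operations on node state, and IPC handler threads that just need an application's weight can proceed without waiting for it.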