On the other hand, does anybody know how frequently the queue structure would be refreshed?
Very infrequently, compared to the other competing operations like heartbeats.
If refreshes are not very frequent, the main overhead of the implementation is tying up one thread while it waits for the task scheduler lock, and I tend to think that is acceptable.
Typical scheduler implementations lock the task scheduler while scheduling (i.e. during the assignTasks call in the scheduler). Hence, when a queue refresh is triggered, all heartbeats will be blocked until it completes. At least that is the case with the capacity scheduler.
Still, given how infrequent this is going to be (and that a typical queueRefresh operation is very fast), I think I am fine with this approach. If there are no other objections, let us go ahead. Does that make sense?
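To make the blocking concrete, here is a minimal sketch of the interaction. The `Scheduler` class below is a hypothetical stand-in, not the real TaskScheduler or capacity scheduler; the only assumption is that the heartbeat path (assignTasks) and the refresh path synchronize on the same scheduler object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class HeartbeatBlockingSketch {
    // Hypothetical stand-in for the scheduler; only its monitor matters here.
    static final class Scheduler {
        final List<String> events = new CopyOnWriteArrayList<>();

        // Heartbeat path: needs the scheduler monitor, like assignTasks.
        synchronized void assignTasks() {
            events.add("heartbeat served");
        }
    }

    // Returns the order in which the two operations completed.
    static List<String> run() {
        Scheduler scheduler = new Scheduler();
        Thread heartbeat = new Thread(scheduler::assignTasks, "heartbeat");
        try {
            // Refresh path: holds the same monitor for the whole refresh.
            synchronized (scheduler) {
                heartbeat.start();
                // Wait until the heartbeat thread is stuck on the monitor.
                while (heartbeat.getState() != Thread.State.BLOCKED) {
                    Thread.sleep(1);
                }
                scheduler.events.add("queues refreshed");
            }
            heartbeat.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(scheduler.events);
    }
}
```

The heartbeat cannot proceed until the refresh releases the monitor, so the cost of this approach is bounded by how long, and how often, a refresh holds the lock.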
That said, looking at the latest changes, I did not quite follow why you introduced a new static refreshQueues method. Its callers (apart from the non-static method in JobTracker) are all in test cases, and every one of those tests has a scheduler instance. To be consistent, we could lock that instance and then call the QueueManager API directly. So the static method seems to be an unnecessary indirection. Am I missing something?
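Roughly what I have in mind for the test callers is sketched below. The classes are hypothetical stand-ins (`SketchQueueManager`, `SketchScheduler`), and the `refreshQueues(List<String>)` signature is an assumption for illustration, not the actual QueueManager API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for QueueManager; the real signature may differ.
class SketchQueueManager {
    private final List<String> queues = new ArrayList<>();

    void refreshQueues(List<String> newQueues) {
        queues.clear();
        queues.addAll(newQueues);
    }

    List<String> getQueues() {
        return new ArrayList<>(queues);
    }
}

// Hypothetical stand-in for the scheduler instance each test already holds.
class SketchScheduler {
    private final SketchQueueManager queueManager;

    SketchScheduler(SketchQueueManager queueManager) {
        this.queueManager = queueManager;
    }

    // Scheduling path: takes the scheduler monitor before reading queues.
    synchronized List<String> assignTasks() {
        return queueManager.getQueues();
    }
}

class RefreshCallerSketch {
    // What a test could do instead of going through a static helper:
    // lock the scheduler it already has, then call the queue manager.
    static void refresh(SketchScheduler scheduler,
                        SketchQueueManager queueManager,
                        List<String> newQueues) {
        synchronized (scheduler) {
            queueManager.refreshQueues(newQueues);
        }
    }
}
```

This keeps the locking discipline identical to the non-static path without adding an extra entry point.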
FindBugs still warns about the inconsistent synchronization, and I have to exclude those warnings in findbugsExcludeFile.xml.
I suppose this is because FindBugs does not realize that the taskScheduler instance we lock in refreshQueues is the same instance that is locked in the other usages of this variable. So this seems a valid reason to add an entry to the exclude file. Do you think it makes sense to document this rationale in the excludes file so there is context? BTW, thanks for updating the API documentation of the refreshQueues contract in TaskScheduler. It is very useful.
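Something along these lines is what I mean by documenting the rationale. This is only a sketch: the fully qualified class name is an assumption, and the field name and bug pattern should be checked against the actual warning text in the FindBugs report.

```xml
<FindBugsFilter>
  <!--
    refreshQueues locks the taskScheduler instance explicitly before
    calling into the QueueManager. FindBugs cannot tell that this is the
    same instance locked in the other usages of the field, so it reports
    inconsistent synchronization. The warning is a false positive.
  -->
  <Match>
    <!-- Class name assumed for illustration. -->
    <Class name="org.apache.hadoop.mapred.JobTracker" />
    <Field name="taskScheduler" />
    <Bug pattern="IS2_INCONSISTENT_SYNC" />
  </Match>
</FindBugsFilter>
```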