Thanks Jonathan Hung for updating the patch,
For leveldb and zk, it will ignore it and use the scheduler configuration persisted in the store.
Then I suggest adding this as Javadoc on the base class's method, so that it is respected by all future implementations. Otherwise the behavior will change when a different store is configured.
Not sure about this, then we are doing the reservation system reinitialization inside scheduler, so every time scheduler#reinitialize is called, the reservation system is also initialized, not sure if this is the desired behavior. Also we would need to duplicate the reservation system reinitialization for all schedulers, or make ResourceScheduler an abstract class and add it there. ...
I just checked the code; I should probably describe the problem differently:
There are two different code paths to refresh the scheduler config:
Path #1 (When mutation disabled) : Client -> AdminService -> Scheduler/ReservationSystem#reinitialize
Path #2 (When mutation enabled) : Client -> RMWebService -> Scheduler -> ConfProvider (persist the mutation log) -> AdminService -> Scheduler/ReservationSystem#reinitialize -> ConfProvider (confirm or discard mutation).
Please note that the ordering of Scheduler and AdminService is inverted between the two code paths. This is confusing and could possibly cause a deadlock, etc.
Here's my proposal:
1) Change MutableConfigurationProvider#mutateConfiguration to log-scheduler-config-mutation, which will do the following:
a. Merge mutations to existing configs.
b. Call confStore.logMutation to persist it.
2) Add two new methods to MutableConfigurationProvider
a. Confirm last mutation - confirm last logged mutation. (Just call YarnConfigurationStore#confirmMutation(valid = true))
b. Discard last mutation - discard last logged mutation. (Just call YarnConfigurationStore#confirmMutation(valid = false))
Also, is it possible to remove the id field from the confirmMutation method? Should we allow at most one pending mutation?
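To make steps 1) and 2) concrete, here is a rough sketch of what I have in mind. All class names here (SketchConfStore, SketchMutableConfProvider) are hypothetical stand-ins, not the actual YarnConfigurationStore or MutableConfigurationProvider code, and the in-memory map is only an assumption used to show the "at most one pending mutation, no id needed" idea:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the configuration store; the real
// store API may differ. It keeps at most one logged-but-unconfirmed mutation.
class SketchConfStore {
    private final Map<String, String> committed = new HashMap<>();
    private Map<String, String> pending; // null when nothing is logged

    void logMutation(Map<String, String> merged) {
        if (pending != null) {
            throw new IllegalStateException("a mutation is already pending");
        }
        pending = new HashMap<>(merged);
    }

    // With a single pending mutation, no id argument is required.
    void confirmMutation(boolean valid) {
        if (pending == null) {
            throw new IllegalStateException("no pending mutation");
        }
        if (valid) {
            committed.clear();
            committed.putAll(pending);
        }
        pending = null; // confirmed or discarded, the log entry is consumed
    }

    Map<String, String> getCommitted() {
        return new HashMap<>(committed);
    }

    boolean hasPending() {
        return pending != null;
    }
}

// Hypothetical provider exposing the three operations from the proposal.
class SketchMutableConfProvider {
    private final SketchConfStore store = new SketchConfStore();

    // Step 1: (a) merge the updates into the existing config, (b) log them.
    Map<String, String> logSchedulerConfigMutation(Map<String, String> updates) {
        Map<String, String> merged = store.getCommitted();
        merged.putAll(updates);
        store.logMutation(merged);
        return merged;
    }

    // Step 2a: confirm the last logged mutation.
    void confirmLastMutation() {
        store.confirmMutation(true);
    }

    // Step 2b: discard the last logged mutation.
    void discardLastMutation() {
        store.confirmMutation(false);
    }

    Map<String, String> getConfiguration() {
        return store.getCommitted();
    }

    boolean hasPendingMutation() {
        return store.hasPending();
    }
}
```

In this sketch a second log call before confirm/discard fails fast, which is one possible way to enforce a single pending mutation; whether to reject or queue is open for discussion.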
Once we have the above, call path #2 becomes:
(1) Client -> RMWebService#updateSchedulerConfiguration -> MutableConfigurationProvider#log-scheduler-config-mutation
(2) ... RMWebService#updateSchedulerConfiguration -> AdminService#refreshQueues -> Scheduler/ReservationSystem#reinitialize.
If reinitialize succeeded:
(3) ... RMWebService#updateSchedulerConfiguration -> MutableConfigurationProvider#confirmLastChange
If reinitialize failed:
(4) RMWebService#updateSchedulerConfiguration -> MutableConfigurationProvider#discardLastChange
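Roughly, steps (1) to (4) could look like the sketch below. The interfaces (ConfProviderSketch, RefreshSketch) and the recording helper are hypothetical and only for illustration; they are not the real RMWebService, AdminService, or MutableConfigurationProvider signatures. The point is that the web service drives the whole sequence, so the scheduler never calls back into AdminService:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical provider interface with the three proposed operations.
interface ConfProviderSketch {
    void logSchedulerConfigMutation(String mutation);
    void confirmLastChange();
    void discardLastChange();
}

// Stands in for AdminService#refreshQueues ->
// Scheduler/ReservationSystem#reinitialize (hypothetical signature).
interface RefreshSketch {
    void refreshQueues() throws Exception;
}

class UpdateFlowSketch {
    // Returns true if the mutation was applied, false if it was discarded.
    static boolean updateSchedulerConfiguration(
            ConfProviderSketch provider, RefreshSketch admin, String mutation) {
        provider.logSchedulerConfigMutation(mutation); // (1) log the mutation
        try {
            admin.refreshQueues();                     // (2) reinitialize
        } catch (Exception e) {
            provider.discardLastChange();              // (4) reinitialize failed
            return false;
        }
        provider.confirmLastChange();                  // (3) reinitialize succeeded
        return true;
    }
}

// Small helper that records the call order, for illustration only.
class RecordingProviderSketch implements ConfProviderSketch {
    final List<String> calls = new ArrayList<>();

    @Override
    public void logSchedulerConfigMutation(String mutation) {
        calls.add("log:" + mutation);
    }

    @Override
    public void confirmLastChange() {
        calls.add("confirm");
    }

    @Override
    public void discardLastChange() {
        calls.add("discard");
    }
}
```

With this shape, both code paths go Client -> (RMWebService ->) AdminService -> Scheduler, so the ordering is the same whether mutation is enabled or not.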