Currently we check whether a job id belongs to this server by using a modulus operation.
This may not be the best way to do it, for the following reasons:
1. We are not processing the full MATERIALIZATION_SYSTEM_LIMIT: each server does only half of the processing (in the case of two servers). We can always double the limit, but whenever we add a new server we need to restart the whole cluster to increase the limit.
2. The job sequence id is shared among wf, coord, and bundle jobs. So we could have a case where more coords have odd ids than even ids (or the other way around). In that case the load is not distributed evenly, and one server will always do more processing.
3. Different coord jobs also have different frequencies. A job with a 1 min or 5 min frequency puts more load on the system. With this approach a particular job always runs on the same server, which eventually puts more load on that one server.
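
To make point 2 concrete, here is a minimal sketch of modulus-based ownership. The class and method names are hypothetical and are not taken from the Oozie code base; the sketch only assumes that a server owns a job when the numeric sequence id modulo the server count equals the server's index.

```java
import java.util.List;

/** Hypothetical illustration of modulus-based job ownership; not Oozie's actual code. */
final class ModulusPartitionSketch {

    /** Assumption: a server owns a job when (sequence id % server count) equals its index. */
    static boolean ownsJob(long jobSequenceId, int serverIndex, int serverCount) {
        return jobSequenceId % serverCount == serverIndex;
    }

    /** Counts how many of the given coord sequence ids land on each server. */
    static int[] coordsPerServer(List<Long> coordSequenceIds, int serverCount) {
        int[] counts = new int[serverCount];
        for (long id : coordSequenceIds) {
            counts[(int) (id % serverCount)]++;
        }
        return counts;
    }
}
```

Because wf and bundle jobs consume ids from the same sequence, the coord ids can easily end up as something like 1, 3, 5, 7, 8. With two servers, `coordsPerServer` then assigns four coords to one server and only one to the other.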
A simple way to optimize this may be a lock mechanism: each CoordMaterializeTriggerService obtains a lock and then materializes coords. If the lock is held by another server, it waits for that server to release the lock. In this way coord jobs get distributed among the servers.
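
A rough sketch of the lock idea, under stated assumptions: the `ReentrantLock` stands in for a cluster-wide lock (a real deployment would need a distributed lock), the queue stands in for the coord jobs waiting for materialization in the shared database, and `LockedMaterializationSketch` and `runOnce` are hypothetical names, not existing Oozie classes or methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.locks.Lock;

/** Hypothetical sketch of one server's materialization run guarded by a shared lock. */
final class LockedMaterializationSketch {
    private final Lock clusterLock;       // assumption: stand-in for a cluster-wide lock
    private final Queue<String> pending;  // assumption: stand-in for coords awaiting materialization

    LockedMaterializationSketch(Lock clusterLock, Queue<String> pending) {
        this.clusterLock = clusterLock;
        this.pending = pending;
    }

    /**
     * Waits for the lock, takes up to {@code limit} pending coord ids, and releases the lock.
     * Returns the ids this server handled in this run.
     */
    List<String> runOnce(int limit) {
        List<String> handled = new ArrayList<>();
        clusterLock.lock(); // blocks while another server holds the lock
        try {
            while (handled.size() < limit) {
                String coordId = pending.poll();
                if (coordId == null) {
                    break; // nothing left to materialize
                }
                handled.add(coordId); // the real service would materialize the coord here
            }
        } finally {
            clusterLock.unlock();
        }
        return handled;
    }
}
```

In this sketch every server applies the full limit on each run and takes whatever is pending at that moment, so a coord is not pinned to one server by its id, and adding a server would not require changing the limit.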