Jian He, to answer your question let me start with why we need epoch in a federated cluster:
currently only a single RM generates containerIDs (applicationID + a sequence number) but in a federated cluster, there are multiple RMs that are concurrently generating them. So there will be conflicts if an application spans across multiple sub-clusters. To avoid this conflict, we use epoch in a federated cluster similar to how it's used in the context of work preserving restarts to prevent conflicts.
The idea is we will set epoch number to be 0 for first sub-cluster RM, 10000 for second sub-cluster RM, 20000 for third sub-cluster RM, etc. This should be sufficient as we have 1M epochs as they are represented as a 20bit integer. With this, there will be a conflict of containerIDs only if all of the below conditions are satisfied:
- The RM of sub-cluster 1 is rebooted over 10000 times
- There is a running App the is still running (during over 10k reboots of one of the RMs)
- The app is run across sub-cluster 1 and sub-cluster 2
- The app is still holding onto containers from sub-cluster 2 issued from the first reboot of that sub-cluster
- The containers have Ids low enough that the newly issued containers from RM1 clash