Jira to note the discussion points from an initial chat about integrating Timeline Service v2 with Federation (
- all entities that belong to the same flow run should have the same cluster name
- app ids in the same flow run should be strongly ordered in time
- need a logical cluster name and physical cluster name
- one possibility is to implement the application TimelineCollector as an interceptor in the AMRMProxyService
For Timeline Service:
- need to store physical cluster id and logical cluster id so that we don't lose information at any level (flow/app/entity etc)
- add a new app id to cluster mapping table
- need a different entity table (or some other table) to store node-level metrics for physical cluster stats. Once we get to node-level rollup, we will probably have to store something in a dc, cluster, rack, node hierarchy. In that case a physical cluster makes sense, but we'd still need some way to tie physical and logical together so that the automatic error detection etc. that we're envisioning is feasible within a federated setup.
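A minimal sketch of what one row of the proposed app id to cluster mapping could carry, using an in-memory map as a stand-in for the real table. The class and method names (AppClusterMapping, ClusterIds, record, lookup) are hypothetical and not part of Timeline Service v2, and the sketch assumes a single physical cluster id per app, which was not settled in the discussion.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory stand-in for the proposed app id to cluster mapping table. */
final class AppClusterMapping {

    /** One row: both ids are kept so that neither level of information is lost. */
    static final class ClusterIds {
        final String logicalClusterId;
        final String physicalClusterId;

        ClusterIds(String logicalClusterId, String physicalClusterId) {
            this.logicalClusterId = logicalClusterId;
            this.physicalClusterId = physicalClusterId;
        }
    }

    // Keyed by app id; assumes one physical cluster per app (an assumption of this sketch).
    private final Map<String, ClusterIds> rows = new HashMap<>();

    /** Stores or overwrites the mapping for the given app id. */
    void record(String appId, String logicalClusterId, String physicalClusterId) {
        rows.put(appId, new ClusterIds(logicalClusterId, physicalClusterId));
    }

    /** Returns the stored ids, or an empty Optional if the app id is unknown. */
    Optional<ClusterIds> lookup(String appId) {
        return Optional.ofNullable(rows.get(appId));
    }
}
```

With a mapping like this, a reader that only knows the app id could recover both the logical and the physical cluster before querying the flow, app or entity tables.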
For the Cluster Naming convention:
- three situations for cluster name:
----> app submitted to router should take federated (aka logical) cluster name
----> app submitted directly to RM should take physical cluster name
----> Info about the physical cluster in entities?
- suggestion: set the cluster name as a YARN tag at the router level (in the app submission context)
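The first two naming situations and the tag suggestion can be sketched as below. This is only an illustration under stated assumptions: ClusterNaming, resolveClusterName, withClusterTag and the "timeline_cluster:" prefix are hypothetical names, and a plain Set<String> stands in for the application tags carried in the app submission context. How physical cluster info is exposed in entities is left open, as in the notes above.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper illustrating the cluster naming convention discussed above. */
final class ClusterNaming {

    // Hypothetical tag prefix; the discussion did not fix a tag format.
    static final String CLUSTER_TAG_PREFIX = "timeline_cluster:";

    /**
     * Apps submitted to the router take the federated (logical) name;
     * apps submitted directly to an RM take the physical name.
     */
    static String resolveClusterName(boolean submittedViaRouter,
                                     String logicalName,
                                     String physicalName) {
        return submittedViaRouter ? logicalName : physicalName;
    }

    /** Returns a copy of the existing tags with the cluster name tag added. */
    static Set<String> withClusterTag(Set<String> existingTags, String clusterName) {
        Set<String> tags = new LinkedHashSet<>(existingTags);
        tags.add(CLUSTER_TAG_PREFIX + clusterName);
        return tags;
    }
}
```

In this sketch the router would call resolveClusterName with submittedViaRouter set to true and attach the resulting tag before forwarding the submission, so the timeline collector could read the cluster name back from the tags.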
Other points to note:
- for federation to work smoothly in environments that use HDFS, some additional considerations are needed, possibly including a solution like the nFly approach being used at Twitter.
Email thread context: