Details
-
Sub-task
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
0.10.0
-
None
-
None
Description
We wanted to reduce to container start-up time for jobs that have large change logs. There is no metric to measure this.
Additionally, Yarn host-affinity makes a best-effort to run a container in the requested host. There is no metric to measure this either.
Proposal:
Since task store restoration takes up most of the time during a container start-up time, we can measure the time to restore task stores. It may also help identify data distribution among the task stores. To measure "host-affinity" itself on Yarn, we can add a gauge for the ratio of container requests that is successfully matched to the requested host to the total container requests.