Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
We would like to capture the following information at certain progress thresholds as a task runs:
- Time taken so far
- CPU load (either the instantaneous value at the time the data are taken, or an exponentially smoothed value)
- Memory load (likewise either instantaneous or exponentially smoothed)
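As a minimal sketch of the "exponentially smoothed" option, assuming a fixed smoothing weight per sample: the class name `SmoothedLoad` and the parameter `alpha` are hypothetical and are not part of Hadoop.

```java
/** Hypothetical helper (not a Hadoop class): exponentially smoothed load reading. */
final class SmoothedLoad {
    private final double alpha;      // weight given to the newest sample, in (0, 1]
    private double value;            // current smoothed value
    private boolean seeded = false;  // false until the first sample arrives

    SmoothedLoad(double alpha) {
        if (!(alpha > 0.0 && alpha <= 1.0)) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    /** Folds in a new raw sample and returns the updated smoothed value. */
    double update(double sample) {
        if (!seeded) {
            value = sample;          // the first sample seeds the average
            seeded = true;
        } else {
            value = alpha * sample + (1.0 - alpha) * value;
        }
        return value;
    }

    /** Returns the current smoothed value (0.0 before any sample is seen). */
    double get() {
        return value;
    }
}
```

A larger `alpha` tracks the latest reading more closely; `alpha = 1.0` degenerates to the instantaneous value.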
These measurements would be taken at intervals that depend on the task's progress plateaus. For example, reducers have three progress ranges, [0, 1/3], (1/3, 2/3], and (2/3, 1], in which fundamentally different activities happen. Mappers have different boundaries, I understand, that are not symmetrically placed. Data capture boundaries should coincide with these activity boundaries. For the state information (CPU and memory), we should average over the covered interval.
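The per-range capture could be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation: `ProgressSplits` and its methods are hypothetical names, the range boundaries are supplied by the caller, and the average is a plain unweighted mean of the samples that fall in each range.

```java
/** Hypothetical sketch (not a Hadoop class): per-progress-range resource averages. */
final class ProgressSplits {
    private final double[] upperBounds;  // ascending upper bound of each range
    private final double[] cpuSum;       // sum of CPU load samples per range
    private final long[] memSum;         // sum of memory samples (bytes) per range
    private final int[] count;           // number of samples per range
    private final long[] lastElapsedMs;  // elapsed time at the latest sample per range

    ProgressSplits(double... upperBounds) {
        this.upperBounds = upperBounds.clone();
        int n = upperBounds.length;
        cpuSum = new double[n];
        memSum = new long[n];
        count = new int[n];
        lastElapsedMs = new long[n];
    }

    /** Index of the range containing progress; ranges are closed on the right. */
    int bucketOf(double progress) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (progress <= upperBounds[i]) {
                return i;
            }
        }
        return upperBounds.length - 1;   // clamp anything past the last bound
    }

    /** Records one sample taken when the task reported the given progress. */
    void record(double progress, long elapsedMs, double cpuLoad, long memBytes) {
        int b = bucketOf(progress);
        cpuSum[b] += cpuLoad;
        memSum[b] += memBytes;
        count[b]++;
        lastElapsedMs[b] = elapsedMs;
    }

    /** Mean CPU load over the range, or NaN if no sample fell in it. */
    double meanCpu(int bucket) {
        return count[bucket] == 0 ? Double.NaN : cpuSum[bucket] / count[bucket];
    }

    /** Mean memory use in bytes over the range, or NaN if no sample fell in it. */
    double meanMem(int bucket) {
        return count[bucket] == 0 ? Double.NaN : (double) memSum[bucket] / count[bucket];
    }

    /** Elapsed time at the last sample seen in the range (0 if none). */
    long elapsedMs(int bucket) {
        return lastElapsedMs[bucket];
    }
}
```

For a reducer, the ranges above would correspond to `new ProgressSplits(1.0 / 3, 2.0 / 3, 1.0)`; a mapper would pass its own, asymmetric boundaries.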
These data would flow in with the heartbeats. They would be placed in the job history as part of the task attempt completion event, so that Rumen or a similar tool could process them and use them to drive a benchmark engine.
Attachments
Issue Links
- depends upon
  - MAPREDUCE-220 Collecting cpu and memory usage for MapReduce tasks (Closed)
- is depended upon by
  - MAPREDUCE-2063 We need a benchmark to model system behavior in the face of tasks with time-variant performance (Open)
- relates to
  - MAPREDUCE-2039 Improve speculative execution (Resolved)
- requires
  - MAPREDUCE-901 Move Framework Counters into a TaskMetric structure (Closed)