Details
New Feature
Status: Closed
Major
Resolution: Fixed
Reviewed
Description
Rumen produces "job traces": JSON files that describe important aspects of every job run (successfully or not) on a Hadoop Map/Reduce cluster. Two packages under development will consume these trace files and produce actions in that cluster or another cluster: gridmix3 (see MAPREDUCE-1124) and Mumak (a simulator; see MAPREDUCE-728).
To run experiments with these two tools, it would be useful to be able to change two properties of a job trace: its duration and its density. I would like to provide a "folder", a tool that wraps a long-duration execution trace so that its jobs are redistributed over a shorter interval, and that also changes the density by duplicating or culling jobs in the folded, combined job trace.
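To make the idea concrete, here is a minimal sketch of folding, not the actual Rumen implementation. It assumes each job is reduced to an id and a submit time in milliseconds from the start of the trace, that "wrapping" means taking the submit time modulo the output duration, and that density is a multiplier applied per job (values below 1 cull jobs at random, values above 1 duplicate them). The names `Job`, `fold_trace`, `output_duration`, and `density` are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Job:
    """Hypothetical, stripped-down stand-in for one job in a trace."""
    job_id: str
    submit_time: int  # milliseconds since the start of the trace


def fold_trace(jobs, output_duration, density=1.0, seed=0):
    """Wrap submit times into [0, output_duration) and rescale job density.

    Assumption: density is an expected number of copies per job.
    The integer part is emitted always; the fractional part is emitted
    with matching probability, so density=0.5 culls about half the jobs
    and density=2.0 duplicates every job once.
    """
    if output_duration <= 0:
        raise ValueError("output_duration must be positive")
    if density < 0:
        raise ValueError("density must be non-negative")

    rng = random.Random(seed)  # seeded so a fold is reproducible
    folded = []
    for job in jobs:
        copies = int(density)
        if rng.random() < density - copies:
            copies += 1
        wrapped_time = job.submit_time % output_duration
        for i in range(copies):
            # Give duplicates distinct ids so they stay distinguishable.
            new_id = job.job_id if i == 0 else f"{job.job_id}-{i}"
            folded.append(replace(job, job_id=new_id, submit_time=wrapped_time))

    # Jobs from different "laps" of the input now interleave, so re-sort.
    folded.sort(key=lambda j: j.submit_time)
    return folded


trace = [Job("a", 0), Job("b", 1500), Job("c", 3200)]
folded = fold_trace(trace, output_duration=1000)
print([(j.job_id, j.submit_time) for j in folded])
```

With a 1000 ms output interval, the three jobs above land at offsets 0, 500, and 200 within the folded interval and are then re-sorted by their new submit times. A real tool would also have to decide how to treat jobs that are still running when the output interval ends, which this sketch ignores.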