Previous benchmarks (HADOOP-2369, HADOOP-3770), while informed by production jobs, were principally load-generating tools used to validate stability and performance under saturation. The important dimensions of that load (submission order and rate, I/O profile, CPU usage, etc.) match those of the real load on the cluster only by accident. Given related work that characterizes production load (MAPREDUCE-751), it would be worthwhile to use the mined data to impose a corresponding load for tuning the framework and guiding its development.
The first version will focus on modeling task I/O, submission, and memory usage.