> If JHLA does analysis across set of M/R jobs over a given time range, it can be added as another offline analysis tool
Yes, JHLA analyzes the history logs of multiple MR jobs over a time range.
> Should this be part of the MAPREDUCE?
It is based on the framework defined for TestDFSIO, like everything else in the hdfs-with-mr subproject. That is why JHLA is where it is. I was thinking that TestDFSIO and the related tools here should actually be moved to benchmarks, because that is what they are. But that is not part of this patch.
> There is already a job history analyzer contributed to hadoop, called hadoop vaidya.
Sure, there are different ways and motivations to analyze history logs.
This approach tries to capture some characteristics that reflect the load on the cluster, based on all jobs that ran on the cluster during a period of time. The results are in a very simple table-like format so that they can be processed by Excel or R. I'll attach some pictures to demonstrate the final output.