At times there is a need to benchmark certain Hadoop client APIs. Often, this is done by running simple, standard sort-like programs on Hadoop and then using an external utility to benchmark the APIs. The results then tend to diverge from reality because the load on the cluster doesn't match the actual load. We believe that Gridmix3, a Hadoop workload simulator, can prove useful here. Gridmix3 already provides a mechanism to load a cluster (often called a 'test cluster') using a real trace, thus mimicking the real-life workload.
Currently, Gridmix3 consumes a representative workload trace and loads the Hadoop cluster to match what is seen in the trace. Gridmix3 can be enhanced to also support user scripts (hereafter referred to as 'addons'), which will be loaded within Gridmix3 and will receive updates such as:
1. Job submission
2. Job completion
3. Cluster status
These addons can also ping/access a live, close-to-real-life Hadoop cluster. This will allow users to benchmark the Hadoop cluster while it is running.
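To make the idea concrete, here is a rough sketch of what an addon contract might look like. Nothing below exists in Gridmix3 today: the interface name GridmixAddon, its three callbacks, and the ApiLatencyAddon example are all hypothetical, and the "client API call" is passed in as a plain Runnable so the sketch has no Hadoop dependency. It also assumes callbacks arrive on a single thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback contract; Gridmix3 does not define this interface.
// One method per update type listed above.
interface GridmixAddon {
    void onJobSubmitted(String jobId);
    void onJobCompleted(String jobId, boolean succeeded);
    void onClusterStatus(int runningJobs);
}

// Hypothetical addon: times a caller-supplied client API call each time
// a cluster status update arrives, i.e. while the trace load is active.
class ApiLatencyAddon implements GridmixAddon {
    private final Runnable apiCall;  // stand-in for the client API under test
    private final List<Long> latenciesNanos = new ArrayList<>();
    private int submitted;
    private int completed;

    ApiLatencyAddon(Runnable apiCall) {
        this.apiCall = apiCall;
    }

    @Override
    public void onJobSubmitted(String jobId) {
        submitted++;
    }

    @Override
    public void onJobCompleted(String jobId, boolean succeeded) {
        completed++;
    }

    @Override
    public void onClusterStatus(int runningJobs) {
        // Measure one invocation of the API under the current load.
        long start = System.nanoTime();
        apiCall.run();
        latenciesNanos.add(System.nanoTime() - start);
    }

    int getSubmitted() { return submitted; }
    int getCompleted() { return completed; }
    List<Long> getLatenciesNanos() { return latenciesNanos; }
}

// Tiny driver that plays the role Gridmix3 would play: it pushes updates.
class AddonDemo {
    public static void main(String[] args) {
        ApiLatencyAddon addon = new ApiLatencyAddon(() -> { /* call the API here */ });
        addon.onJobSubmitted("job_1");
        addon.onClusterStatus(1);
        addon.onJobCompleted("job_1", true);
        System.out.println("submitted=" + addon.getSubmitted()
                + " completed=" + addon.getCompleted()
                + " samples=" + addon.getLatenciesNanos().size());
    }
}
```

In a real implementation the Runnable would wrap an actual client API call against the test cluster, and Gridmix3 itself would invoke the callbacks as it replays the trace. How addons are discovered and loaded is left open here.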