It would be very useful to have some simple reference topologies included with Storm that can be used to measure performance, both by devs during development (to start with) and perhaps also on a real Storm cluster (subsequently).
To start with, the goal is to put the focus on the performance characteristics of individual building blocks such as specific bolts, spouts, grouping options, queues, etc. So, initially it is biased towards micro-benchmarking, but subsequently we could add higher-level ones too.
Although there is a Storm benchmarking tool (originally written by Intel?) that can be used, and I have personally used it, it's better for this to be integrated into Storm proper and also maintained by devs as Storm evolves.
On a side note, in some instances I have noticed (to my surprise) that the perf numbers change when topologies written for the Intel benchmark are rewritten without the required wrappers so that they run directly under Storm.
Have a few topologies in mind for measuring each of these:
- Queuing and Spout Emit Performance: A topology with a Generator Spout but no bolts.
- Queuing and Grouping Performance: Generator Spout -> A grouping method -> DevNull Bolt
- Hdfs Bolt: Generator Spout -> Hdfs Bolt
- Hdfs Spout: Hdfs Spout -> DevNull Bolt
- Kafka Spout: Kafka Spout -> DevNull Bolt
- Simple Data Movement: Kafka Spout -> Hdfs Bolt
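
To make the shape of the simplest of these concrete, here is a rough plain-Java sketch of the "Generator -> queue -> DevNull" pattern. It deliberately does not use any Storm API: a producer thread stands in for the Generator Spout, an `ArrayBlockingQueue` stands in for Storm's internal queues, and a consumer that discards everything stands in for the DevNull Bolt. All names (`QueueThroughputSketch`, `Result`, `run`) are made up for illustration, and the numbers it prints say nothing about Storm itself. It only shows what is being measured: tuples pushed through a queue per unit of time.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Plain-Java stand-in for "Generator Spout -> queue -> DevNull Bolt".
// Nothing here is Storm API; every name is hypothetical.
final class QueueThroughputSketch {

    // Sentinel object telling the consumer that the generator is done.
    private static final Object END = new Object();

    /** Outcome of one run: tuples consumed and elapsed wall-clock time. */
    static final class Result {
        final long consumed;
        final long elapsedNanos;

        Result(long consumed, long elapsedNanos) {
            this.consumed = consumed;
            this.elapsedNanos = elapsedNanos;
        }

        double tuplesPerSecond() {
            return elapsedNanos == 0 ? 0.0 : consumed * 1e9 / elapsedNanos;
        }
    }

    /** Pushes tupleCount fixed payloads through a bounded queue and times it. */
    static Result run(int tupleCount, int queueCapacity) {
        BlockingQueue<Object> queue = new ArrayBlockingQueue<>(queueCapacity);
        long[] consumed = new long[1];

        // "DevNull bolt": takes each item, counts it, and throws it away.
        Thread devNull = new Thread(() -> {
            try {
                while (queue.take() != END) {
                    consumed[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        long start = System.nanoTime();
        devNull.start();
        try {
            // "Generator spout": emits the same payload over and over.
            Object payload = "fixed-payload";
            for (int i = 0; i < tupleCount; i++) {
                queue.put(payload);
            }
            queue.put(END);
            devNull.join(); // join() also makes consumed[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("benchmark interrupted", e);
        }
        return new Result(consumed[0], System.nanoTime() - start);
    }

    public static void main(String[] args) {
        Result r = run(1_000_000, 1024);
        System.out.printf("consumed=%d tuples, %.0f tuples/sec%n",
                r.consumed, r.tuplesPerSecond());
    }
}
```

The real reference topologies would swap the producer for an actual spout, the queue for Storm's own messaging, and the consumer for a bolt that drops tuples, but the measurement (count over elapsed time, with a fixed payload to keep generation cost flat) would be along the same lines.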
Shall add these for Storm core first. Then we can have the same for Trident also.