The benchmarking suite is now here: https://github.com/thesearchstack/solr-bench
The actual datasets and queries are still TBD.
— Original description —
Solr needs nightly benchmark reporting. Similar benchmarks for Lucene can be found at https://home.apache.org/~mikemccand/lucenebench/.
Preferably, we need:
- A suite of benchmarks that builds Solr from a commit point, starts Solr nodes in both SolrCloud and standalone mode, and records timing information for various operations such as indexing, querying, faceting, grouping, replication, etc.
- It should be possible to run them either as an independent suite or as a Jenkins job, and we should be able to report timings as graphs (Jenkins has some charting plugins).
- The code should eventually be integrated in the Solr codebase, so that it never goes out of date.
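As a rough illustration of the timing and reporting part of the requirements above, here is a minimal Python sketch. It is not taken from any of the existing frameworks. The names `time_operation` and `run_suite`, the report layout, and the placeholder workloads are all assumptions made for illustration; a real suite would replace the placeholders with calls that index into and query a running Solr node.

```python
import json
import statistics
import time
from typing import Callable, Dict, List


def time_operation(operation: Callable[[], None], repeats: int = 3) -> List[float]:
    """Run `operation` `repeats` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        timings.append(time.perf_counter() - start)
    return timings


def run_suite(
    commit: str,
    mode: str,
    operations: Dict[str, Callable[[], None]],
    repeats: int = 3,
) -> dict:
    """Time every named operation and return one report record for this commit.

    `commit` and `mode` are only labels here (hypothetical fields); building and
    starting Solr for that commit is outside the scope of this sketch.
    """
    results = {}
    for name, operation in operations.items():
        timings = time_operation(operation, repeats)
        results[name] = {
            "median_s": statistics.median(timings),
            "min_s": min(timings),
            "max_s": max(timings),
        }
    return {"commit": commit, "mode": mode, "results": results}


if __name__ == "__main__":
    # Placeholder workloads (assumptions): stand-ins for real indexing/querying calls.
    placeholder_ops = {
        "indexing": lambda: sum(range(100_000)),
        "querying": lambda: sorted(range(10_000), reverse=True),
    }
    report = run_suite("abc1234", "standalone", placeholder_ops)
    print(json.dumps(report, indent=2))
```

Emitting one JSON record per commit keeps the harness independent of how the results are charted, so the same output could feed either a standalone report or a Jenkins job.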
There is some prior work / discussion:
- https://github.com/shalinmangar/solr-perf-tools (Shalin)
- https://github.com/chatman/solr-upgrade-tests/blob/master/BENCHMARKS.md (Ishan/Vivek)
- SOLR-2646 & SOLR-9863 (Mark Miller)
- https://home.apache.org/~mikemccand/lucenebench/ (Mike McCandless)
- https://github.com/lucidworks/solr-scale-tk (Tim Potter)
Some of the frameworks above already support building, starting, indexing/querying, and stopping Solr. However, the benchmarks they run are very limited. Any of them can serve as a starting point, or a new framework can be used instead. The goal is to cover every piece of Solr functionality with a corresponding benchmark that runs every night.
Proposing this as a GSoC 2017 project. I'm willing to mentor, and I'm sure [~shalinmangar] and ~[firstname.lastname@example.org would help here.