Index: build.xml
===================================================================
--- build.xml (revision 475655)
+++ build.xml (working copy)
@@ -115,5 +115,20 @@
+
+Usage: java Benchmark properties-file algorithm-file
+This package provides "task based" performance benchmarking of Lucene.
+One can use the predefined benchmarks, or create new ones.
+Contained packages:
+| Package | Description |
+| ------- | ----------- |
+| stats   | Statistics maintained when running benchmark tasks. |
+| tasks   | Benchmark tasks. |
+| feeds   | Sources for benchmark inputs: documents and queries. |
+| utils   | Utilities used for the benchmark, and for the reports. |
+
+Benchmark Lucene using task primitives.
+
+A benchmark is composed of some predefined tasks, allowing for creating an index, adding documents,
+optimizing, searching, generating reports, and more. A benchmark run takes an "algorithm" file
+that describes the sequence of tasks making up the run, and a properties file defining a few
+additional characteristics of the benchmark run.
+
+Predefined benchmarks are run using the predefined ant tasks:
+You can create your own benchmark by modifying one of the predefined .alg and .properties
+files and using the appropriate ant target, or by providing your own .alg and .properties files.
+In this case, you should run the class org.apache.lucene.taskBenchmark.Benchmark and provide
+two arguments: file.properties file.alg.
+
+It is very likely that this would be sufficient for defining the benchmark you need;
+otherwise, you can extend the framework to meet your needs, as explained herein.
+
+Each benchmark run has a DocMaker and a QueryMaker. These two should usually match, so
+that "meaningful" queries are used for a certain collection. You can modify
+the properties file to define which "makers" should be used. You can also
+specify your own, extending the DocMaker and QueryMaker interfaces.
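As a sketch, selecting the "makers" in the properties file might look like the following. The property names doc.maker and query.maker and the maker class names are assumptions for illustration only; they are not confirmed by this document:

```properties
# Hypothetical maker selection - names are illustrative only
doc.maker=org.apache.lucene.taskBenchmark.feeds.SimpleDocMaker
query.maker=org.apache.lucene.taskBenchmark.feeds.SimpleQueryMaker
```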
+
+The benchmark .alg file contains the benchmark "algorithm". The syntax is described below.
+Within the algorithm, you can specify groups of commands, assign them names, specify commands that should be repeated,
+do commands in serial or in parallel, and also control the speed of "firing" the commands.
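Using only the constructs that appear in the sample algorithm later in this document (a named group `{ "Name" ... `, repetition via `: N`, and `>` closing a repeated group), a minimal named, repeated group might be sketched as follows; the repeat count and document size here are arbitrary illustration values:

```
{ "TinyIndex"
    CreateIndex
    { AddDoc(100) > : 50
    Optimize
    CloseIndex
>
```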
+
+This allows, for instance, specifying
+that an index should be opened for update,
+documents should be added to it one by one but not faster than 20 docs a minute,
+and, in parallel with this,
+some N queries should be searched against that index,
+again, no more than 2 queries a second.
+You can have the searches all share an index searcher,
+or have each of them open its own searcher and close it afterwards.
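The scenario just described might be written roughly as follows. This is a hypothetical sketch, not syntax confirmed by this document: the `[ ... ]` notation for parallel groups, the `: count : rate` notation for rate limiting, and the Search task name are all assumptions:

```
[
    { CreateIndex  { AddDoc > : 1000 : 20/min  CloseIndex }
    { Search > : 120 : 2/sec
]
```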
+
+If the commands available for use in the algorithm do not meet your needs,
+you can add commands by adding a new task under
+org.apache.lucene.taskBenchmark.task -
+you should extend the PerfTask abstract class.
+Make sure that your new task class name is suffixed by Task.
+Assume you added the class "WonderfulTask" - doing so also enables the
+command "Wonderful" to be used in the algorithm.
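A skeleton of such a task might look like the following. This is a hedged sketch: the stand-in PerfTask base class below is included only so the example is self-contained, and its method names (doLogic as the work hook returning the number of items processed) are assumptions about the framework's API, not taken from this document:

```java
// Hypothetical stand-in for the framework's PerfTask abstract class,
// included only so this sketch compiles on its own; the real class lives
// under org.apache.lucene.taskBenchmark.task and its API may differ.
abstract class PerfTask {
    // Assumed work hook: performs the task and returns items processed.
    public abstract int doLogic() throws Exception;
}

// The "WonderfulTask" from the text: because its class name ends with
// "Task", the command "Wonderful" becomes usable in .alg files.
public class WonderfulTask extends PerfTask {
    public int doLogic() {
        // ... the wonderful work would go here ...
        return 1; // number of "records" processed by this invocation
    }

    public static void main(String[] args) throws Exception {
        System.out.println(new WonderfulTask().doLogic());
    }
}
```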
+
+The following is an informal description of the supported syntax.
+
+Existing tasks can be divided into a few groups:
+regular index/search work tasks, report tasks, and control tasks.
+
+Properties are read from the .properties file, and
+define several parameters of the performance test.
+As mentioned above for the NewRound task,
+numeric and boolean properties that are defined as a sequence
+of values, e.g. merge.factor=mrg.10.100.10.100,
+would increment (cyclically) to the next value when NewRound is called, and would also
+appear as a named column in the reports (the column name would be "mrg" in this example).
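The cycling behavior described above can be sketched with a small self-contained helper. The parsing scheme (a column name followed by dot-separated values) is inferred from the merge.factor example; the class and method names are illustrative, not part of the framework:

```java
// Sketch of sequenced-property handling: "mrg.10.100.10.100" names a
// report column "mrg" and cycles through the values 10,100,10,100 as
// rounds advance. Names here are illustrative assumptions.
public class SequenceProperty {

    // The report column name is the part before the first dot.
    static String columnName(String spec) {
        return spec.substring(0, spec.indexOf('.'));
    }

    // Value used in a given round, cycling when rounds exceed the
    // number of listed values.
    static int valueForRound(String spec, int round) {
        String[] parts = spec.split("\\.");
        int n = parts.length - 1; // parts[0] is the column name
        return Integer.parseInt(parts[1 + (round % n)]);
    }

    public static void main(String[] args) {
        String spec = "mrg.10.100.10.100";
        System.out.println(columnName(spec));       // mrg
        System.out.println(valueForRound(spec, 0)); // 10
        System.out.println(valueForRound(spec, 1)); // 100
        System.out.println(valueForRound(spec, 4)); // cycles back to 10
    }
}
```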
+
+Some of the currently defined properties are:
+
+For additional defined properties see the task*.properties files under conf.
+
+The following example is in conf/task-sample.alg:
+# --------------------------------------------------------
+#
+# Sample: what is the effect of doc size on indexing time?
+#
+# There are two parts in this test:
+# - PopulateShort adds 2N documents of length L
+# - PopulateLong adds N documents of length 2L
+# Which one would be faster?
+# The comparison is done twice.
+#
+# --------------------------------------------------------
+
+{
+
+ { "PopulateShort"
+ CreateIndex
+ { AddDoc(4000) > : 20000
+ Optimize
+ CloseIndex
+ >
+
+ ResetSystemErase
+
+ { "PopulateLong"
+ CreateIndex
+ { AddDoc(8000) > : 10000
+ Optimize
+ CloseIndex
+ >
+
+ ResetSystemErase
+
+} : 2
+
+RepSelectByPref Populate
+
+
+
+The output report from running this test is the following:
+Operation      round cmpnd buf mrg runCnt recsPerRun  rec/s elapsedSec avgUsedMem avgTotalMem
+PopulateShort      0  true  10  10      1      20003  106.2     188.36  1,664,232   4,194,304
+PopulateLong       0  true  10  10      1      10003   89.6     111.69  2,257,112   4,194,304
+PopulateShort      0  true  10  10      1      20003  107.5     186.14  2,972,088   4,194,304
+PopulateLong       0  true  10  10      1      10003   85.9     116.42  2,980,024   4,194,304