
Key: LUCENE-675
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Minor
Assignee: Grant Ingersoll
Reporter: Andrzej Bialecki
Votes: 3
Watchers: 0
Lucene - Java

Lucene benchmark: objective performance test for Lucene

Created: 21/Sep/06 05:16 AM   Updated: 20/Jan/07 01:20 PM
Component/s: None
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments (all licensed for inclusion in ASF works):
  File                      Type              Uploaded             Author            Size
  benchmark.byTask.patch    Text File         2006-11-16 08:17 PM  Doron Cohen       334 kB
  benchmark.patch           Text File         2006-11-06 03:25 AM  Grant Ingersoll   71 kB
  BenchmarkingIndexer.pm    Text File         2006-10-24 12:33 AM  Marvin Humphrey   8 kB
  byTask.2.patch.txt        Text File         2007-01-05 04:42 AM  Doron Cohen       218 kB
  byTask.jre1.4.patch.txt   Text File         2007-01-11 07:48 AM  Doron Cohen       219 kB
  extract_reuters.plx       File              2006-10-24 12:33 AM  Marvin Humphrey   4 kB
  LuceneBenchmark.java      Java Source File  2006-09-21 05:18 AM  Andrzej Bialecki  31 kB
  LuceneIndexer.java        Java Source File  2006-10-24 12:35 AM  Marvin Humphrey   7 kB
  taskBenchmark.zip         Zip Archive       2006-11-15 09:15 AM  Doron Cohen       65 kB
  timedata.zip              Zip Archive       2006-11-07 10:50 AM  Doron Cohen       7 kB
  tiny.alg                  Text File         2006-11-12 10:12 AM  Doron Cohen       3 kB
  tiny.properties           File              2006-11-12 10:12 AM  Doron Cohen       1.0 kB

Resolution Date: 13/Jan/07 04:16 AM


Description
We need an objective way to measure the performance of Lucene, both indexing and querying, on a known corpus. This issue is intended to collect comments and patches implementing a suite of such benchmarking tests.

Regarding the corpus: one of the widely used and freely available corpora is the original Reuters collection, available from http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz or http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz. I propose to use this corpus as a base for benchmarks. The benchmarking suite could automatically retrieve it from known locations, and cache it locally.
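
The retrieve-and-cache step proposed above could look roughly like the following. This is only a minimal sketch, not code from any of the attached patches: the class name CorpusCache, the method fetch, and the ".part" temporary-file convention are all hypothetical, and it assumes the archive may be taken from whichever listed location responds first.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical helper: not part of Lucene or of the patches attached to this issue.
final class CorpusCache {

    /**
     * Returns the locally cached archive. If it is not cached yet, tries each
     * source in order and stores the first one that can be read completely.
     */
    static Path fetch(List<URI> sources, Path cacheDir, String fileName) throws IOException {
        Path target = cacheDir.resolve(fileName);
        if (Files.isRegularFile(target)) {
            return target; // cache hit: nothing is downloaded
        }
        Files.createDirectories(cacheDir);

        IOException last = null;
        for (URI source : sources) {
            // Download into a temporary file first, so an interrupted transfer
            // never leaves a truncated archive under the final name.
            Path partial = Files.createTempFile(cacheDir, fileName, ".part");
            try (InputStream in = source.toURL().openStream()) {
                Files.copy(in, partial, StandardCopyOption.REPLACE_EXISTING);
                Files.move(partial, target, StandardCopyOption.REPLACE_EXISTING);
                return target;
            } catch (IOException e) {
                last = e; // remember the failure and fall through to the next source
                Files.deleteIfExists(partial);
            }
        }
        throw new IOException("no source could be retrieved for " + fileName, last);
    }
}
```

A benchmark run would pass the two URLs above as the source list, plus a cache directory of its choosing. Only the first run touches the network; later runs reuse the cached file, which keeps the corpus identical between runs.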


