Lucene - Core
LUCENE-967

Add "tokenize documents only" task to contrib/benchmark

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3
    • Fix Version/s: 2.3
    • Component/s: modules/benchmark
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      I've been looking at performance improvements to tokenization by
      re-using Tokens, and to help benchmark my changes I've added a new
      task called ReadTokens that just steps through all fields in a
      document, gets a TokenStream, and reads all the tokens out of it.

      E.g., this alg just reads all Tokens for all docs in the Reuters collection:

      doc.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersDocMaker
      doc.maker.forever=false
      {ReadTokens > : *
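
      In spirit, the task's inner loop steps through every field of each
      document, obtains a token stream, and drains it. Below is a minimal,
      self-contained sketch of that loop; it uses java.util.StringTokenizer
      as a stand-in for Lucene's TokenStream so it runs without Lucene on
      the classpath, and the class and method names are illustrative, not
      the patch's actual code.

      ```java
      import java.util.List;
      import java.util.StringTokenizer;

      public class ReadTokensSketch {

          /** Consume all tokens of every field; return the total token count. */
          static int readTokens(List<String> fields) {
              int count = 0;
              for (String field : fields) {                        // step through all fields
                  StringTokenizer ts = new StringTokenizer(field); // stand-in for a TokenStream
                  while (ts.hasMoreTokens()) {                     // read all the tokens out of it
                      ts.nextToken();  // token is discarded; only the tokenization cost matters
                      count++;
                  }
              }
              return count;
          }

          public static void main(String[] args) {
              System.out.println(readTokens(List.of("hello world", "foo bar baz"))); // 5
          }
      }
      ```

      Because the tokens are read and thrown away, the task isolates
      tokenization cost from indexing cost, which is what makes it useful
      for benchmarking Token re-use.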

      Attachments

        1. LUCENE-967.patch
          11 kB
          Michael McCandless
        2. LUCENE-967.take2.patch
          12 kB
          Michael McCandless
        3. LUCENE-967.take3.patch
          14 kB
          Michael McCandless

          People

            Assignee: Michael McCandless (mikemccand)
            Reporter: Michael McCandless (mikemccand)
            Votes: 0
            Watchers: 1
