Jason Rutherglen made changes - 27/Mar/09 08:03 PM
Are these tests measuring adding a single doc and then searching on it? What do the numbers in the results measure (e.g. 25882 for LuceneRealtimeWriter)?
I think we need a more realistic test for near real-time search, but I'm not sure exactly what that is. In
Michael McCandless made changes - 10/Jun/09 08:12 PM
Michael McCandless made changes - 11/Jun/09 09:32 AM
We need a benchmark that simply measures indexing 1, 5, 10, 100, and 1000 docs, each followed by a reopen and a query. The first benchmark can use IW.getReader as is (meaning the newly created segments are written to disk); the other can use LUCENE-1313 (which stores newly created segments in RAM). This way we can say accurately which method works best and in which situation. The use case LUCENE-1313 is designed for is updates of fewer than 100 documents. I'll update LUCENE-1313 and give this a try.
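A rough sketch of how such a batch benchmark might be structured, for illustration only. The `NrtIndex` interface and the in-memory `ListIndex` are hypothetical stand-ins so the loop runs without Lucene; a real run would put IW.getReader or the LUCENE-1313 writer behind the same interface.

```java
import java.util.ArrayList;
import java.util.List;

class BatchReopenBenchmark {
    /** Hypothetical abstraction over the two approaches being compared. */
    interface NrtIndex {
        void addDocument(String text);
        /** Makes all documents added so far searchable (stands in for a reader reopen). */
        void reopen();
        /** Returns the number of searchable documents containing the term. */
        int query(String term);
    }

    /** In-memory stand-in, NOT a Lucene class; it only mimics reopen visibility. */
    static class ListIndex implements NrtIndex {
        private final List<String> pending = new ArrayList<>();
        private List<String> visible = new ArrayList<>();

        public void addDocument(String text) { pending.add(text); }

        public void reopen() { visible = new ArrayList<>(pending); }

        public int query(String term) {
            int hits = 0;
            for (String doc : visible) {
                if (doc.contains(term)) hits++;
            }
            return hits;
        }
    }

    /** Times one round: index batchSize docs, reopen, then run one query. */
    static long timeBatch(NrtIndex index, int batchSize) {
        long start = System.nanoTime();
        for (int i = 0; i < batchSize; i++) {
            index.addDocument("doc body " + i);
        }
        index.reopen();
        index.query("body");
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[] batchSizes = {1, 5, 10, 100, 1000};
        for (int size : batchSizes) {
            long nanos = timeBatch(new ListIndex(), size);
            System.out.println(size + " docs + reopen + query: " + nanos + " ns");
        }
    }
}
```

Running the same `timeBatch` loop against both implementations, over several rounds, would give directly comparable numbers per batch size.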
It really depends, though. I would bet that many users who want real time are dealing with a huge number of updates at given times, and that kind of load seems likely to grow. A lot of the time I think it could be much more than a trickle. Many installations I have seen have certain times (certain hours, certain days) when a lot of documents come in. Social-networking sites likely see a constant stream of updates most of the time. Press releases have hotspots for release, newspaper data all comes in at once in the morning, and so on.
Jason Rutherglen made changes - 28/Aug/09 05:24 PM
I found it odd that RealtimeWriter is faster than LuceneWriter, so perhaps the benchmark is incorrect somehow. Otherwise the results look highly promising, in that we can implement realtime search with no impact on existing indexing performance.
Summary of the results:
numRounds:3 docs indexed:50000
lowest of each, percent compared with lowest
RealtimeWriter:7597 dif:0%
LuceneWriter:12940 dif:70%
LuceneRealtimeWriter:25882 dif:241%
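For reference, the "dif" column can be reproduced from the lowest timings with a small calculation. This is a sketch that assumes the percentage is rounded to the nearest integer; the class and method names are illustrative and not part of the benchmark code.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class PercentOverLowest {
    /** Percent by which value exceeds lowest, rounded to the nearest integer (assumed rounding). */
    static long percentOver(long value, long lowest) {
        return Math.round(100.0 * (value - lowest) / lowest);
    }

    public static void main(String[] args) {
        // Lowest timing of each writer, copied from the summary above.
        Map<String, Long> lowestTimes = new LinkedHashMap<>();
        lowestTimes.put("RealtimeWriter", 7597L);
        lowestTimes.put("LuceneWriter", 12940L);
        lowestTimes.put("LuceneRealtimeWriter", 25882L);

        long lowest = Collections.min(lowestTimes.values());
        for (Map.Entry<String, Long> e : lowestTimes.entrySet()) {
            System.out.println(e.getKey() + ":" + e.getValue()
                    + " dif:" + percentOver(e.getValue(), lowest) + "%");
        }
    }
}
```

With the figures above this gives 0%, 70%, and 241%, matching the summary.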