Details
- Improvement
- Status: Closed
- Major
- Resolution: Fixed
- None
- None
- New
Description
Two improvements were added: 8.6 has merge-on-commit (by Froh et al.), and 8.7 has merge-on-refresh (by Simon). See MergePolicy.findFullFlushMerges.
The original description follows:
With near-real-time search we ask IndexWriter to write all in-memory segments to disk and open an IndexReader to search them, and this is typically a quick operation.
However, when you use many threads for concurrent indexing, IndexWriter will write many small segments during refresh, and this adds search-time cost because searching must visit all of these tiny segments.
The merge policy would normally quickly coalesce these small segments if given a little time ... so, could we somehow improve IndexWriter's refresh to optionally kick off the merge policy to merge segments below some threshold before opening the near-real-time reader? It'd be a bit tricky because while we are waiting for merges, indexing may continue, and new segments may be flushed, but those new segments shouldn't be included in the point-in-time segments returned by refresh ...
One could almost do this on top of Lucene today, with a custom merge policy, and some hackity logic to have the merge policy target small segments just written by refresh, but it's tricky to then open a near-real-time reader, excluding newly flushed but including newly merged segments since the refresh originally finished ...
I'm not yet sure how best to solve this, so I wanted to open an issue for discussion!
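To make the point-in-time requirement concrete, here is a minimal sketch of the idea as a toy model. It is not Lucene code and does not reflect how IndexWriter or MergePolicy.findFullFlushMerges is actually implemented. The class RefreshMergeSketch, its Segment type, the refreshView method and the doc-count threshold are all hypothetical names chosen for illustration. Segments present at refresh time are snapshotted, the small ones are coalesced into a single merged segment, and anything flushed after the snapshot stays out of the view.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of merge-on-refresh. Not Lucene code; all names are hypothetical. */
final class RefreshMergeSketch {

    /** A flushed segment, reduced to a name and a document count. */
    static final class Segment {
        final String name;
        final int docCount;

        Segment(String name, int docCount) {
            this.name = name;
            this.docCount = docCount;
        }
    }

    /**
     * Builds the segment list a point-in-time reader would see. Snapshot segments
     * with fewer than smallThreshold docs are coalesced into one merged segment;
     * larger segments are kept unchanged. Segments flushed after the snapshot was
     * taken are not in the snapshot, so they cannot leak into the view.
     */
    static List<Segment> refreshView(List<Segment> snapshot, int smallThreshold) {
        List<Segment> small = new ArrayList<>();
        List<Segment> view = new ArrayList<>();
        for (Segment s : snapshot) {
            if (s.docCount < smallThreshold) {
                small.add(s);
            } else {
                view.add(s);
            }
        }
        if (small.size() < 2) {
            // Fewer than two small segments: there is nothing to merge.
            view.addAll(small);
            return view;
        }
        int mergedDocs = 0;
        for (Segment s : small) {
            mergedDocs += s.docCount;
        }
        view.add(new Segment("merged", mergedDocs));
        return view;
    }

    /** Sums the document counts of the given segments. */
    static int totalDocs(List<Segment> segments) {
        int total = 0;
        for (Segment s : segments) {
            total += s.docCount;
        }
        return total;
    }

    /** Small scenario: indexing continues after the refresh snapshot is taken. */
    static List<Segment> demo() {
        List<Segment> live = new ArrayList<>();
        live.add(new Segment("_0", 10_000));
        live.add(new Segment("_1", 12));
        live.add(new Segment("_2", 7));
        live.add(new Segment("_3", 30));

        // Refresh starts: freeze the point-in-time set of segments.
        List<Segment> snapshot = new ArrayList<>(live);

        // Indexing continues while the merge runs; "_4" is flushed too late.
        live.add(new Segment("_4", 5));

        return refreshView(snapshot, 100);
    }
}
```

In the demo scenario the view holds the large segment "_0" plus one merged segment covering "_1", "_2" and "_3", while "_4" is left for the next refresh. A real implementation would also have to bound how long refresh waits for the merge, which this sketch ignores.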
Attachments
Issue Links
- is related to
  - LUCENE-8965 ConcurrentMergeScheduler should maybe sometimes do synchronously (Resolved)
- relates to
  - LUCENE-8331 MergePolicy simulator utility (Open)
  - SOLR-14582 Expose IWC.setMaxCommitMergeWaitMillis as an expert feature in Solr's index config (Closed)
- links to