Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Lucene's replication module makes it easy to incrementally sync index
changes from a master index to any number of replicas. It handles and
abstracts away all the underlying complexity: holding a time-expiring
snapshot, finding which files need copying, syncing more than one index
(e.g., the taxonomy index plus the main index), etc.
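As a rough illustration of the "finding which files need copying" step, here is a minimal Java sketch. It assumes, hypothetically, that each index file is identified by its name plus a checksum. The class and method names (`FileDiff`, `filesToCopy`, `filesToDelete`) are invented for this example and are not the replication module's actual API.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: decide which index files a replica must copy from the
// primary, assuming each file is identified by name plus checksum.
class FileDiff {

    // Files present on the primary that the replica lacks, or holds with a different checksum.
    static Set<String> filesToCopy(Map<String, Long> primary, Map<String, Long> replica) {
        Set<String> toCopy = new TreeSet<>();
        for (Map.Entry<String, Long> e : primary.entrySet()) {
            Long replicaChecksum = replica.get(e.getKey());
            if (replicaChecksum == null || !replicaChecksum.equals(e.getValue())) {
                toCopy.add(e.getKey());
            }
        }
        return toCopy;
    }

    // Files the replica still holds that the primary no longer references
    // (e.g. merged away); deletable once no open reader uses them.
    static Set<String> filesToDelete(Map<String, Long> primary, Map<String, Long> replica) {
        Set<String> toDelete = new TreeSet<>(replica.keySet());
        toDelete.removeAll(primary.keySet());
        return toDelete;
    }

    public static void main(String[] args) {
        Map<String, Long> primary = new TreeMap<>();
        primary.put("_0.cfs", 11L);
        primary.put("_1.cfs", 22L);
        primary.put("_2.cfs", 33L);

        Map<String, Long> replica = new TreeMap<>();
        replica.put("_0.cfs", 11L); // already up to date
        replica.put("_1.cfs", 99L); // checksum mismatch, must be recopied
        replica.put("_x.cfs", 44L); // no longer on the primary

        System.out.println(filesToCopy(primary, replica));   // [_1.cfs, _2.cfs]
        System.out.println(filesToDelete(primary, replica)); // [_x.cfs]
    }
}
```

Because Lucene index files are normally write-once, the checksum comparison here is only a defensive extra; in practice the name diff does most of the work.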
But today you must first commit on the master, and then the copied
files are fsync'd again on the replica, because the code operates on
commit points. This isn't technically necessary, and it conflates
durability with fast turnaround time.
Long ago we added near-real-time readers to Lucene, for the same
reason: you shouldn't have to commit just to see the new index
changes.
I think we should do the same for replication: allow the new segments
to be copied out to replica(s), and new NRT readers to be opened, to
fully decouple committing from visibility. Apps can then choose
separately when to replicate (for freshness) and when to commit (for
durability).
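A toy model of the proposed split might look like the following minimal sketch. It assumes a primary that flushes segments and can either replicate them (freshness) or commit them (durability). `ToyPrimary`, `ToyReplica` and their methods are hypothetical names, not Lucene classes; `refresh` merely stands in for opening a new NRT reader on the replica, and `commit` stands in for fsync plus writing a commit point (no real I/O happens here).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical primary: flushed segments become replicable without a commit.
class ToyPrimary {
    private final List<String> flushedSegments = new ArrayList<>();
    private List<String> lastCommit = List.of(); // models what would survive a crash
    private int nextSegment = 0;

    // Models indexing a batch of documents and flushing it as one new segment.
    String flush() {
        String seg = "_" + (nextSegment++);
        flushedSegments.add(seg);
        return seg;
    }

    // Freshness: hand the current segment list to replicas; no commit point is written.
    List<String> replicate() {
        return List.copyOf(flushedSegments);
    }

    // Durability: models fsync'ing the files and recording a commit point.
    void commit() {
        lastCommit = List.copyOf(flushedSegments);
    }

    List<String> lastCommit() {
        return lastCommit;
    }
}

// Hypothetical replica: refresh() models opening a new NRT reader over copied segments.
class ToyReplica {
    private List<String> visible = List.of();

    void refresh(List<String> segments) {
        visible = segments;
    }

    List<String> visible() {
        return visible;
    }
}

class NrtReplicationSketch {
    public static void main(String[] args) {
        ToyPrimary primary = new ToyPrimary();
        ToyReplica replica = new ToyReplica();

        primary.flush();
        primary.flush();
        replica.refresh(primary.replicate());
        System.out.println(replica.visible());    // [_0, _1]  visible on the replica...
        System.out.println(primary.lastCommit()); // []        ...without any commit

        primary.commit();
        System.out.println(primary.lastCommit()); // [_0, _1]
    }
}
```

The design point is that `replicate()` can run often and cheaply, while `commit()` runs only as often as the app's durability requirements demand.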
I think for some apps this could be a compelling alternative to the
"re-index all documents on each shard" approach that SolrCloud /
Elasticsearch implement today, and it may also mean that the
transaction log can remain external to / above the cluster.