Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Description
The design principle of the overwriteDupes option of SignatureUpdateProcessorFactory is only viable in single-shard use cases, and even there it currently doesn't work, because UpdateCommand "options" are not included when shard leaders write updates to the tlog or forward them to other replicas (SOLR-8030). With multiple shards it can never be viable without broadcasting a "Delete By Query" to every replica on every document add/update (SOLR-3473), which is vastly less efficient than the current low-level updateDocument(Term,...) support provided by IndexWriter for replacing documents by uniqueKey.
I think in general we should remove the overwriteDupes option completely. If SignatureUpdateProcessorFactory is used to generate a synthetic uniqueKey field, then the existing Solr/Lucene behavior of routing the document to the correct shard and replacing any prior instances of that doc will work fine.
The functionality of SignatureUpdateProcessorFactory should be constrained solely to generating a signature – if that signature is put in the unique key field, then de-duplication will happen automatically.
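To illustrate why putting the signature in the unique key field is enough, here is a minimal, self-contained sketch. It is a toy model, not Solr code: `SignatureDedupSketch`, `ToyCluster`, the MD5-over-fields signature, and the hash-modulo routing are all assumptions made for illustration. Each shard is modeled as a map keyed by uniqueKey, so an add replaces any prior document with the same key, roughly what updateDocument(Term,...) provides.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SignatureDedupSketch {

    /** Hypothetical signature step: MD5 over the selected field values. */
    static String signature(Map<String, String> doc, List<String> fields) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (String f : fields) {
                md.update(doc.getOrDefault(f, "").getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // separator so ("ab","c") differs from ("a","bc")
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Toy cluster: each shard maps uniqueKey -> document, so put() replaces. */
    static class ToyCluster {
        final List<Map<String, Map<String, String>>> shards = new ArrayList<>();

        ToyCluster(int numShards) {
            for (int i = 0; i < numShards; i++) {
                shards.add(new HashMap<>());
            }
        }

        void add(Map<String, String> doc, List<String> sigFields) {
            String id = signature(doc, sigFields);
            Map<String, String> stored = new HashMap<>(doc);
            stored.put("id", id); // the signature becomes the synthetic uniqueKey
            // Assumed routing: same id always maps to the same shard.
            int shard = Math.floorMod(id.hashCode(), shards.size());
            shards.get(shard).put(id, stored); // replace-by-key, no delete-by-query
        }

        int totalDocs() {
            int total = 0;
            for (Map<String, Map<String, String>> shard : shards) {
                total += shard.size();
            }
            return total;
        }
    }

    public static void main(String[] args) {
        ToyCluster cluster = new ToyCluster(3);
        List<String> sigFields = List.of("title", "body");
        cluster.add(Map.of("title", "Hello", "body", "World", "source", "feed-a"), sigFields);
        cluster.add(Map.of("title", "Hello", "body", "World", "source", "feed-b"), sigFields);
        cluster.add(Map.of("title", "Other", "body", "Text", "source", "feed-a"), sigFields);
        System.out.println(cluster.totalDocs()); // 2: the identical-content docs collapsed
    }
}
```

Because the key is derived from the content, duplicates always land on the same shard and overwrite each other there, so no cross-shard broadcast is needed.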