Description
The current state of things (as of 8.8) is that SignatureUpdateProcessorFactory CAN be safely used in SolrCloud for two possible use cases:
- For de-duplication:
  - the signatureField MUST be the uniqueKey field AND the processor MUST be configured to run prior to DistributedUpdateProcessor
- Solely for generating signatures, w/o de-duplication:
  - overwriteDupes MUST be set to false ... any signatureField may be used, and it may run at any point in the processor chain
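As a rough illustration of the two supported setups, a solrconfig.xml fragment might look like the sketch below. This is not taken from the issue itself: the chain names, the `fields` list, the `signature_s` field, and the assumption that the uniqueKey is `id` are all made up for the example, and `overwriteDupes` is left unset in the first chain because the description above does not specify it for that case.

```xml
<!-- Use case 1: de-duplication.
     signatureField is the uniqueKey (assumed here to be "id"), and the
     signature processor is listed before DistributedUpdateProcessorFactory. -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <!-- Hypothetical source fields used to compute the signature. -->
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<!-- Use case 2: signature generation only, no de-duplication.
     overwriteDupes is false; "signature_s" is a hypothetical non-uniqueKey
     field. Per the description, the position in the chain is not constrained. -->
<updateRequestProcessorChain name="signature-only">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature_s</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

In the second chain DistributedUpdateProcessorFactory is not listed; Solr then adds it automatically just before RunUpdateProcessorFactory, so the signature processor still runs ahead of it there.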
If you attempt to use SignatureUpdateProcessorFactory for de-duplication w/ a non-uniqueKey signature field, one of two failure situations is likely to arise:
- in a multi-shard collection, documents with identical signatureField values will not be removed from any shard (leader) other than the one the document is routed to (by its id)
- even in a single-shard collection, with multiple replicas, documents with identical signatureField values will only be deleted on the 'leader' and not on any other replicas, because the leader does not propagate the AddUpdateCommand.updateTerm computed by the SignatureUpdateProcessorFactory to each of its replicas
Solr's deduplication via the SignatureUpdateProcessor is broken for distributed updates on SolrCloud.
Mark Miller:
Looking again at the SignatureUpdateProcessor code, I think that indeed this won't currently work with distrib updates. Could you file a JIRA issue for that? The problem is that we convert update commands into solr documents - and that can cause a loss of info if an update proc modifies the update command.
I think the reason that you see a multiple values error when you try the other order is because of the lack of a document clone (the other issue I mentioned a few emails back). Addressing that won't solve your issue though - we have to come up with a way to propagate the currently lost info on the update command.
Please see the ML thread for the full discussion: http://lucene.472066.n3.nabble.com/SolrCloud-deduplication-td3984657.html
Attachments
Issue Links
- is related to
  - SOLR-2822 don't run update processors twice (Closed)
  - SOLR-4016 Deduplication is broken by partial update (Closed)
  - SOLR-15290 Better Docs/Tests/Warnings/Defaults for SignatureUpdateProcessorFactory in SolrCloud (Open)
- relates to
  - SOLR-3215 We should clone the SolrInputDocument before adding locally and then send that clone to replicas. (Closed)