Details
- Type: Task
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
I was recently helping someone whose replicas were getting out of sync in very weird ways. The cause turned out to be their use of SignatureUpdateProcessorFactory to "de-duplicate" documents that had different unique keys but identical computed "signatures".
Although they had customized the fields configuration of SignatureUpdateProcessorFactory, most of the (bad) behavior came from the defaults:
overwriteDupes = params.getBool("overwriteDupes", true);
signatureField = params.get("signatureField", "signatureField");
...in spite of the fact that this combination does not – and has never – worked with SolrCloud: SOLR-3473
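To make the failure mode concrete, here is a minimal single-node simulation of what those defaults do. It is a sketch, not Solr code: the class, the in-memory "index", and the helpers `doc`, `signature`, `add`, and `indexTwoDocs` are all hypothetical, and it assumes (as I understand the processor) that with overwriteDupes enabled an incoming document replaces any existing document with the same signature, regardless of unique key. MD5 is used only as a stand-in hash.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SignatureCollisionSketch {
    // Mirrors the default name of the signature field quoted above.
    static final String SIGNATURE_FIELD = "signatureField";

    // Hypothetical helper: build a tiny document with a unique key and one content field.
    static Map<String, String> doc(String id, String name) {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("id", id);
        d.put("name", name);
        return d;
    }

    // Hypothetical helper: hash the values of the configured fields into a hex string.
    static String signature(Map<String, String> doc, List<String> fields) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (String field : fields) {
                String value = doc.get(field);
                if (value != null) {
                    md5.update(value.getBytes(StandardCharsets.UTF_8));
                }
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Simulated add: the "index" is a map keyed by unique key.
    // With overwriteDupes, any stored doc with the same signature is removed first,
    // even if its unique key differs from the incoming doc's.
    static void add(Map<String, Map<String, String>> index, Map<String, String> doc,
                    List<String> fields, boolean overwriteDupes) {
        String sig = signature(doc, fields);
        if (overwriteDupes) {
            index.values().removeIf(d -> sig.equals(d.get(SIGNATURE_FIELD)));
        }
        Map<String, String> stored = new LinkedHashMap<>(doc);
        stored.put(SIGNATURE_FIELD, sig);
        index.put(doc.get("id"), stored);
    }

    // Index two docs with different ids but identical signature fields; return how many survive.
    static int indexTwoDocs(boolean overwriteDupes) {
        Map<String, Map<String, String>> index = new LinkedHashMap<>();
        List<String> fields = List.of("name");
        add(index, doc("A", "same text"), fields, overwriteDupes);
        add(index, doc("B", "same text"), fields, overwriteDupes);
        return index.size();
    }

    public static void main(String[] args) {
        System.out.println("overwriteDupes=true:  " + indexTwoDocs(true) + " doc(s) remain");
        System.out.println("overwriteDupes=false: " + indexTwoDocs(false) + " doc(s) remain");
    }
}
```

In this sketch, document A silently disappears when overwriteDupes is true, even though A and B have different unique keys. On a single node that is merely surprising; the simulation does not attempt to model the distributed case described in SOLR-3473.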
I'm opening this issue to serve as a Parent for a few Sub-Tasks, some of which I hope to tackle imminently, and some of which are just ideas for the future.
Issue Links
- relates to: SOLR-3473 Distributed deduplication broken when using non-uniqueKey for signatureField (Open)
Sub-Tasks
1. Deprecate/remove overwriteDupes option from SignatureUpdateProcessorFactory | Open | Unassigned
2. Support "post-indexing" cleanup of documents with duplicate signatures | Open | Unassigned