  1. Solr
  2. SOLR-15290 Better Docs/Tests/Warnings/Defaults for SignatureUpdateProcessorFactory in SolrCloud
  3. SOLR-15293

Deprecate/remove overwriteDupes option from SignatureUpdateProcessorFactory

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      The design of the overwriteDupes option of SignatureUpdateProcessorFactory is only viable in single-shard use cases, and even there it currently doesn't work, because UpdateCommand "options" are not included when shard leaders write updates to the tlog or forward them to other replicas (SOLR-8030). With multiple shards it can never be viable without broadcasting a "Delete By Query" to every replica on every document add/update (SOLR-3473), which is vastly less efficient than the current low-level updateDocument(Term,...) support provided by IndexWriter for replacing documents by uniqueKey.

      I think we should remove the overwriteDupes option completely. If SignatureUpdateProcessorFactory is used to generate a synthetic uniqueKey field, then the existing Solr/Lucene behavior of routing the document to the correct shard and replacing any prior instances of that doc will work fine.

      The functionality of SignatureUpdateProcessorFactory should be constrained solely to generating a signature – if that signature is put in the unique key field, then de-duplication will happen automatically.
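
      As a rough illustration of why a signature stored in the unique key field de-duplicates "for free", here is a toy model in Python. It is not Solr code: ToyIndex, signature(), the field names, the MD5 hash, and the modulo-based shard routing are all assumptions made for this sketch, standing in for Solr's real signature classes, document router, and IndexWriter.updateDocument.

```python
import hashlib


def signature(doc, fields):
    """Hypothetical stand-in for a signature: MD5 over the chosen fields."""
    h = hashlib.md5()
    for name in fields:
        h.update(name.encode("utf-8"))
        h.update(str(doc.get(name, "")).encode("utf-8"))
    return h.hexdigest()


class ToyIndex:
    """Toy sharded index: each shard is a dict keyed by the unique key."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def add(self, doc, sig_fields):
        doc = dict(doc)
        # The signature becomes the unique key ("id" is an assumed name).
        doc["id"] = signature(doc, sig_fields)
        # Route by unique key: equal signatures always reach the same shard.
        shard = self.shards[int(doc["id"], 16) % len(self.shards)]
        # Replace-by-key, analogous in spirit to updateDocument(Term, ...).
        shard[doc["id"]] = doc
        return doc["id"]

    def count(self):
        return sum(len(s) for s in self.shards)


index = ToyIndex(num_shards=4)
first = index.add({"title": "x", "body": "y", "source": "a"}, ["title", "body"])
second = index.add({"title": "x", "body": "y", "source": "b"}, ["title", "body"])
```

      Because both documents hash to the same key, the second add lands on the same shard and overwrites the first, with no cross-shard delete required.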

          People

            Assignee: Unassigned
            Reporter: Chris M. Hostetter (hossman)