Solr / SOLR-1908

SignatureUpdateProcessor does not fail on invalid config, can lead to deleting all docs (Deduplication)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 4.0-ALPHA
    • Component/s: update
    • Labels:
      None

      Description

      Deduplication removes all documents from the index if overwriteDupes=true and the schema's signature field has indexed=false. maxDoc still grows as usual, but numDocs is always zero.

      solrconfig.xml
      <bool name="overwriteDupes">true</bool>
      <str name="signatureField">sig</str>

      schema.xml
      <field name="sig" type="string" stored="true" indexed="false" multiValued="true" />

      mailinglist
      http://lucene.472066.n3.nabble.com/Dedupe-and-overwriteDupes-setting-td809320.html

      log entries

      May 12, 2010 2:35:34 PM org.apache.solr.core.SolrDeletionPolicy onInit
      INFO: SolrDeletionPolicy.onInit: commits:num=1
      commit{dir=/opt/apache/solr/data/index,segFN=segments_1,version=1273667628292,generation=1,filenames=[segments_1]
      May 12, 2010 2:35:34 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
      INFO: newest commit = 1273667628292
      May 12, 2010 2:35:35 PM org.apache.solr.update.processor.LogUpdateProcessor finish
      INFO: {add=[<ID's WERE HERE>, ... (8 added)]} 0 1097
      May 12, 2010 2:35:35 PM org.apache.solr.core.SolrCore execute
      INFO: [] webapp=/solr path=/update params={wt=javabin&version=2.2} status=0 QTime=1097
      May 12, 2010 2:35:35 PM org.apache.solr.update.DirectUpdateHandler2 commit
      INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
      May 12, 2010 2:35:35 PM org.apache.solr.core.SolrDeletionPolicy onCommit
      INFO: SolrDeletionPolicy.onCommit: commits:num=2
      commit{dir=/opt/apache/solr/data/index,segFN=segments_1,version=1273667628292,generation=1,filenames=[segments_1]
      commit{dir=/opt/apache/solr/data/index,segFN=segments_2,version=1273667628293,generation=2,filenames=[_0.tis, _0.nrm, _0.fnm, _0.tvd, _0_1.del, _0.tvf, _0.tii, _0.tvx, _0.frq, segments_2, _0.fdx, _0.prx, _0.fdt]
      May 12, 2010 2:35:35 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
      INFO: newest commit = 1273667628293
      May 12, 2010 2:35:35 PM org.apache.solr.search.SolrIndexSearcher <init>
      INFO: Opening Searcher@8ae59a main
      May 12, 2010 2:35:35 PM org.apache.solr.update.DirectUpdateHandler2 commit
      INFO: end_commit_flush

      1. SOLR-1908.patch
        6 kB
        Hoss Man
      2. SOLR-1908.patch
        6 kB
        Hoss Man

        Activity

        Hoss Man added a comment -

        First pass at a test & fix: allow signatureField to be un-indexed (for people who want to compute a signature for other purposes), but fail if it's un-indexed and overwriteDupes==true.

        Hoss Man added a comment -

        After posting the last patch I remembered that URPFs could be SolrCoreAware, so here's a much better fix (with tests) that catches the problem at init.

        Hoss Man added a comment -

        Revised summary to elaborate on problem

        FYI: further testing made me realize that lots of other tests use solrconfig.xml/schema.xml combinations that now fail to init because of the stricter checking of signatureField, so the final commit will need to update all of those schema.xml files to contain the signatureField(s) needed.
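
The init-time checks discussed in this thread can be sketched in plain Java roughly as follows. This is a minimal illustration under stated assumptions, not the committed patch: `SignatureConfigCheck`, `FieldDef`, and `validate` are hypothetical names, the schema is modeled as a simple map, and `IllegalStateException` stands in for whatever error the real factory raises when the core is loaded.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, self-contained model of the validation; it does not use Solr classes.
class SignatureConfigCheck {

    /** Hypothetical stand-in for a schema.xml field definition. */
    static final class FieldDef {
        final String name;
        final boolean indexed;

        FieldDef(String name, boolean indexed) {
            this.name = name;
            this.indexed = indexed;
        }
    }

    /**
     * Validates the dedupe settings once, at startup, instead of letting
     * a bad combination silently affect the index later.
     */
    static void validate(Map<String, FieldDef> schema,
                         String signatureField,
                         boolean overwriteDupes) {
        FieldDef field = schema.get(signatureField);
        if (field == null) {
            // The signature field must be declared in the schema at all.
            throw new IllegalStateException(
                "signatureField does not exist in schema: " + signatureField);
        }
        if (overwriteDupes && !field.indexed) {
            // An un-indexed signature is allowed only when it is not used to overwrite duplicates.
            throw new IllegalStateException(
                "overwriteDupes requires an indexed signatureField: " + signatureField);
        }
    }

    public static void main(String[] args) {
        Map<String, FieldDef> schema = new HashMap<>();
        schema.put("sig", new FieldDef("sig", false)); // mirrors indexed="false" from the report

        validate(schema, "sig", false); // accepted: signature computed for other purposes

        try {
            validate(schema, "sig", true); // the combination from the bug report
            System.out.println("accepted (unexpected)");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast at load time is the design choice here: the misconfiguration from the description is reported before any document is added, and test configurations that reference a signature field missing from their schema are rejected for the same reason.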

        Hoss Man added a comment -

        Committed revision 944463.

        Thanks for reporting this Markus.

        Hoss Man added a comment -

        Correcting Fix Version based on CHANGES.txt, see this thread for more details...

        http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E


          People

          • Assignee: Hoss Man
          • Reporter: Markus Jelsma
          • Votes: 0
          • Watchers: 0
