SOLR-9883

Example Solr config files can lead to invalid tlog replays when using the add-unknown-fields-to-schema update chain

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.3
    • Fix Version/s: 6.4, 7.0
    • Component/s: None
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      The current basic_configs and data_driven_schema_configs try to create unknown fields. The problem is that the date processing in "ParseDateFieldUpdateProcessorFactory" is not invoked if the doc is replayed from the tlog. I don't know whether this is a problem in other places; this is a concrete example that fails in the field.

      So say I have a pattern for dates that omits the trailing 'Z', as:
      yyyy-MM-dd'T'HH:mm:ss.SSS
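
      For context, a pattern like this is normally registered in the "format" list of the date-parsing processor in solrconfig.xml. A minimal sketch, assuming the pattern above is the only one configured (the shipped configsets list several):

```xml
<!-- Sketch only: date-parsing URP with the single pattern from this report. -->
<processor class="solr.ParseDateFieldUpdateProcessorFactory">
  <arr name="format">
    <!-- No trailing 'Z': the value only becomes a valid Solr date if this processor runs -->
    <str>yyyy-MM-dd'T'HH:mm:ss.SSS</str>
  </arr>
</processor>
```

      If a document bypasses this processor, the raw string reaches the date field unchanged.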

      This works fine when the doc is initially indexed. Now say the doc must be replayed from the tlog. The doc errors out with "unknown date format" since (apparently) this doesn't go through the same update chain, perhaps due to the sample configs defining ParseDateFieldUpdateProcessorFactory after DistributedUpdateProcessorFactory?

      1. SOLR-9883.patch
        19 kB
        Steve Rowe
      2. SOLR-9883.patch
        18 kB
        Steve Rowe
      3. SOLR-9883.patch
        7 kB
        Steve Rowe

          Activity

          erickerickson Erick Erickson added a comment -

          There's quite a bit of discussion at SOLR-8030 that's relevant.

          I don't quite know whether the simple expedient of putting the URPs before the DistribUpdateProcessorFactory is sufficient (or safe).

          steve_rowe Steve Rowe added a comment -

          Attaching a patch that switches the example configs' add-unknown-fields-to-schema update chains so that the DistributedUpdateProcessorFactory (DUP) comes after the AddSchemaFields URPF. In my manual testing (see below), this prevents the data corruption: the buffered tlog entry includes the date normalization. I also made AddSchemaFields URPF implement UpdateRequestProcessorFactory.RunAlways, so that schema modifications will continue to be applied on all replicas (the original rationale for moving the DUP position on SOLR-6137).
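
          For illustration, the reordering described here might look roughly like the fragment below. This is an abbreviated sketch, not the patch itself: only the processors relevant to this issue are shown, the nested settings of each processor are omitted (so the fragment is not loadable as written), and the exact list in the shipped configsets may differ by version.

```xml
<!-- Sketch only: abbreviated chain showing the ordering after the change. -->
<updateRequestProcessorChain name="add-unknown-fields-to-schema">
  <!-- Value parsing now runs before distribution, so buffered tlog entries hold normalized dates -->
  <processor class="solr.ParseDateFieldUpdateProcessorFactory"/>
  <!-- Schema additions; tagged RunAlways by the patch so they still apply on every replica -->
  <processor class="solr.AddSchemaFieldsUpdateProcessorFactory"/>
  <!-- The DUP moves here, after AddSchemaFields; previously it preceded the parsing processors -->
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

          With this order, a document is already normalized by the time the DUP forwards it to a replica or a replica buffers it in its tlog.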

          Following an offline reproduction suggestion from Hoss Man, I was able to manually reproduce the data corruption as follows:

          1. Added an artificial 1-minute delay in PeerSync
          2. bin/solr start -e cloud # nodes=2, coll=gettingstarted, shards=1, rf=2, configset=data_driven_schema_configs
          3. curl -X POST -H 'Content-type: application/xml' http://localhost:8983/solr/gettingstarted/update -d '<add><doc><field name="f_dt">2015-06-09</field></doc></add>'
          4. kill -9 $(cat bin/solr-7574.pid)
          5. curl -X POST -H 'Content-type: application/xml' http://localhost:8983/solr/gettingstarted/update -d '<add><doc><field name="f_dt">2015-06-10</field></doc></add>'
          6. bin/solr start -cloud -p 7574 -s "example/cloud/node2/solr" -z localhost:9983
          7. curl -X POST -H 'Content-type: application/xml' http://localhost:8983/solr/gettingstarted/update -d '<add><doc><field name="f_dt">2015-06-11</field></doc></add>'

          I had to add step #3 to create a transaction log entry on the 7574 replica prior to shutdown; otherwise on restart it would refuse to perform peer sync, because it didn't know where to start (due to no recent versions in the tlog) and instead initiated full recovery.

          I'm working on an automated data corruption test.

          I want to get this change into the 6.4 release.

          steve_rowe Steve Rowe added a comment -

          Forgot to mention: with the attached patch, I was no longer able to reproduce the data corruption with the above method.

          steve_rowe Steve Rowe added a comment -

          Patch with a new automated data corruption test. I tried to make a cloud test, but I couldn't get it to work. Instead, the test in the patch simulates this situation by directly turning on tlog buffering mode in a single core, and sending in an update (with param update.distrib=fromleader) after manually running the "add-unknown-fields-to-schema" update chain on it up through the DUP. The test succeeds with the solr config modifications in the patch, and fails without it.

          The patch also fixes a typo in the replay failure log message (REYPLAY->REPLAY).

          I'm running all Solr tests and precommit now. When they succeed, I'll commit.

          steve_rowe Steve Rowe added a comment -

          Updated patch: it moves the config files to a temp dir to avoid permission failures when auto-upgrading the schema file to managed-schema. (I didn't see this failure when running from IntelliJ.)

          All Solr tests pass, and precommit passes. Committing shortly.

          jira-bot ASF subversion and git services added a comment -

          Commit 9a6ff177b6f7c776cc6bf4625ed2d5dd7cce81d2 in lucene-solr's branch refs/heads/branch_6x from Steve Rowe
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9a6ff17 ]

          SOLR-9883: In example schemaless configs' default update chain, move the DUP to after the AddSchemaFields URP (which is now tagged as RunAlways), to avoid invalid buffered tlog entry replays.

          jira-bot ASF subversion and git services added a comment -

          Commit d817fd43eccd67a5d73c3bbc49561de65d3fc9cb in lucene-solr's branch refs/heads/master from Steve Rowe
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d817fd4 ]

          SOLR-9883: In example schemaless configs' default update chain, move the DUP to after the AddSchemaFields URP (which is now tagged as RunAlways), to avoid invalid buffered tlog entry replays.


            People

            • Assignee:
              steve_rowe Steve Rowe
              Reporter:
              erickerickson Erick Erickson