Solr
  1. Solr
  2. SOLR-6145

Concurrent Schema API field additions can result in endless loop

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.9, Trunk
    • Component/s: Schema and Analysis
    • Labels:
      None

      Description

      The optimistic concurrency loop in ManagedIndexSchema.addFields() is the likely culprit.

      1. concurrent_updates_and_schema_api.patch
        3 kB
        Steve Rowe
      2. SOLR-6145.patch
        36 kB
        Steve Rowe
      3. SOLR-6145.patch
        34 kB
        Steve Rowe
      4. SOLR-6145.patch
        23 kB
        Steve Rowe
      5. SOLR-6145.patch
        4 kB
        Gregory Chanan
      6. SOLR-6145-tests.patch
        20 kB
        Gregory Chanan
      7. SOLR-6145v2.patch
        9 kB
        Gregory Chanan

        Activity

        Hide
        Steve Rowe added a comment -

        Alexey Serba gave me the attached patch, to TestCloudManagedSchemaAddField, that triggers the problem.

        Show
        Steve Rowe added a comment - Alexey Serba gave me the attached patch, to TestCloudManagedSchemaAddField , that triggers the problem.
        Hide
        Gregory Chanan added a comment -

        Here's a patch that fixes the test.

        The basic idea is that retrying on a BadVersionException won't help the ManagedIndexSchema, since the version will always be past our version. Instead, we just throw an exception that can be caught by the callers and retried. I only put the retry logic in FieldResource for now, but there's a few more places where it should go if we like this approach.

        I originally thought of doing the retry logic inside of ManagedIndexSchema, so we wouldn't need to put it in as many places, but that seemed not as clean, since the older schema would be getting the newer schema (presumably from the core), which seems wrong since a schema should exist independently of a core.

        Show
        Gregory Chanan added a comment - Here's a patch that fixes the test. The basic idea is that retrying on a BadVersionException won't help the ManagedIndexSchema, since the version will always be past our version. Instead, we just throw an exception that can be caught by the callers and retried. I only put the retry logic in FieldResource for now, but there's a few more places where it should go if we like this approach. I originally thought of doing the retry logic inside of ManagedIndexSchema, so we wouldn't need to put it in as many places, but that seemed not as clean, since the older schema would be getting the newer schema (presumably from the core), which seems wrong since a schema should exist independently of a core.
        Hide
        Gregory Chanan added a comment -

        Two patches:

        The tests patch is just taking the existing test as a template and testing copyFields and adding fields via POST. I verified that these tests fail consistently without the patch.

        The v2 patch handles the copyfields and adding fields via POST cases. The POST case has a bit of extra logic for making the optimistic concurrency control failure look like what would happen if the concurrent schema changes were actually serialized.

        Show
        Gregory Chanan added a comment - Two patches: The tests patch is just taking the existing test as a template and testing copyFields and adding fields via POST. I verified that these tests fail consistently without the patch. The v2 patch handles the copyfields and adding fields via POST cases. The POST case has a bit of extra logic for making the optimistic concurrency control failure look like what would happen if the concurrent schema changes were actually serialized.
        Hide
        Steve Rowe added a comment -

        +1, looks good Gregory Chanan.

        This patch includes your tests patch and your non-test patch. I removed some commented out test code, and added a verification step to TestCloudManagedSchemaCopyFields.

        I noticed that adding the same copy field directive multiple times results in multiple identical copies of the same directive in the schema. This is wrong. I'll open a separate issue.

        I'll commit this in a day or so.

        Show
        Steve Rowe added a comment - +1, looks good Gregory Chanan . This patch includes your tests patch and your non-test patch. I removed some commented out test code, and added a verification step to TestCloudManagedSchemaCopyFields . I noticed that adding the same copy field directive multiple times results in multiple identical copies of the same directive in the schema. This is wrong. I'll open a separate issue. I'll commit this in a day or so.
        Hide
        Steve Rowe added a comment -

        I noticed that adding the same copy field directive multiple times results in multiple identical copies of the same directive in the schema. This is wrong. I'll open a separate issue.

        SOLR-6155

        Show
        Steve Rowe added a comment - I noticed that adding the same copy field directive multiple times results in multiple identical copies of the same directive in the schema. This is wrong. I'll open a separate issue. SOLR-6155
        Hide
        Steve Rowe added a comment -

        Patch, removes the useless optimistic concurrency loops from ManagedIndexSchema.add(Copy)Fields(), and also modifies the consumers of those methods to handle failures that won't benefit from retrying.

        Committing shortly.

        Show
        Steve Rowe added a comment - Patch, removes the useless optimistic concurrency loops from ManagedIndexSchema.add(Copy)Fields(), and also modifies the consumers of those methods to handle failures that won't benefit from retrying. Committing shortly.
        Hide
        Steve Rowe added a comment -

        Final patch, added no-op catch clause for SchemaChangedInZkException to the optimistic concurrency loop in AddSchemaFieldsUpdateProcessor.processAdd().

        Committing now.

        Show
        Steve Rowe added a comment - Final patch, added no-op catch clause for SchemaChangedInZkException to the optimistic concurrency loop in AddSchemaFieldsUpdateProcessor.processAdd() . Committing now.
        Hide
        ASF subversion and git services added a comment -

        Commit 1601770 from Steve Rowe in branch 'dev/trunk'
        [ https://svn.apache.org/r1601770 ]

        SOLR-6145: Fix Schema API optimistic concurrency by moving it out of ManagedIndexSchema.add(Copy)Fields() into the consumers of those methods: CopyFieldCollectionResource, FieldCollectionResource, FieldResource, and AddSchemaFieldsUpdateProcessorFactory.

        Show
        ASF subversion and git services added a comment - Commit 1601770 from Steve Rowe in branch 'dev/trunk' [ https://svn.apache.org/r1601770 ] SOLR-6145 : Fix Schema API optimistic concurrency by moving it out of ManagedIndexSchema.add(Copy)Fields() into the consumers of those methods: CopyFieldCollectionResource, FieldCollectionResource, FieldResource, and AddSchemaFieldsUpdateProcessorFactory.
        Hide
        ASF subversion and git services added a comment -

        Commit 1601775 from Steve Rowe in branch 'dev/trunk'
        [ https://svn.apache.org/r1601775 ]

        SOLR-6145: In AddSchemaFieldsUpdateProcessor.processAdd(), handle ManagedIndexSchema.addFields() failures that won't benefit from retrying.

        Show
        ASF subversion and git services added a comment - Commit 1601775 from Steve Rowe in branch 'dev/trunk' [ https://svn.apache.org/r1601775 ] SOLR-6145 : In AddSchemaFieldsUpdateProcessor.processAdd(), handle ManagedIndexSchema.addFields() failures that won't benefit from retrying.
        Hide
        ASF subversion and git services added a comment -

        Commit 1601776 from Steve Rowe in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1601776 ]

        SOLR-6145: Fix Schema API optimistic concurrency by moving it out of ManagedIndexSchema.add(Copy)Fields() into the consumers of those methods: CopyFieldCollectionResource, FieldCollectionResource, FieldResource, and AddSchemaFieldsUpdateProcessorFactory. (merged trunk r1601770 and r1601775)

        Show
        ASF subversion and git services added a comment - Commit 1601776 from Steve Rowe in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1601776 ] SOLR-6145 : Fix Schema API optimistic concurrency by moving it out of ManagedIndexSchema.add(Copy)Fields() into the consumers of those methods: CopyFieldCollectionResource, FieldCollectionResource, FieldResource, and AddSchemaFieldsUpdateProcessorFactory. (merged trunk r1601770 and r1601775)
        Hide
        Steve Rowe added a comment -

        Committed to trunk and branch_4x.

        Thanks Gregory and Alexey!

        Show
        Steve Rowe added a comment - Committed to trunk and branch_4x. Thanks Gregory and Alexey!

          People

          • Assignee:
            Steve Rowe
            Reporter:
            Steve Rowe
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development