Solr
  1. Solr
  2. SOLR-6164

Copy Fields Schema additions are not distributed to other nodes

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.9, 6.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      I took a closer look at the tests added in SOLR-6145, and the copy fields aren't being distributed – it just looks that way b/c we are adding the same copy field directive multiple times to random servers and by chance, they'll be added to all the servers.

      1. SOLR-6164.patch
        31 kB
        Steve Rowe
      2. SOLR-6164.patch
        31 kB
        Gregory Chanan
      3. SOLR-6164v2.patch
        30 kB
        Gregory Chanan

        Activity

        Hide
        Gregory Chanan added a comment -

        Here's a patch. It does the following:

        • The ZkIndexSchemaReader now adds the copy fields to the managed schema from what is in Zk. This is done by clearing the copy field info and recreating it. That seemed more straightforward than trying to maintain the existing copy fields and diffing what is in Zk.
        • Improves the existing tests in a few ways:
        • Instead of a long sleep and then checking, we loop with short sleeps and exit as soon as we see success.
        • Combines the put (add field), post (add field) and copy field test cases into one. This ensures the commands "play nicely" together and reduces the amount of code / execution time.
        • Adds more testing to the copy fields commands. Randomly decide if the copy field source and dest will be an existing field, a new field, or a dynamic field.
        Show
        Gregory Chanan added a comment - Here's a patch. It does the following: The ZkIndexSchemaReader now adds the copy fields to the managed schema from what is in Zk. This is done by clearing the copy field info and recreating it. That seemed more straightforward than trying to maintain the existing copy fields and diffing what is in Zk. Improves the existing tests in a few ways: Instead of a long sleep and then checking, we loop with short sleeps and exit as soon as we see success. Combines the put (add field), post (add field) and copy field test cases into one. This ensures the commands "play nicely" together and reduces the amount of code / execution time. Adds more testing to the copy fields commands. Randomly decide if the copy field source and dest will be an existing field, a new field, or a dynamic field.
        Hide
        Gregory Chanan added a comment -

        Previous patch used its own Random rather than the RandomizedRunner's version.

        Show
        Gregory Chanan added a comment - Previous patch used its own Random rather than the RandomizedRunner's version.
        Hide
        Steve Rowe added a comment -

        +1, this looks great Gregory Chanan - the code to update copyfields just wasn't there at all previously. I want to get this into the 4.9 release - I'll commit shortly.

        Separately, ManagedIndexSchema.shallowCopy() is geared exclusively toward what addFields() needs - other modifications aren't dealt with. I think it should be reworked to just shallowly copy everything, and then the pieces that need to be can be handled in the reloading code. Not sure how to detect that reloading of any piece is required ... maybe some form of checksum over the sorted serialized pieces? I guess just always reloading - as is currently done for both fields and copyFields now too - is at least correct, if not the most performant solution.

        Show
        Steve Rowe added a comment - +1, this looks great Gregory Chanan - the code to update copyfields just wasn't there at all previously. I want to get this into the 4.9 release - I'll commit shortly. Separately, ManagedIndexSchema.shallowCopy() is geared exclusively toward what addFields() needs - other modifications aren't dealt with. I think it should be reworked to just shallowly copy everything, and then the pieces that need to be can be handled in the reloading code. Not sure how to detect that reloading of any piece is required ... maybe some form of checksum over the sorted serialized pieces? I guess just always reloading - as is currently done for both fields and copyFields now too - is at least correct, if not the most performant solution.
        Hide
        Gregory Chanan added a comment -

        Thanks for taking a look, Steve Rowe.

        I guess just always reloading - as is currently done for both fields and copyFields now too - is at least correct, if not the most performant solution.

        That was my thought for this patch – get something in that works and we can refine later. I think that the idea of a checksum over the serialized pieces is a good one, though of course there's a question of the granularity. Here's a complementary optimization: let's say you have a checksum over all the copyFields and a single copy field is added. As things are now, you'd have to recreate all the copy fields just to add a single one, though that may at least happen rarely. It may make sense to put some unique identifier on each copyfield, so you can easily detect we already have that, so can just skip adding it. It also makes implementing deleting fields/copyFields easier if we ever want to implement that. Although this depends on what we decide in SOLR-6155 I guess; if only one copyfield is allowed, the src/dst becomes a unique identifier.

        Show
        Gregory Chanan added a comment - Thanks for taking a look, Steve Rowe . I guess just always reloading - as is currently done for both fields and copyFields now too - is at least correct, if not the most performant solution. That was my thought for this patch – get something in that works and we can refine later. I think that the idea of a checksum over the serialized pieces is a good one, though of course there's a question of the granularity. Here's a complementary optimization: let's say you have a checksum over all the copyFields and a single copy field is added. As things are now, you'd have to recreate all the copy fields just to add a single one, though that may at least happen rarely. It may make sense to put some unique identifier on each copyfield, so you can easily detect we already have that, so can just skip adding it. It also makes implementing deleting fields/copyFields easier if we ever want to implement that. Although this depends on what we decide in SOLR-6155 I guess; if only one copyfield is allowed, the src/dst becomes a unique identifier.
        Hide
        Steve Rowe added a comment -

        Gregory's patch with a CHANGES.txt entry; also cleaned up the test a little.

        Committing shortly.

        Show
        Steve Rowe added a comment - Gregory's patch with a CHANGES.txt entry; also cleaned up the test a little. Committing shortly.
        Hide
        ASF subversion and git services added a comment -

        Commit 1603300 from Steve Rowe in branch 'dev/trunk'
        [ https://svn.apache.org/r1603300 ]

        SOLR-6164: Copy Fields Schema additions are not distributed to other nodes

        Show
        ASF subversion and git services added a comment - Commit 1603300 from Steve Rowe in branch 'dev/trunk' [ https://svn.apache.org/r1603300 ] SOLR-6164 : Copy Fields Schema additions are not distributed to other nodes
        Hide
        ASF subversion and git services added a comment -

        Commit 1603301 from Steve Rowe in branch 'dev/trunk'
        [ https://svn.apache.org/r1603301 ]

        SOLR-6164: add eol-style property to TestCloudManagedSchemaConcurrent.java

        Show
        ASF subversion and git services added a comment - Commit 1603301 from Steve Rowe in branch 'dev/trunk' [ https://svn.apache.org/r1603301 ] SOLR-6164 : add eol-style property to TestCloudManagedSchemaConcurrent.java
        Hide
        ASF subversion and git services added a comment -

        Commit 1603303 from Steve Rowe in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1603303 ]

        SOLR-6164: Copy Fields Schema additions are not distributed to other nodes (merged trunk r1603300 and r1603301)

        Show
        ASF subversion and git services added a comment - Commit 1603303 from Steve Rowe in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1603303 ] SOLR-6164 : Copy Fields Schema additions are not distributed to other nodes (merged trunk r1603300 and r1603301)
        Hide
        ASF subversion and git services added a comment -

        Commit 1603308 from Steve Rowe in branch 'dev/branches/lucene_solr_4_9'
        [ https://svn.apache.org/r1603308 ]

        SOLR-6164: Copy Fields Schema additions are not distributed to other nodes (merged trunk r1603300 and r1603301)

        Show
        ASF subversion and git services added a comment - Commit 1603308 from Steve Rowe in branch 'dev/branches/lucene_solr_4_9' [ https://svn.apache.org/r1603308 ] SOLR-6164 : Copy Fields Schema additions are not distributed to other nodes (merged trunk r1603300 and r1603301)
        Hide
        Steve Rowe added a comment -

        Comitted to trunk, branch_4x and lucene_solr_4_9.

        Thanks Gregory!

        Show
        Steve Rowe added a comment - Comitted to trunk, branch_4x and lucene_solr_4_9. Thanks Gregory!

          People

          • Assignee:
            Steve Rowe
            Reporter:
            Gregory Chanan
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development