Solr / SOLR-9915

PeerSync alreadyInSync check is not backwards compatible

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 6.3
    • Fix Version/s: 6.4
    • Component/s: replication (java)
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None
    • Flags:
      Patch

      Description

      The fingerprint check added to PeerSync in SOLR-9446 works fine when all servers are running 6.3, but it makes a rolling upgrade from e.g. 6.2.1 to 6.3 difficult. The 6.3 server asks a 6.2.1 server for a fingerprint and then hits an NPE, because the older server doesn't return the expected field in its response.

      PeerSync then fails completely and falls back to a full index replication from scratch, copying all index files over the network. We noticed this when we tried a rolling upgrade of one of our 6.2.1 clusters to 6.3. The resulting replication traffic was hammering our disks and network, so we had to do a full shutdown, upgrade all servers to 6.3 and restart, which was not ideal for a production cluster.

      The attached patch should behave more gracefully in this situation, as it will typically return false for alreadyInSync() and then carry on doing the normal re-sync based on versions.
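
      As a rough illustration of the behaviour described above (not the attached patch itself), a null-safe check could look like the sketch below. The class name, the plain Map standing in for the peer's response, and the "fingerprint" key are assumptions made for this example; the real PeerSync code works with Solr's own response and fingerprint types.

```java
import java.util.Map;

// Hypothetical, simplified stand-in for the check inside PeerSync.
final class PeerSyncCompatSketch {

    // Assumed name of the response field that pre-6.3 servers do not send.
    static final String FINGERPRINT_KEY = "fingerprint";

    /**
     * Returns true only when the peer supplied a fingerprint that equals ours.
     * A missing response or missing field (for example from an older server)
     * yields false, so the caller can carry on with the version-based re-sync
     * instead of failing with a NullPointerException.
     */
    static boolean alreadyInSync(Map<String, Object> peerResponse, Object localFingerprint) {
        if (peerResponse == null) {
            return false;
        }
        Object remoteFingerprint = peerResponse.get(FINGERPRINT_KEY);
        if (remoteFingerprint == null) {
            // Older peer: sync cannot be proven, so report "not in sync".
            return false;
        }
        return remoteFingerprint.equals(localFingerprint);
    }
}
```

      The design choice here is to treat "no fingerprint" as "unknown" and not as an error, which costs at most one ordinary version-based sync pass instead of a full index copy.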

        Activity

        erickerickson Erick Erickson added a comment -

        Marking as "blocker" so we render a considered opinion before we release 6.4.

        jira-bot ASF subversion and git services added a comment -

        Commit 5b1f6b2ba48f8afc6c822c097d0500eb2ed66815 in lucene-solr's branch refs/heads/master from Noble Paul
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5b1f6b2 ]

        SOLR-9915: PeerSync alreadyInSync check is not backwards compatible and results in full replication during a rolling restart

        jira-bot ASF subversion and git services added a comment -

        Commit 1b9564a5dccb2938586f2f82f963bd1534b002cd in lucene-solr's branch refs/heads/branch_6x from Noble Paul
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1b9564a ]

        SOLR-9915: PeerSync alreadyInSync check is not backwards compatible and results in full replication during a rolling restart

        jira-bot ASF subversion and git services added a comment -

        Commit 122fa6cf64a56dd5ab5aff84f7f5c9a1305bde4e in lucene-solr's branch refs/heads/branch_6x from Noble Paul
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=122fa6c ]

        SOLR-9915: PeerSync alreadyInSync check is not backwards compatible and results in full replication during a rolling restart


          People

          • Assignee:
            noble.paul Noble Paul
          • Reporter:
            TimOwen Tim Owen
          • Votes:
            0
          • Watchers:
            3

