Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.10.4, 5.5.2, 6.2
    • Fix Version/s: 6.2.1, 6.3, 7.0
    • Component/s: SolrCloud
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      As found in SOLR-9438, a shard split can fail to write commit data on shutdown because it does not explicitly call commit. This causes replication to fail: without the commit data, the master version is always 0, which is assumed to mean an empty index.
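
The failure mode described above can be sketched in Java roughly as follows. This is a minimal illustration, not Solr's actual code: the class `MasterVersionSketch`, its methods, and the commit-data key are all hypothetical names chosen for the example. It assumes only what the description states, namely that the master version is derived from commit data and that a version of 0 is read as an empty index.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of how a version derived from commit data
// can make a non-empty index look empty. Not Solr's real API.
final class MasterVersionSketch {
    // Illustrative key name; an assumption for this sketch.
    static final String COMMIT_TIME_KEY = "commitTimeMSec";

    // Returns the version stored in the commit data, or 0 when none was written.
    static long masterVersion(Map<String, String> commitData) {
        String value = commitData.get(COMMIT_TIME_KEY);
        return value == null ? 0L : Long.parseLong(value);
    }

    // A version of 0 is assumed to mean an empty index, so nothing is replicated.
    static boolean looksEmpty(Map<String, String> commitData) {
        return masterVersion(commitData) == 0L;
    }

    public static void main(String[] args) {
        // Sub-shard after a split that never called commit: no commit data.
        Map<String, String> withoutCommit = Collections.emptyMap();

        // Same sub-shard after an explicit commit wrote commit data.
        Map<String, String> withCommit = new HashMap<>();
        withCommit.put(COMMIT_TIME_KEY, "1472688000000");

        System.out.println(looksEmpty(withoutCommit)); // prints "true"
        System.out.println(looksEmpty(withCommit));    // prints "false"
    }
}
```

In this model the index contents never enter the decision, which is why a populated sub-shard without commit data is indistinguishable from an empty one.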

      1. SOLR-9488.patch
        14 kB
        Shalin Shekhar Mangar
      2. SOLR-9488.patch
        14 kB
        Shalin Shekhar Mangar

        Activity

        shalinmangar Shalin Shekhar Mangar added a comment -

        Patch with test and fix.

        shalinmangar Shalin Shekhar Mangar added a comment -

        Patch updated to use the control node to issue the add replica command. This reduces false failures due to NoHttpResponseException, which occurred because the cloud client could choose the node being restarted for the add replica command. This is ready. I'll commit shortly.

        jira-bot ASF subversion and git services added a comment -

        Commit 0c5c0df6bc8d3738ef6ed071a0f51913f804dde1 in lucene-solr's branch refs/heads/master from Shalin Shekhar Mangar
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0c5c0df ]

        SOLR-9488: Shard split can fail to write commit data on shutdown/restart causing replicas to recover without replicating the index

        jira-bot ASF subversion and git services added a comment -

        Commit 7370407d225e4c2e36b1cdd86e2ada3130b2840d in lucene-solr's branch refs/heads/branch_6x from Shalin Shekhar Mangar
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7370407 ]

        SOLR-9488: Shard split can fail to write commit data on shutdown/restart causing replicas to recover without replicating the index
        (cherry picked from commit 0c5c0df)

        shalinmangar Shalin Shekhar Mangar added a comment -

        I was wondering why SolrCloud is not affected by this bug. This is because on shutdown, DirectUpdateHandler2 (DUH2) commits explicitly and writes the commit data instead of relying on IndexWriter.close. This did not happen for shard split because DUH2 commits on close only if the update log is active and has uncommitted changes. Both of those conditions are false when we split shards and merge the index directly into a core.
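
The commit-on-close condition described in this comment can be sketched as follows. This is a simplified model rather than the actual DirectUpdateHandler2 code: `CommitOnCloseSketch` and `commitsOnClose` are hypothetical names, and the two boolean parameters stand in for the update log state mentioned above.

```java
// Hypothetical model of the commit-on-close decision; not Solr's real code.
final class CommitOnCloseSketch {
    // Commit on close happens only when the update log is active
    // AND it holds uncommitted changes.
    static boolean commitsOnClose(boolean updateLogActive, boolean hasUncommittedChanges) {
        return updateLogActive && hasUncommittedChanges;
    }

    public static void main(String[] args) {
        // Regular indexing: the update log is active and has pending
        // changes, so shutdown commits and writes the commit data.
        System.out.println(commitsOnClose(true, true));   // prints "true"

        // Shard split merging the index directly into a core: both
        // conditions are false, so no commit data is written on close.
        System.out.println(commitsOnClose(false, false)); // prints "false"
    }
}
```

Under this model the shard split path needs its own explicit commit, since neither condition can trigger one on shutdown.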

        shalinmangar Shalin Shekhar Mangar added a comment -

        Re-opened to back-port to 6.2.1

        jira-bot ASF subversion and git services added a comment -

        Commit 0af3d4e8e6ff1fa38d59739f30cfffea65a8033c in lucene-solr's branch refs/heads/branch_6_2 from Shalin Shekhar Mangar
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0af3d4e ]

        SOLR-9488: Shard split can fail to write commit data on shutdown/restart causing replicas to recover without replicating the index
        (cherry picked from commit 0c5c0df)

        (cherry picked from commit 7370407)

        shalinmangar Shalin Shekhar Mangar added a comment -

        Closing after 6.2.1 release


          People

          • Assignee:
            shalinmangar Shalin Shekhar Mangar
            Reporter:
            shalinmangar Shalin Shekhar Mangar
          • Votes:
            0
            Watchers:
            2
