  Solr / SOLR-9236

AutoAddReplicas feature with one replica loses some documents not committed during failover

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: trunk, 6.2
    • Component/s: hdfs, SolrCloud
    • Labels:
      None
    • Flags:
      Patch

      Description

      I need to index a huge amount of logs, so I decided to use the AutoAddReplicas feature with only one replica.
      Using AutoAddReplicas with one replica should bring these benefits:

      • no redundant data files for replicas
        • less disk usage
      • best indexing performance
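For readers reproducing this setup, a one-replica collection with AutoAddReplicas enabled can be requested through the Collections API CREATE action. The sketch below only builds the request URL. The host, the collection name `logs`, and the shard count are hypothetical placeholders, and in Solr 6.x `autoAddReplicas` only takes effect for collections whose indexes live on a shared file system such as HDFS.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class CreateLogsCollection {

    // Builds a Collections API CREATE URL. The base URL, collection name and
    // shard count passed in are hypothetical placeholders.
    static String createUrl(String solrBase, String collection, int numShards) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "CREATE");
        params.put("name", collection);
        params.put("numShards", Integer.toString(numShards));
        // A single copy of each shard: no redundant index files.
        params.put("replicationFactor", "1");
        // Ask Solr to re-create the replica elsewhere when its node goes down.
        params.put("autoAddReplicas", "true");

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return solrBase + "/admin/collections?" + query;
    }

    public static void main(String[] args) {
        System.out.println(createUrl("http://localhost:8983/solr", "logs", 4));
    }
}
```

The parameters are kept in a LinkedHashMap only so the generated query string has a predictable order; Solr does not care about parameter order.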

      I expected Solr to fail over the same way HBase does.
      The feature worked almost as expected, except that some documents were lost during failover.
      I found two causes of the missing documents.

      1. The leader replica does not replay any transaction logs, but when there is only one replica, that replica must be the leader.
      So I made the leader replica replay the transaction logs when AutoAddReplicas is used with one replica.

      However, that fix alone did not resolve the problem.

      2. Each failover made the transaction log directory one level deeper, like this: tlog/tlog/tlog/...
      Because the transaction log directory changed during failover, the existing transaction log could not be replayed.
      So I made the transaction log directory stay the same across failovers.
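The tlog/tlog/tlog/... pattern suggests that a path already ending in `tlog` was fed back into code that appends `tlog` again on every failover, so replay looked in a new, empty directory. The sketch below illustrates that failure mode and an idempotent alternative, assuming that diagnosis. `TlogDirResolver`, its methods, and the example path are hypothetical; this is not Solr's UpdateLog code or the committed patch.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper illustrating the nesting bug and an idempotent fix.
class TlogDirResolver {

    static final String TLOG_DIR_NAME = "tlog";

    // Buggy behaviour: always appends "tlog", so passing the result along on the
    // next failover nests the directory one level deeper each time.
    static Path resolveBuggy(Path ulogDir) {
        return ulogDir.resolve(TLOG_DIR_NAME);
    }

    // Fixed behaviour: only appends "tlog" if the path does not already end in it,
    // so repeated failovers keep pointing at the same directory.
    static Path resolveFixed(Path ulogDir) {
        Path last = ulogDir.getFileName();
        if (last != null && last.toString().equals(TLOG_DIR_NAME)) {
            return ulogDir;
        }
        return ulogDir.resolve(TLOG_DIR_NAME);
    }

    public static void main(String[] args) {
        Path buggy = Paths.get("/solr/logs/core_node1/data"); // hypothetical path
        Path fixed = buggy;
        for (int failover = 1; failover <= 3; failover++) {
            buggy = resolveBuggy(buggy);
            fixed = resolveFixed(fixed);
            System.out.println("failover " + failover + ": buggy=" + buggy + " fixed=" + fixed);
        }
    }
}
```

Making the resolution idempotent means it no longer matters whether the caller hands over the data directory or the tlog directory itself.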

      After these fixes, AutoAddReplicas with one replica fails over well without losing any documents.

      1. SOLR-9236.patch
        13 kB
        Mark Miller
      2. SOLR-9236.patch
        16 kB
        Eungsop Yoo

        Activity

        markrmiller@gmail.com Mark Miller added a comment -

        the leader does not replay any transaction logs

        I think that may actually be a mistake.

        Here is a patch that uses your test additions and attempts to fix it a bit differently.

        Eungsop Yoo Eungsop Yoo added a comment -

        LGTM

        markrmiller@gmail.com Mark Miller added a comment -

        the leader does not replay any transaction logs

        I think that may actually be a mistake.

        Actually, I guess I was reading it wrong. That should have no effect here. That isLeader check is only taken into account when a slice is under construction for shard splitting, migration, or something similar.

        I think we just need the fix for the extra /tlog being appended.
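The check Mark describes can be written as a small predicate: log replay is skipped only when the slice is under construction (for example during a shard split) and this core is its leader. In an ordinary one-replica failover the slice is active, so the lone leader still replays its log, which is why only the tlog directory fix was needed. This is a sketch of the logic as described in the comment; the class, the simplified enum, and the method name are hypothetical, not Solr's actual code.

```java
class StartupReplayCheck {

    // Simplified, hypothetical stand-in for the slice states Solr tracks.
    enum SliceState { ACTIVE, INACTIVE, CONSTRUCTION }

    // Replay is skipped only when this core leads a slice still under construction.
    static boolean shouldReplayTlog(SliceState state, boolean isLeader) {
        return state != SliceState.CONSTRUCTION || !isLeader;
    }

    public static void main(String[] args) {
        // One-replica failover: the slice is active and the only replica is the leader.
        System.out.println(shouldReplayTlog(SliceState.ACTIVE, true));       // true
        // Leader of a sub-shard that a split is still building.
        System.out.println(shouldReplayTlog(SliceState.CONSTRUCTION, true)); // false
    }
}
```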

        jira-bot ASF subversion and git services added a comment -

        Commit 546093812c34610075ee130f7466eca1979cfbeb in lucene-solr's branch refs/heads/master from markrmiller
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5460938 ]

        SOLR-9236: AutoAddReplicas will append an extra /tlog to the update log location on replica failover.

        jira-bot ASF subversion and git services added a comment -

        Commit dc763e383d4293f9dc235bcc63ae2fba582574ff in lucene-solr's branch refs/heads/branch_6x from markrmiller
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=dc763e3 ]

        SOLR-9236: AutoAddReplicas will append an extra /tlog to the update log location on replica failover.

        steve_rowe Steve Rowe added a comment -

        Mark Miller, test compilation is broken on master and branch_6x, I think because of this issue. Here's the branch_6x error I get:

        common.compile-test:
            [mkdir] Created dir: /Users/sarowe/git/lucene-solr-2/solr/build/solr-core/classes/test
            [javac] Compiling 675 source files to /Users/sarowe/git/lucene-solr-2/solr/build/solr-core/classes/test
            [javac] /Users/sarowe/git/lucene-solr-2/solr/core/src/test/org/apache/solr/cloud/SharedFSAutoReplicaFailoverTest.java:65: error: cannot find symbol
            [javac] @Nightly
            [javac]  ^
            [javac]   symbol: class Nightly
            [javac] Note: Some input files use or override a deprecated API.
            [javac] Note: Recompile with -Xlint:deprecation for details.
            [javac] Note: Some input files use unchecked or unsafe operations.
            [javac] Note: Recompile with -Xlint:unchecked for details.
            [javac] 1 error
        
        markrmiller@gmail.com Mark Miller added a comment -

        Whoops. I un-commented the Nightly annotation, but bringing back its missing import didn't make it into the commit.

        jira-bot ASF subversion and git services added a comment -

        Commit c58841558354c9b27fe502cad60907a62645bf3b in lucene-solr's branch refs/heads/master from markrmiller
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c588415 ]

        SOLR-9236: Fix import.

        jira-bot ASF subversion and git services added a comment -

        Commit e91863b8e1b09c31e7b0c0e828b594ec9c022547 in lucene-solr's branch refs/heads/branch_6x from markrmiller
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e91863b ]

        SOLR-9236: Fix import.

        steve_rowe Steve Rowe added a comment -

        Mark Miller, did you see the following Jenkins precommit failure?

        [forbidden-apis] Forbidden method invocation: java.lang.System#currentTimeMillis() [Use RTimer/TimeOut/System.nanoTime for time comparisons, and `new Date()` output/debugging/stats of timestamps. If for some miscellaneous reason, you absolutely need to use this, use a SuppressForbidden.]
        [forbidden-apis]   in org.apache.solr.cloud.SharedFSAutoReplicaFailoverTest (SharedFSAutoReplicaFailoverTest.java:317)
        [forbidden-apis] Forbidden method invocation: java.lang.System#currentTimeMillis() [Use RTimer/TimeOut/System.nanoTime for time comparisons, and `new Date()` output/debugging/stats of timestamps. If for some miscellaneous reason, you absolutely need to use this, use a SuppressForbidden.]
        [forbidden-apis]   in org.apache.solr.cloud.SharedFSAutoReplicaFailoverTest (SharedFSAutoReplicaFailoverTest.java:321)
        [forbidden-apis] Scanned 3254 (and 2112 related) class file(s) for forbidden API invocations (in 5.11s), 2 error(s).
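The forbidden-apis rule asks for System.nanoTime (or Solr's RTimer/TimeOut helpers) for time comparisons, because System.currentTimeMillis follows the wall clock, which can jump when the system time is adjusted. Below is a minimal sketch of a polling wait built on System.nanoTime. `NanoTimeWait.waitFor` is hypothetical, and this is not the change made in the follow-up commits, whose contents are not shown here.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical polling helper that measures elapsed time with System.nanoTime().
class NanoTimeWait {

    // Returns true as soon as the condition holds, false once the timeout elapses.
    static boolean waitFor(BooleanSupplier condition, long timeout, TimeUnit unit)
            throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Compare by subtraction: nanoTime values can overflow, so "now > deadline" is unsafe.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(10);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        boolean met = waitFor(
                () -> System.nanoTime() - start > TimeUnit.MILLISECONDS.toNanos(50),
                5, TimeUnit.SECONDS);
        System.out.println("condition met: " + met);
    }
}
```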
        
        jira-bot ASF subversion and git services added a comment -

        Commit 91a9d96454c80f5f414170c4231c5b22fb094215 in lucene-solr's branch refs/heads/master from markrmiller
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=91a9d96 ]

        SOLR-9236: Don't use System.currentTimeMillis.

        jira-bot ASF subversion and git services added a comment -

        Commit b0386f0d643febdaed32575cccc84eb06af08f5c in lucene-solr's branch refs/heads/branch_6x from markrmiller
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b0386f0 ]

        SOLR-9236: Don't use System.currentTimeMillis.

        mdrob Mike Drob added a comment -

        Is there any work left on this, or can we close it? Looks done to me.

        mikemccand Michael McCandless added a comment -

        Bulk close resolved issues after 6.2.0 release.


          People

          • Assignee:
            markrmiller@gmail.com Mark Miller
            Reporter:
            Eungsop Yoo Eungsop Yoo
          • Votes:
            0
            Watchers:
            8

            Dates

            • Created:
              Updated:
              Resolved:
