Uploaded image for project: 'Ratis'
  1. Ratis
  2. RATIS-1247 Support rolling upgrade and rollback
  3. RATIS-1769

Avoid changing priorities in TransferCommand unless necessary

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 3.0.0, 2.5.0
    • None
    • None

    Description

      This is a followup of RATIS-1762. TransferCommand should not change priority of peers (or at least not by default).

      Sadly this will break backward compatibility. But version 3.0 hasn't been released, so it might be OK.

      Add a new TransferLeadershipCommand which will not change priority of peers when transfer leadership.
      The old TransferCommand is deprecated and keeped as is for backward compatibility reasons.

      Try to avoid changing priorities before transfer leadership in TransferCommand.
      It will fallback to "transfer leadership by changing priority" for backward compatibility.

      If the new leader's priority is lower than the highest priority, it will first set the its priority to be the same with highest priority.
      Then try to transfer leadership by the process described in RATIS-1762.
      If it still gets TransferLeadershipException("it does not have the highest priority"),
      it will fallback to legacy mode and set the new leader's priority to highest + 1.

      Example

      $ bin/ratis sh election transfer -peers 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
      [main] INFO org.reflections.Reflections - Reflections took 122 ms to scan 1 urls, producing 5 keys and 18 values
      [main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl, class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
      Transferring leadership to peer n1 with address 127.0.0.1:10124
      Transferring leadership initiated

      Backward compatibility

      Ratis shell version: 3.0.0-SNAPSHOT.
      Ratis server version: 2.4.1.

      $ bin/ratis sh election transfer -peers 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
      [main] INFO org.reflections.Reflections - Reflections took 135 ms to scan 1 urls, producing 5 keys and 18 values
      [main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl, class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
      Transferring leadership to peer n1 with address 127.0.0.1:10124
      Changing priority of peer n1 with address 127.0.0.1:10124 to 4
      Transferring leadership to peer n1 with address 127.0.0.1:10124
      Changing priority of peer n1 with address 127.0.0.1:10124 to 5
      Transferring leadership initiated

      In case of failure

      In most cases, just a retry will fix the problem. And users can also set timeout manually by -timeout option.

      $ bin/ratis sh election transfer -peers 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124
      [main] INFO org.reflections.Reflections - Reflections took 135 ms to scan 1 urls, producing 5 keys and 18 values
      [main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl, class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
      Transferring leadership to peer n1 with address 127.0.0.1:10124
      Failed to transfer peer n1 with address 127.0.0.1:10124: org.apache.ratis.protocol.exceptions.TransferLeadershipException: n2@group-ABB3109A44C1: Failed to transfer leadership to n1 (timed out 3000ms): current leader is n2
      	at org.apache.ratis.server.impl.TransferLeadership$PendingRequest.complete(TransferLeadership.java:67)
      	at org.apache.ratis.server.impl.TransferLeadership.lambda$finish$7(TransferLeadership.java:163)
      	at java.util.Optional.ifPresent(Optional.java:159)
      	at org.apache.ratis.server.impl.TransferLeadership.finish(TransferLeadership.java:163)
      	at org.apache.ratis.server.impl.TransferLeadership.lambda$start$4(TransferLeadership.java:136)
      	at org.apache.ratis.util.TimeoutTimer.lambda$onTimeout$2(TimeoutTimer.java:101)
      	at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:38)
      	at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:79)
      	at org.apache.ratis.util.TimeoutTimer$Task.run(TimeoutTimer.java:55)
      	at java.util.TimerThread.mainLoop(Timer.java:555)
      	at java.util.TimerThread.run(Timer.java:505)

      Attachments

        Issue Links

          Activity

            People

              ckj Kaijie Chen
              ckj Kaijie Chen
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 3h 50m
                  3h 50m