Description
This is a followup of RATIS-1762. TransferCommand should not change priority of peers (or at least not by default).
Sadly this will break backward compatibility. But version 3.0 hasn't been released, so it might be OK.
Add a new TransferLeadershipCommand which will not change priority of peers when transfer leadership.
The old TransferCommand is deprecated and keeped as is for backward compatibility reasons.
Try to avoid changing priorities before transfer leadership in TransferCommand.
It will fallback to "transfer leadership by changing priority" for backward compatibility.
If the new leader's priority is lower than the highest priority, it will first set the its priority to be the same with highest priority.
Then try to transfer leadership by the process described in RATIS-1762.
If it still gets TransferLeadershipException("it does not have the highest priority"),
it will fallback to legacy mode and set the new leader's priority to highest + 1.
Example
$ bin/ratis sh election transfer -peers 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124 [main] INFO org.reflections.Reflections - Reflections took 122 ms to scan 1 urls, producing 5 keys and 18 values [main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl, class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849 Transferring leadership to peer n1 with address 127.0.0.1:10124 Transferring leadership initiated
Backward compatibility
Ratis shell version: 3.0.0-SNAPSHOT.
Ratis server version: 2.4.1.
$ bin/ratis sh election transfer -peers 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124 [main] INFO org.reflections.Reflections - Reflections took 135 ms to scan 1 urls, producing 5 keys and 18 values [main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl, class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849 Transferring leadership to peer n1 with address 127.0.0.1:10124 Changing priority of peer n1 with address 127.0.0.1:10124 to 4 Transferring leadership to peer n1 with address 127.0.0.1:10124 Changing priority of peer n1 with address 127.0.0.1:10124 to 5 Transferring leadership initiated
In case of failure
In most cases, just a retry will fix the problem. And users can also set timeout manually by -timeout option.
$ bin/ratis sh election transfer -peers 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124 [main] INFO org.reflections.Reflections - Reflections took 135 ms to scan 1 urls, producing 5 keys and 18 values [main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl, class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849 Transferring leadership to peer n1 with address 127.0.0.1:10124 Failed to transfer peer n1 with address 127.0.0.1:10124: org.apache.ratis.protocol.exceptions.TransferLeadershipException: n2@group-ABB3109A44C1: Failed to transfer leadership to n1 (timed out 3000ms): current leader is n2 at org.apache.ratis.server.impl.TransferLeadership$PendingRequest.complete(TransferLeadership.java:67) at org.apache.ratis.server.impl.TransferLeadership.lambda$finish$7(TransferLeadership.java:163) at java.util.Optional.ifPresent(Optional.java:159) at org.apache.ratis.server.impl.TransferLeadership.finish(TransferLeadership.java:163) at org.apache.ratis.server.impl.TransferLeadership.lambda$start$4(TransferLeadership.java:136) at org.apache.ratis.util.TimeoutTimer.lambda$onTimeout$2(TimeoutTimer.java:101) at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:38) at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:79) at org.apache.ratis.util.TimeoutTimer$Task.run(TimeoutTimer.java:55) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505)
Attachments
Issue Links
- links to