[KAFKA-4051] Strange behavior during rebalance when turning the OS clock back - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.10.0.0
Fix Version/s: 0.10.1.0
Component/s: consumer
Labels:
None
Environment:
OS: Ubuntu 14.04 - 64bits

Description

If a rebalance is performed after turning the OS clock back, then the kafka server enters in a loop and the rebalance cannot be completed until the system returns to the previous date/hour.

Steps to Reproduce:

Start a consumer for TOPIC_NAME with group id GROUP_NAME. It will be owner of all the partitions.
Turn the system (OS) clock back. For instance 1 hour.
Start a new consumer for TOPIC_NAME using the same group id, it will force a rebalance.

After these actions the kafka server logs constantly display the messages below, and after a while both consumers do not receive more packages. This condition lasts at least the time that the clock went back, for this example 1 hour, and finally after this time kafka comes back to work.

[2016-08-08 11:30:23,023] INFO [GroupCoordinator 0]: Preparing to restabilize group GROUP_NAME with old generation 2 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,025] INFO [GroupCoordinator 0]: Stabilized group GROUP_NAME generation 3 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,027] INFO [GroupCoordinator 0]: Preparing to restabilize group GROUP_NAME with old generation 3 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,029] INFO [GroupCoordinator 0]: Group GROUP_NAME generation 3 is dead and removed (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Preparing to restabilize group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Stabilized group GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,033] INFO [GroupCoordinator 0]: Preparing to restabilize group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,034] INFO [GroupCoordinator 0]: Group GROUP generation 1 is dead and removed (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,043] INFO [GroupCoordinator 0]: Preparing to restabilize group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Stabilized group GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Preparing to restabilize group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
[2016-08-08 11:30:23,045] INFO [GroupCoordinator 0]: Group GROUP_NAME generation 1 is dead and removed (kafka.coordinator.GroupCoordinator)

Due to the fact that some systems could have enabled NTP or an administrator option to change the system clock (date/time) it's important to do it safely, currently the only way to do it safely is following the next steps:

1- Tear down the Kafka server.
2- Change the date/time
3- Tear up the Kafka server.

But, this approach can be done only if the change was performed by the administrator, not for NTP. Also in many systems turning down the Kafka server might cause the INFORMATION TO BE LOST.

Attachments

Issue Links

links to

GitHub Pull Request #1768

Activity

Ascending order - Click to sort in descending order

Ismael Juma added a comment - 18/Aug/16 10:51

As discussed in the thread, Kafka uses System.currentTimeMillis() in a number of places, which means that changing the system clock backwards is bound to cause issues. I wouldn't be surprised if there are other problems in addition to the one mentioned here.

Ismael Juma added a comment - 18/Aug/16 10:51 As discussed in the thread, Kafka uses System.currentTimeMillis() in a number of places, which means that changing the system clock backwards is bound to cause issues. I wouldn't be surprised if there are other problems in addition to the one mentioned here.

Rajini Sivaram added a comment - 18/Aug/16 11:14

I think the issue is around the handling of timer tasks in Kafka. While task expiry is set using System.currentTimeMillis which can move backwards (as reported in this JIRA), the internal timers in the TimingWheel in Kafka used to handle expiry is a monotonically increasing timer that starts with System.currentTimeMillis. This mismatch causes expiry of tasks until System.currentTimeMillis catches up with the internal timer.

As ijuma has pointed out on the mailing list, Kafka uses System.currentTimeMillis in a lot of places and switching to System.nanoTime everywhere could impact performance. We have a few choices on fixing this JIRA (in increasing order of complexity)

We could switch over to System.nanoTime for TimerTasks alone to fix the issue with delayed tasks reported here
It may be possible to change the timer implementation to recover better when wall clock time moves backwards
Replace System.currentTimeMillis with System.nanoTime in time comparisons throughout Kafka code

I am inclined to do 1) and run performance tests, but am interested in what others think.

Rajini Sivaram added a comment - 18/Aug/16 11:14 I think the issue is around the handling of timer tasks in Kafka. While task expiry is set using System.currentTimeMillis which can move backwards (as reported in this JIRA), the internal timers in the TimingWheel in Kafka used to handle expiry is a monotonically increasing timer that starts with System.currentTimeMillis . This mismatch causes expiry of tasks until System.currentTimeMillis catches up with the internal timer. As ijuma has pointed out on the mailing list, Kafka uses System.currentTimeMillis in a lot of places and switching to System.nanoTime everywhere could impact performance. We have a few choices on fixing this JIRA (in increasing order of complexity) We could switch over to System.nanoTime for TimerTasks alone to fix the issue with delayed tasks reported here It may be possible to change the timer implementation to recover better when wall clock time moves backwards Replace System.currentTimeMillis with System.nanoTime in time comparisons throughout Kafka code I am inclined to do 1) and run performance tests, but am interested in what others think.

Rajini Sivaram added a comment - 18/Aug/16 11:26

ijuma As you have pointed out, there are inevitably going to be issues since Kafka uses System.currentTimeMillis in so many places. But typically, you would expect a single clock change to cause one expiry and thereafter continue to work with the changed timer (eg. producer metadata get expires and retry works since it is on the updated clock). The issue in this JIRA is that the broker doesn't recover until the wall clock time reaches the previously set time. I imagine changing the clock back by an hour is an uncommon scenario, but the impact is quite big if it does happen. If we are fixing this issue, it will be useful to have a system test to check that Kafka continues to function after a major clock change.

Rajini Sivaram added a comment - 18/Aug/16 11:26 ijuma As you have pointed out, there are inevitably going to be issues since Kafka uses System.currentTimeMillis in so many places. But typically, you would expect a single clock change to cause one expiry and thereafter continue to work with the changed timer (eg. producer metadata get expires and retry works since it is on the updated clock). The issue in this JIRA is that the broker doesn't recover until the wall clock time reaches the previously set time. I imagine changing the clock back by an hour is an uncommon scenario, but the impact is quite big if it does happen. If we are fixing this issue, it will be useful to have a system test to check that Kafka continues to function after a major clock change.

Ismael Juma added a comment - 18/Aug/16 11:29

rsivaram, option 1 seems worth a try. Note that it would have to be tested on Linux and Mac to verify that it fixes the underlying issue (it would be good to test on Windows too, but we have known issues in that platform so should not be a blocker, I guess). The JDK implementation relies on the OS (and with some JDK-level workarounds sometimes), so it's platform dependent.

gwenshap had some concerns on the usage of System.nanoTime with regards to the impact on loaded systems so it would be good to get her thoughts too.

Ismael Juma added a comment - 18/Aug/16 11:29 rsivaram , option 1 seems worth a try. Note that it would have to be tested on Linux and Mac to verify that it fixes the underlying issue (it would be good to test on Windows too, but we have known issues in that platform so should not be a blocker, I guess). The JDK implementation relies on the OS (and with some JDK-level workarounds sometimes), so it's platform dependent. gwenshap had some concerns on the usage of System.nanoTime with regards to the impact on loaded systems so it would be good to get her thoughts too.

Gwen Shapira added a comment - 19/Aug/16 03:32

I was definitely concerned about complete replacement throughout the code. Especially when there was no justification in most places.

I think that going with option 1, doing performance comparisons and documenting exactly why we use nanotime there will be good.

Gwen Shapira added a comment - 19/Aug/16 03:32 I was definitely concerned about complete replacement throughout the code. Especially when there was no justification in most places. I think that going with option 1, doing performance comparisons and documenting exactly why we use nanotime there will be good.

Rajini Sivaram added a comment - 19/Aug/16 08:47

ijuma gwenshap Thank you both for the feedback. I will try it out locally, and run performance tests. I can test on Linux and Mac.

Rajini Sivaram added a comment - 19/Aug/16 08:47 ijuma gwenshap Thank you both for the feedback. I will try it out locally, and run performance tests. I can test on Linux and Mac.

Onur Karaman added a comment - 19/Aug/16 23:01

Is there a legitimate reason for rolling back your clock, especially by an hour? Unless there's a legitimate reason, I don't see the value in investigating a solution further.

Based on some quick reading, when the delta in times grows large (it was hard for me to tell if the threshold is 128 ms or 128 s, the docs mention both), ntpd may either step the time back or do slew-based correction depending on how you configure it.

warning: I know next to nothing about ntp!

reference: http://doc.ntp.org/4.2.4/ntpd.html#op

Onur Karaman added a comment - 19/Aug/16 23:01 Is there a legitimate reason for rolling back your clock, especially by an hour? Unless there's a legitimate reason, I don't see the value in investigating a solution further. Based on some quick reading, when the delta in times grows large (it was hard for me to tell if the threshold is 128 ms or 128 s, the docs mention both), ntpd may either step the time back or do slew-based correction depending on how you configure it. warning: I know next to nothing about ntp! reference: http://doc.ntp.org/4.2.4/ntpd.html#op

ASF GitHub Bot added a comment - 22/Aug/16 09:59

GitHub user rajinisivaram opened a pull request:

https://github.com/apache/kafka/pull/1768

~~KAFKA-4051~~: Use nanosecond clock for timers in broker

Use System.nanoseconds instead of System.currentTimeMillis in broker timer tasks to cope with changes to wall-clock time.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rajinisivaram/kafka ~~KAFKA-4051~~

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1768.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1768

commit 366659f9a6a0212030349bc54e0c798e3d6c2122
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Date: 2016-08-19T10:13:03Z

~~KAFKA-4051~~: Use nanosecond clock for timers in broker

ASF GitHub Bot added a comment - 22/Aug/16 09:59 GitHub user rajinisivaram opened a pull request: https://github.com/apache/kafka/pull/1768 KAFKA-4051 : Use nanosecond clock for timers in broker Use System.nanoseconds instead of System.currentTimeMillis in broker timer tasks to cope with changes to wall-clock time. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rajinisivaram/kafka KAFKA-4051 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/1768.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1768 commit 366659f9a6a0212030349bc54e0c798e3d6c2122 Author: Rajini Sivaram <rajinisivaram@googlemail.com> Date: 2016-08-19T10:13:03Z KAFKA-4051 : Use nanosecond clock for timers in broker

Rajini Sivaram added a comment - 22/Aug/16 10:26

Agree with onurkaraman that this is an unusual scenario. But the fix is a small change that can be seen as one small step towards improving Kafka's resilience to wall-clock time changes. The PR avoids the need for a broker restart in this case, but perhaps ibarra can provide more context to the problem scenario.

Rajini Sivaram added a comment - 22/Aug/16 10:26 Agree with onurkaraman that this is an unusual scenario. But the fix is a small change that can be seen as one small step towards improving Kafka's resilience to wall-clock time changes. The PR avoids the need for a broker restart in this case, but perhaps ibarra can provide more context to the problem scenario.

Gabriel Ibarra added a comment - 22/Aug/16 15:23

Hi, I'm sorry for the delay.
I agree with you guys, it is not a typical scenario, it is even difficult for me to think about the reasons for a System Administrator to change the data/time; but he can, and as Rajini Sivaram said the impact is quite big if it does happen.

Added to that, our system is working in a VM with NTP, and we see some spurious changes on the system time (this way we detected this issue), we are now analyzing why the time is changing, but we suspect that the system start with the hardware time and then NTP synchronize the system clock using the time zone configured in the VM.

It is a good news that it could be fixed with small changes. Great Job Rajini

Gabriel Ibarra added a comment - 22/Aug/16 15:23 Hi, I'm sorry for the delay. I agree with you guys, it is not a typical scenario, it is even difficult for me to think about the reasons for a System Administrator to change the data/time; but he can, and as Rajini Sivaram said the impact is quite big if it does happen. Added to that, our system is working in a VM with NTP, and we see some spurious changes on the system time (this way we detected this issue), we are now analyzing why the time is changing, but we suspect that the system start with the hardware time and then NTP synchronize the system clock using the time zone configured in the VM. It is a good news that it could be fixed with small changes. Great Job Rajini

Gwen Shapira added a comment - 22/Aug/16 22:58

Issue resolved by pull request 1768
https://github.com/apache/kafka/pull/1768

Gwen Shapira added a comment - 22/Aug/16 22:58 Issue resolved by pull request 1768 https://github.com/apache/kafka/pull/1768

ASF GitHub Bot added a comment - 22/Aug/16 22:59

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1768

ASF GitHub Bot added a comment - 22/Aug/16 22:59 Github user asfgit closed the pull request at: https://github.com/apache/kafka/pull/1768