  1. Kafka
  2. KAFKA-4051

Strange behavior during rebalance when turning the OS clock back

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0.0
    • Fix Version/s: 0.10.1.0
    • Component/s: consumer
    • Labels: None
    • Environment: OS: Ubuntu 14.04 - 64bits

    Description

      If a rebalance is triggered after the OS clock has been turned back, the Kafka server enters a loop and the rebalance cannot complete until the system clock catches up with the date/time it showed before the change.

      Steps to Reproduce:

      • Start a consumer for TOPIC_NAME with group id GROUP_NAME. It will own all the partitions.
      • Turn the system (OS) clock back, for instance by 1 hour.
      • Start a new consumer for TOPIC_NAME using the same group id. This forces a rebalance.

      After these steps, the Kafka server log repeatedly shows the messages below, and after a while both consumers stop receiving messages. The condition lasts at least as long as the amount by which the clock was turned back (1 hour in this example), after which Kafka resumes working normally.

      [2016-08-08 11:30:23,023] INFO [GroupCoordinator 0]: Preparing to restabilize group GROUP_NAME with old generation 2 (kafka.coordinator.GroupCoordinator)
      [2016-08-08 11:30:23,025] INFO [GroupCoordinator 0]: Stabilized group GROUP_NAME generation 3 (kafka.coordinator.GroupCoordinator)
      [2016-08-08 11:30:23,027] INFO [GroupCoordinator 0]: Preparing to restabilize group GROUP_NAME with old generation 3 (kafka.coordinator.GroupCoordinator)
      [2016-08-08 11:30:23,029] INFO [GroupCoordinator 0]: Group GROUP_NAME generation 3 is dead and removed (kafka.coordinator.GroupCoordinator)
      [2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Preparing to restabilize group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
      [2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Stabilized group GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
      [2016-08-08 11:30:23,033] INFO [GroupCoordinator 0]: Preparing to restabilize group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
      [2016-08-08 11:30:23,034] INFO [GroupCoordinator 0]: Group GROUP_NAME generation 1 is dead and removed (kafka.coordinator.GroupCoordinator)
      [2016-08-08 11:30:23,043] INFO [GroupCoordinator 0]: Preparing to restabilize group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
      [2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Stabilized group GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
      [2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Preparing to restabilize group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
      [2016-08-08 11:30:23,045] INFO [GroupCoordinator 0]: Group GROUP_NAME generation 1 is dead and removed (kafka.coordinator.GroupCoordinator)
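
One plausible way to picture the loop above is a timer that schedules deadlines from the wall clock while also remembering the latest time it has already advanced to. The report does not state the root cause, so the following is only a toy model under that assumption. `SimpleTimer`, `fires_immediately`, and the 30-second timeout are hypothetical names and values chosen for illustration, not Kafka code.

```python
HOUR_MS = 3_600_000
SESSION_TIMEOUT_MS = 30_000  # hypothetical timeout, for illustration only


class SimpleTimer:
    """Toy timer (not Kafka code) that remembers the latest time it has seen."""

    def __init__(self, now_ms):
        # High-water mark: the timer's own notion of "now" never moves backwards.
        self.current_ms = now_ms

    def advance(self, now_ms):
        self.current_ms = max(self.current_ms, now_ms)

    def fires_immediately(self, now_ms, delay_ms):
        # A task whose deadline is not after the timer's notion of "now"
        # is treated as already expired and runs right away.
        return now_ms + delay_ms <= self.current_ms


# Wall-clock driven timer: the clock reads 10:00, then is set back to 09:00.
wall_now = 10 * HOUR_MS
wall_timer = SimpleTimer(wall_now)
wall_now -= HOUR_MS
wall_fires_early = wall_timer.fires_immediately(wall_now, SESSION_TIMEOUT_MS)

# Monotonic timer: unaffected by the wall-clock change.
mono_now = 0
mono_timer = SimpleTimer(mono_now)
mono_fires_early = mono_timer.fires_immediately(mono_now, SESSION_TIMEOUT_MS)

# One hour later the wall clock has caught up and new deadlines are in the future again.
wall_now += HOUR_MS
wall_timer.advance(wall_now)
wall_fires_early_after = wall_timer.fires_immediately(wall_now, SESSION_TIMEOUT_MS)

print(wall_fires_early, mono_fires_early, wall_fires_early_after)
```

In this model every timeout scheduled during the hour after the change expires at once, which would be consistent with groups being stabilized and declared dead within milliseconds, and with the condition clearing once the clock catches up. A timer driven by a monotonic clock does not show the effect.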

      Because some systems run NTP, or allow an administrator to change the system clock (date/time), it is important that such changes can be made safely. Currently the only safe way is to follow these steps:

      1- Stop the Kafka server.
      2- Change the date/time.
      3- Start the Kafka server again.

      However, this approach works only if the change is made by an administrator, not by NTP. Also, in many systems stopping the Kafka server might cause information to be lost.
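
Since a change made by NTP cannot be planned around, it can help to at least detect that one has happened. The sketch below compares how far the wall clock and the monotonic clock moved between two samples; a large negative difference suggests the wall clock was set back. `clock_step_ms` is a hypothetical helper, not part of Kafka, and the sample values are simulated.

```python
import time


def clock_step_ms(prev_wall_s, prev_mono_s, wall_s, mono_s):
    """Return how far the wall clock moved relative to the monotonic clock, in ms.

    A negative result means the wall clock was set back between the two samples.
    """
    wall_delta = wall_s - prev_wall_s
    mono_delta = mono_s - prev_mono_s
    return round((wall_delta - mono_delta) * 1000)


# Simulated samples: 5 seconds really elapsed, but the wall clock was set back one hour.
step = clock_step_ms(1000.0, 50.0, 1000.0 + 5.0 - 3600.0, 55.0)
print(step)

# Live usage: take a baseline once, then compare against it periodically.
baseline_wall, baseline_mono = time.time(), time.monotonic()
live_step = clock_step_ms(baseline_wall, baseline_mono, time.time(), time.monotonic())
```

A monitoring script could poll this and raise an alert when the absolute step exceeds a chosen threshold, which is an operational choice rather than anything prescribed by this report.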

            People

              Assignee: Rajini Sivaram (rsivaram)
              Reporter: Gabriel Ibarra (gabriel.ibarra)