ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-366

Session timeout detection can go wrong if the leader system time changes

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 3.5.0
    • Component/s: quorum, server
    • Labels:
      None

      Description

      the leader tracks session expirations by calculating when a session will timeout and then periodically checking to see what needs to be timed out based on the current time. this works great as long as the leaders clock progresses at a steady pace. the problem comes when there are big (session size) changes in clock, by ntp for example. if time gets adjusted forward, all the sessions could timeout immediately. if time goes backward sessions that should timeout may take a lot longer to actually expire.

      this is really just a leader issue. the easiest way to deal with this is to have the leader relinquish leadership if it detects a big jump forward in time. when a new leader gets elected, it will recalculate timeouts of active sessions.

      1. ZOOKEEPER-366.patch
        0.8 kB
        Benjamin Reed

        Issue Links

          Activity

          Hide
          Benjamin Reed added a comment -

          after discussion this on the list, we realized that we can detect a big jump in time change in the session expiration thread. since we expire a bucket of sessions each tick, if we run into the situation where we are going to expire more than one bucket in a row, we know we have jumped forward in time. we can "smooth" the jump by requiring at least a 1/2 ticktime wait between each bucket.

          Show
          Benjamin Reed added a comment - after discussion this on the list, we realized that we can detect a big jump in time change in the session expiration thread. since we expire a bucket of sessions each tick, if we run into the situation where we are going to expire more than one bucket in a row, we know we have jumped forward in time. we can "smooth" the jump by requiring at least a 1/2 ticktime wait between each bucket.
          Hide
          Benjamin Reed added a comment -

          this patch smooths out the effect of a radical time change by always sleeping at least 1/2 tickTime. this means that if we really needed to do a big jump forward, it will take up 1/2 of the jump to converge on the real time. because clients ping for idle times of 1/3 the timeout, there should be few sessions that expire. we could reduce that number, but take even longer to converge if we always sleep at least 3/4 of the tickTime.

          Show
          Benjamin Reed added a comment - this patch smooths out the effect of a radical time change by always sleeping at least 1/2 tickTime. this means that if we really needed to do a big jump forward, it will take up 1/2 of the jump to converge on the real time. because clients ping for idle times of 1/3 the timeout, there should be few sessions that expire. we could reduce that number, but take even longer to converge if we always sleep at least 3/4 of the tickTime.
          Hide
          Benjamin Reed added a comment -

          anyone have an idea of how to test this? i need to mock System.currentTimeMillis().

          Show
          Benjamin Reed added a comment - anyone have an idea of how to test this? i need to mock System.currentTimeMillis().
          Hide
          Patrick Hunt added a comment -

          perhaps we should have a "Clock" utility that normally wraps System.currentTimeMillis() but can be mocked for testing purposes. If you want to do mocks we should do it via mockito.

          Show
          Patrick Hunt added a comment - perhaps we should have a "Clock" utility that normally wraps System.currentTimeMillis() but can be mocked for testing purposes. If you want to do mocks we should do it via mockito.
          Hide
          Holger Hoffstätte added a comment -

          You can avoid the entire problem by only comparing deltas from nanoTime, which is independent of "time" and simply increases monotonously.

          Show
          Holger Hoffstätte added a comment - You can avoid the entire problem by only comparing deltas from nanoTime, which is independent of "time" and simply increases monotonously.
          Hide
          Benjamin Reed added a comment -

          holger you are correct. nanoTime is the way to go. i'll prepare a fix. one problem with it is that the fix will be impossible to test.

          Show
          Benjamin Reed added a comment - holger you are correct. nanoTime is the way to go. i'll prepare a fix. one problem with it is that the fix will be impossible to test.
          Hide
          Patrick Hunt added a comment -

          One thing we should do - add sufficient logging (warn level or higher I would say) to ensure if this does happen in production we have a record of it in the log.

          Show
          Patrick Hunt added a comment - One thing we should do - add sufficient logging (warn level or higher I would say) to ensure if this does happen in production we have a record of it in the log.
          Hide
          Patrick Hunt added a comment -

          FYI, this came up again today on hbase list:

          14:59 < hp> man this system time update on a bunch of machines causing zookeeper session timeouts causing hr's to die is really taking its toll, count on a table now hangs, i disabled and enabled the table, tried count again, and it hangs at the same place still. Arg.

          Ben any progress on this? Should we try to get it into 3.3.3?

          Show
          Patrick Hunt added a comment - FYI, this came up again today on hbase list: 14:59 < hp > man this system time update on a bunch of machines causing zookeeper session timeouts causing hr's to die is really taking its toll, count on a table now hangs, i disabled and enabled the table, tried count again, and it hangs at the same place still. Arg. Ben any progress on this? Should we try to get it into 3.3.3?
          Hide
          Benjamin Reed added a comment -

          i haven't had a chance to get back to this. we really need to convert all the currentTimeMillis() to nanoTime(). we need to do a similar change in the C client.

          i don't think we can do a test for this.

          Show
          Benjamin Reed added a comment - i haven't had a chance to get back to this. we really need to convert all the currentTimeMillis() to nanoTime(). we need to do a similar change in the C client. i don't think we can do a test for this.
          Hide
          Mahadev konar added a comment -

          I am all for making it for 3.3.3. I'd be willing to fix it probably not this week.

          Show
          Mahadev konar added a comment - I am all for making it for 3.3.3. I'd be willing to fix it probably not this week.
          Hide
          Chang Song added a comment -

          Mahadev.

          Can we have some traction on this fix?
          We have experienced the same problem, and thus it wreak havoc on our deployment.

          Can we have the patch updated for current release?

          Thank you very much.

          Chang

          Show
          Chang Song added a comment - Mahadev. Can we have some traction on this fix? We have experienced the same problem, and thus it wreak havoc on our deployment. Can we have the patch updated for current release? Thank you very much. Chang
          Hide
          Mahadev konar added a comment -

          Chang,
          I probably wont have time over the next few weeks to fix this (given the 3.4 release). Please feel free to take it over or if someone else wants to fix it, he/she is welcome to do so.

          Show
          Mahadev konar added a comment - Chang, I probably wont have time over the next few weeks to fix this (given the 3.4 release). Please feel free to take it over or if someone else wants to fix it, he/she is welcome to do so.
          Hide
          Ted Dunning added a comment -

          History repeats itself. I have patches on ZOOKEEPER-1366 that use nanoTime to avoid these problems, but don't include the limit on number of expirations per tick.

          I suggest that we mark this issue as a duplicate of that issue and apply that patch since it is up to date. Any objections?

          Show
          Ted Dunning added a comment - History repeats itself. I have patches on ZOOKEEPER-1366 that use nanoTime to avoid these problems, but don't include the limit on number of expirations per tick. I suggest that we mark this issue as a duplicate of that issue and apply that patch since it is up to date. Any objections?
          Hide
          Patrick Hunt added a comment -

          sounds good to me.

          Show
          Patrick Hunt added a comment - sounds good to me.
          Hide
          Ted Dunning added a comment -

          We now have a measure of the memory half life for the community. Half of us forget an issue after 1000 JIRA's. Happily Pat is part of the other half.

          In any case the fix for ZOOKEEPER-1366 supersedes the proposed fix here.

          Show
          Ted Dunning added a comment - We now have a measure of the memory half life for the community. Half of us forget an issue after 1000 JIRA's. Happily Pat is part of the other half. In any case the fix for ZOOKEEPER-1366 supersedes the proposed fix here.
          Hide
          Benjamin Reed added a comment -

          hey, i vaguely remembered it! ZOOKEEPER-1366 is the correct fix

          Show
          Benjamin Reed added a comment - hey, i vaguely remembered it! ZOOKEEPER-1366 is the correct fix

            People

            • Assignee:
              Benjamin Reed
              Reporter:
              Benjamin Reed
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:

                Development