ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-1366

Zookeeper should be tolerant of clock adjustments

    Details

    • Type: Bug Bug
    • Status: Patch Available
    • Priority: Critical Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 3.5.0
    • Component/s: None
    • Labels:
      None

      Description

      If you want to wreak havoc on a ZK based system just do [date -s "+1hour"] and watch the mayhem as all sessions expire at once.

      This shouldn't happen. Zookeeper could easily know handle elapsed times as elapsed times rather than as differences between absolute times. The absolute times are subject to adjustment when the clock is set while a timer is not subject to this problem. In Java, System.currentTimeMillis() gives you absolute time while System.nanoTime() gives you time based on a timer from an arbitrary epoch.

      I have done this and have been running tests now for some tens of minutes with no failures. I will set up a test machine to redo the build again on Ubuntu and post a patch here for discussion.

      1. ZOOKEEPER-1366.patch
        47 kB
        Michi Mutsuzaki
      2. ZOOKEEPER-1366.patch
        45 kB
        Marshall McMullen
      3. ZOOKEEPER-1366.patch
        45 kB
        Marshall McMullen
      4. ZOOKEEPER-1366.patch
        45 kB
        Marshall McMullen
      5. ZOOKEEPER-1366.patch
        47 kB
        Ted Dunning
      6. ZOOKEEPER-1366.patch
        47 kB
        Ted Dunning
      7. ZOOKEEPER-1366.patch
        43 kB
        Ted Dunning
      8. ZOOKEEPER-1366.patch
        43 kB
        Ted Dunning
      9. ZOOKEEPER-1366-3.3.3.patch
        11 kB
        Ted Dunning
      10. zookeeper-3.4.5-ZK1366-SC01.patch
        55 kB
        Chris Seawood

        Issue Links

          Activity

          Hide
          Ted Dunning added a comment -

          See https://github.com/tdunning/zookeeper for a work in progress.

          Tests seem good except for ReadOnlyModeTest. That may be failing due to unrelated issues.

          Show
          Ted Dunning added a comment - See https://github.com/tdunning/zookeeper for a work in progress. Tests seem good except for ReadOnlyModeTest. That may be failing due to unrelated issues.
          Hide
          Ted Dunning added a comment -

          The ReadOnlyModeTest failure was an environmental fluke. It passes on Ubuntu and OSX.

          Patch coming up.

          Show
          Ted Dunning added a comment - The ReadOnlyModeTest failure was an environmental fluke. It passes on Ubuntu and OSX. Patch coming up.
          Hide
          Ted Dunning added a comment -

          This patch changes to using relative time everywhere that System.currentTimeMillis was previously used.

          Show
          Ted Dunning added a comment - This patch changes to using relative time everywhere that System.currentTimeMillis was previously used.
          Hide
          Flavio Junqueira added a comment -

          Hi Ted, I was wondering if this change fits into the framework of ZOOKEEPER-702.

          Show
          Flavio Junqueira added a comment - Hi Ted, I was wondering if this change fits into the framework of ZOOKEEPER-702 .
          Hide
          Ted Dunning added a comment -

          I don't think so. This change makes ZK work better. It doesn't detect failures.

          Show
          Ted Dunning added a comment - I don't think so. This change makes ZK work better. It doesn't detect failures.
          Hide
          Flavio Junqueira added a comment -

          You're talking about timers in your description, and we use timers for failure detection. The main problem is getting that patch in, there hasn't been enough agreement to get it committed.

          Show
          Flavio Junqueira added a comment - You're talking about timers in your description, and we use timers for failure detection. The main problem is getting that patch in, there hasn't been enough agreement to get it committed.
          Hide
          Ted Dunning added a comment -

          Yes. You are correct. Both bugs mention timers.

          That doesn't make them all that related. The recent ZOOKEEPER-1239 uses timers as well, but it isn't really all that related to either ZOOKEEPER-702 or this issue.

          As I read it ZOOKEEPER-702 is about failure detection. This bug is about making zookeeper work correct even when the system clock is advanced or retarded. It has nothing to do with failure detection and ZOOKEEPER-702 has really nothing to do with this issue.

          Show
          Ted Dunning added a comment - Yes. You are correct. Both bugs mention timers. That doesn't make them all that related. The recent ZOOKEEPER-1239 uses timers as well, but it isn't really all that related to either ZOOKEEPER-702 or this issue. As I read it ZOOKEEPER-702 is about failure detection. This bug is about making zookeeper work correct even when the system clock is advanced or retarded. It has nothing to do with failure detection and ZOOKEEPER-702 has really nothing to do with this issue.
          Hide
          Benjamin Reed added a comment -

          +1 nice patch. i vaguely remember a duplicate issue on this. i think ted is right, this is about how we detect time. any idea how we can test this ted?

          Show
          Benjamin Reed added a comment - +1 nice patch. i vaguely remember a duplicate issue on this. i think ted is right, this is about how we detect time. any idea how we can test this ted?
          Hide
          Ted Yu added a comment -

          The following would be replaced with Apache License header, I assume:

          +/**
          + * Created by IntelliJ IDEA.
          + * User: tdunning
          
          Show
          Ted Yu added a comment - The following would be replaced with Apache License header, I assume: +/** + * Created by IntelliJ IDEA. + * User: tdunning
          Hide
          Ted Dunning added a comment -

          Yes.

          The correct response on my part is "Ooops".

          On Thu, Jan 19, 2012 at 11:54 PM, Zhihong Yu (Commented) (JIRA) <

          Show
          Ted Dunning added a comment - Yes. The correct response on my part is "Ooops". On Thu, Jan 19, 2012 at 11:54 PM, Zhihong Yu (Commented) (JIRA) <
          Hide
          Benjamin Reed added a comment -

          good catch zhihong! i wish hudson would check that for us...

          Show
          Benjamin Reed added a comment - good catch zhihong! i wish hudson would check that for us...
          Hide
          Ted Dunning added a comment -

          Here is an updated patch that fixes the header.

          Show
          Ted Dunning added a comment - Here is an updated patch that fixes the header.
          Hide
          Ted Dunning added a comment -

          Ben,

          Regarding the header, I think that RAT can do something like that. Also, I can configure IntelliJ to do the right thing by default (I have that set for Mahout... forgot on ZK).

          Regarding testing, I have been unable to come up with a substantive unit test. It is trivial to test the Time.currentElapsedTime method, but that provides little information.

          A manual test is easy. Start a client that creates an ephemeral. Set system time +1hour. Note that ephemeral vanishes and client goes away in shame because it's session is marked as expired. Now repeat with patch.

          I will do the test and attach a transcript.

          Show
          Ted Dunning added a comment - Ben, Regarding the header, I think that RAT can do something like that. Also, I can configure IntelliJ to do the right thing by default (I have that set for Mahout... forgot on ZK). Regarding testing, I have been unable to come up with a substantive unit test. It is trivial to test the Time.currentElapsedTime method, but that provides little information. A manual test is easy. Start a client that creates an ephemeral. Set system time +1hour. Note that ephemeral vanishes and client goes away in shame because it's session is marked as expired. Now repeat with patch. I will do the test and attach a transcript.
          Hide
          Henry Robinson added a comment -

          The nice thing is that this is a small step towards a properly mockable time API in ZK, which would a) make tests much faster and b) make tests much more deterministic. There's a way to go still because of Thread.sleep and other complications, but this is a good first step.

          Show
          Henry Robinson added a comment - The nice thing is that this is a small step towards a properly mockable time API in ZK, which would a) make tests much faster and b) make tests much more deterministic. There's a way to go still because of Thread.sleep and other complications, but this is a good first step.
          Hide
          Henry Robinson added a comment -

          Oh, and my main concern here was non-monotonicity across cores, but this page alleviates that concern for modern Linux kernels:

          1. Is clock_gettime(CLOCK_REALTIME) consistent across all processors/cores?
          > (Does arch matter? e.g. ppc, arm, x86, amd64, sparc).
          It should or it's considered buggy.

          However, on x86/x86_64, it is possible to see unsynced or variable freq TSCs cause time inconsistencies. 2.4 kernels really had no protection against this, and early 2.6 kernels didn't do too well here either. As of 2.6.18 and up the logic for detecting this is better and we'll usually fall back to a safe clocksource.

          ppc always has a synced timebase, so that shouldn't be an issue.

          arm, i'm not so familiar with, but i assume they do the right thing (i've not seen many arm bugs on this issue).

          Show
          Henry Robinson added a comment - Oh, and my main concern here was non-monotonicity across cores, but this page alleviates that concern for modern Linux kernels: 1. Is clock_gettime(CLOCK_REALTIME) consistent across all processors/cores? > (Does arch matter? e.g. ppc, arm, x86, amd64, sparc). It should or it's considered buggy. However, on x86/x86_64, it is possible to see unsynced or variable freq TSCs cause time inconsistencies. 2.4 kernels really had no protection against this, and early 2.6 kernels didn't do too well here either. As of 2.6.18 and up the logic for detecting this is better and we'll usually fall back to a safe clocksource. ppc always has a synced timebase, so that shouldn't be an issue. arm, i'm not so familiar with, but i assume they do the right thing (i've not seen many arm bugs on this issue).
          Hide
          Ted Dunning added a comment -

          For the test, I have created a branch in my github fork of zookeeper called timer-test and one called ZK-1366. Both have the test case below but only ZK-1366 has the fix.

          The test procedure is to do this in one window:

          git checkout timer-test
          ant clean compile-test
          echo build/test/lib/*.jar build/lib/*.jar build/classes build/test/classes | sed -e 's/ /:/g' > cp
          java -cp $(cat cp) org.apache.zookeeper.common.TimeTest | tee log-without-patch
          

          After the test program starts, in a second window, do these commands separated by a second or three

          date -s '+1hour'
          date -s '-1hour'
          

          This will produce output that looks like this

          ubuntu@ip-10-113-27-85:~/zookeeper$ java -cp $(cat cp) org.apache.zookeeper.common.TimeTest | tee log-without-patch
          1
          log4j:WARN No appenders could be found for logger (org.apache.zookeeper.PortAssignment).
          log4j:WARN Please initialize the log4j system properly.
          log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
          after construct
          server start
          client
          1       true
          1       true
          0       true
          1       true
          3600000 event = WatchedEvent state:Disconnected type:None path:null
          3600001 event = WatchedEvent state:Disconnected type:None path:null
          3600001 event = WatchedEvent state:Disconnected type:None path:null
          3600001 event = WatchedEvent state:Disconnected type:None path:null
          3600001 event = WatchedEvent state:Disconnected type:None path:null
          3600001 event = WatchedEvent state:Disconnected type:None path:null
          3600001 event = WatchedEvent state:Disconnected type:None path:null
          3600001 event = WatchedEvent state:Expired type:None path:null
          3600001 event = WatchedEvent state:Expired type:None path:null
          3600001 event = WatchedEvent state:Expired type:None path:null
          3600001 event = WatchedEvent state:Expired type:None path:null
          3600001 event = WatchedEvent state:Expired type:None path:null
          

          The number is the time discrepancy between the system clock and the elapsed time from start of the program.

          With the patch,

          git checkout ZK-1366
          ant clean compile-test
          echo build/test/lib/*.jar build/lib/*.jar build/classes build/test/classes | sed -e 's/ /:/g' > cp
          java -cp $(cat cp) org.apache.zookeeper.common.TimeTest | tee log-with-patch
          

          The output looks like this:

          ubuntu@ip-10-113-27-85:~/zookeeper$ less log-with-patch 
          1
          after construct
          server start
          client
          0       true
          0       true
          1       true
          0       true
          1       true
          0       true
          0       true
          3600000 true
          3600000 true
          3600000 true
          3600000 true
          0       true
          0       true
          0       true
          0       true
          -3600000        true
          -3600001        true
          -3600000        true
          -1      true
          -1      true
          -1      true
          -1      true
          

          As you can see, the timing discrepancy goes up and down but the ephemerals never disappear.

          I will update the patch shortly to match the ZK-1366 branch on my github clone.

          Show
          Ted Dunning added a comment - For the test, I have created a branch in my github fork of zookeeper called timer-test and one called ZK-1366. Both have the test case below but only ZK-1366 has the fix. The test procedure is to do this in one window: git checkout timer-test ant clean compile-test echo build/test/lib/*.jar build/lib/*.jar build/classes build/test/classes | sed -e 's/ /:/g' > cp java -cp $(cat cp) org.apache.zookeeper.common.TimeTest | tee log-without-patch After the test program starts, in a second window, do these commands separated by a second or three date -s '+1hour' date -s '-1hour' This will produce output that looks like this ubuntu@ip-10-113-27-85:~/zookeeper$ java -cp $(cat cp) org.apache.zookeeper.common.TimeTest | tee log-without-patch 1 log4j:WARN No appenders could be found for logger (org.apache.zookeeper.PortAssignment). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http: //logging.apache.org/log4j/1.2/faq.html#noconfig for more info. after construct server start client 1 true 1 true 0 true 1 true 3600000 event = WatchedEvent state:Disconnected type:None path: null 3600001 event = WatchedEvent state:Disconnected type:None path: null 3600001 event = WatchedEvent state:Disconnected type:None path: null 3600001 event = WatchedEvent state:Disconnected type:None path: null 3600001 event = WatchedEvent state:Disconnected type:None path: null 3600001 event = WatchedEvent state:Disconnected type:None path: null 3600001 event = WatchedEvent state:Disconnected type:None path: null 3600001 event = WatchedEvent state:Expired type:None path: null 3600001 event = WatchedEvent state:Expired type:None path: null 3600001 event = WatchedEvent state:Expired type:None path: null 3600001 event = WatchedEvent state:Expired type:None path: null 3600001 event = WatchedEvent state:Expired type:None path: null The number is the time discrepancy between the system clock and the elapsed time from start of the program. With the patch, git checkout ZK-1366 ant clean compile-test echo build/test/lib/*.jar build/lib/*.jar build/classes build/test/classes | sed -e 's/ /:/g' > cp java -cp $(cat cp) org.apache.zookeeper.common.TimeTest | tee log-with-patch The output looks like this: ubuntu@ip-10-113-27-85:~/zookeeper$ less log-with-patch 1 after construct server start client 0 true 0 true 1 true 0 true 1 true 0 true 0 true 3600000 true 3600000 true 3600000 true 3600000 true 0 true 0 true 0 true 0 true -3600000 true -3600001 true -3600000 true -1 true -1 true -1 true -1 true As you can see, the timing discrepancy goes up and down but the ephemerals never disappear. I will update the patch shortly to match the ZK-1366 branch on my github clone.
          Hide
          Ted Dunning added a comment -

          Contains the manual and a trivial automated test.

          Show
          Ted Dunning added a comment - Contains the manual and a trivial automated test.
          Hide
          Ted Dunning added a comment -

          Updated to current version of trunk

          Show
          Ted Dunning added a comment - Updated to current version of trunk
          Hide
          Ted Dunning added a comment -

          Here is a patch against the 3.3.3 release if desired. This had to be done by restarting the edits since I couldn't come close to cherry picking the final commit to trunk. I was able to cherry pick the test/timer code, however.

          Show
          Ted Dunning added a comment - Here is a patch against the 3.3.3 release if desired. This had to be done by restarting the edits since I couldn't come close to cherry picking the final commit to trunk. I was able to cherry pick the test/timer code, however.
          Hide
          Camille Fournier added a comment -

          The test in TimerTest is missing the @Test annotation which I presume is an oversight.

          Show
          Camille Fournier added a comment - The test in TimerTest is missing the @Test annotation which I presume is an oversight.
          Hide
          Camille Fournier added a comment -

          So in general, I think this is a good patch and a very good thing for us to do. But I feel like Henry's comment is most interesting:

          The nice thing is that this is a small step towards a properly mockable time API in ZK, which would a) make tests much faster and b) make tests much more deterministic. There's a way to go still because of Thread.sleep and other complications, but this is a good first step.

          We really aren't doing all that much towards that end by replacing one static method call with another. You still can't mock that in mockito. So the only question I have here is, if we're going to touch all those places anyway, should we just be creating an actual thin object that wraps "time" and use non-static methods on that object to make these calls, in order to allow more mocking of timing issues in the future? Or should we save that for another patch?

          Show
          Camille Fournier added a comment - So in general, I think this is a good patch and a very good thing for us to do. But I feel like Henry's comment is most interesting: The nice thing is that this is a small step towards a properly mockable time API in ZK, which would a) make tests much faster and b) make tests much more deterministic. There's a way to go still because of Thread.sleep and other complications, but this is a good first step. We really aren't doing all that much towards that end by replacing one static method call with another. You still can't mock that in mockito. So the only question I have here is, if we're going to touch all those places anyway, should we just be creating an actual thin object that wraps "time" and use non-static methods on that object to make these calls, in order to allow more mocking of timing issues in the future? Or should we save that for another patch?
          Hide
          Patrick Hunt added a comment -

          See ZOOKEEPER-366 for a previous discussion/solution for this issue. Would be good to close that one out if this is addressing the issue more directly.

          Show
          Patrick Hunt added a comment - See ZOOKEEPER-366 for a previous discussion/solution for this issue. Would be good to close that one out if this is addressing the issue more directly.
          Hide
          Henry Robinson added a comment -

          My feeling is that Ted's fixing a legitimate issue here, so we shouldn't hold up the patch for a separate effort. Reworking how we deal with time is going to be a big effort (Thread.sleep really does complicate things, plus there's the question of how to actually inject a mock clock - as you say, such method calls would need to be non-static but then we need to figure out how to get the right implementation behind those methods). This patch doesn't get in the way of doing a better job with time, and gives us the beginnings of a nice integration point to mock clocks out.

          So I'll file a separate JIRA to track being able to change our clock implementation, and we can evaluate this on its own merits (might be nice to run a soak test for a few hours here to make sure that there are no weird edge cases that somehow got broken). Sound good?

          Show
          Henry Robinson added a comment - My feeling is that Ted's fixing a legitimate issue here, so we shouldn't hold up the patch for a separate effort. Reworking how we deal with time is going to be a big effort (Thread.sleep really does complicate things, plus there's the question of how to actually inject a mock clock - as you say, such method calls would need to be non-static but then we need to figure out how to get the right implementation behind those methods). This patch doesn't get in the way of doing a better job with time, and gives us the beginnings of a nice integration point to mock clocks out. So I'll file a separate JIRA to track being able to change our clock implementation, and we can evaluate this on its own merits (might be nice to run a soak test for a few hours here to make sure that there are no weird edge cases that somehow got broken). Sound good?
          Hide
          Mahadev konar added a comment -

          @Ted,
          Seems like a good change, only one issue I see here. I'd like this to go into trunk and not into 3.4 unless its really a bug. I think 3.4 will take sometime to stabilize and would really like to avoid big changes in 3.4. Thoughts?

          Show
          Mahadev konar added a comment - @Ted, Seems like a good change, only one issue I see here. I'd like this to go into trunk and not into 3.4 unless its really a bug. I think 3.4 will take sometime to stabilize and would really like to avoid big changes in 3.4. Thoughts?
          Hide
          Camille Fournier added a comment -

          @Henry: I am fine with doing it as a separate ticket. I do think it's pretty trivial to rework this and get ourselves far down the road with a non-static impl, and not sure that we need to address Thread.sleep() to get a lot of mileage out of the solution. But I don't think I'll have time to rework this patch to do that so might as well do it in a separate ticket if Ted doesn't want to worry about that.

          Show
          Camille Fournier added a comment - @Henry: I am fine with doing it as a separate ticket. I do think it's pretty trivial to rework this and get ourselves far down the road with a non-static impl, and not sure that we need to address Thread.sleep() to get a lot of mileage out of the solution. But I don't think I'll have time to rework this patch to do that so might as well do it in a separate ticket if Ted doesn't want to worry about that.
          Hide
          Ted Dunning added a comment -

          @Mahadev,

          I will produce patches for several versions as well as pre-built tar files. This will include trunk, 3.4, 3.3.3 and 3.3.2.

          The rate at which we encounter this kind of problem even at facilities that are pretty sophisticated says to me that even if this isn't strictly a bug, then it seriously decreases the probability that ZK will function well averaged over all plausible users. Developers and users then think ZK is seriously buggy.

          As such, we view this internally as a required fix that we will be deploying as a patch on our current production version of ZK. Whether the ZK community views it that way relative to 3.4 is an entirely separate question, of course.

          @Pat, @Henry,

          I really do think that doing the simple small thing (this patch) is important without waiting on the resolution of the larger more comprehensive fix (fixing time management in general). Thanks for your efforts in that vein.

          @Camille,

          I will have a slightly revised set of patches by tonight that fix the @Test issue.

          I also think that mocking was not at all the point of this patch. The only way that this makes mocking easier is that we now have a (slightly) more distinctive string to search for but as you point out, that isn't the real issue anyway. Mocking static functions is becoming a more common capability, but I would rather do it right when we do it.

          Show
          Ted Dunning added a comment - @Mahadev, I will produce patches for several versions as well as pre-built tar files. This will include trunk, 3.4, 3.3.3 and 3.3.2. The rate at which we encounter this kind of problem even at facilities that are pretty sophisticated says to me that even if this isn't strictly a bug, then it seriously decreases the probability that ZK will function well averaged over all plausible users. Developers and users then think ZK is seriously buggy. As such, we view this internally as a required fix that we will be deploying as a patch on our current production version of ZK. Whether the ZK community views it that way relative to 3.4 is an entirely separate question, of course. @Pat, @Henry, I really do think that doing the simple small thing (this patch) is important without waiting on the resolution of the larger more comprehensive fix (fixing time management in general). Thanks for your efforts in that vein. @Camille, I will have a slightly revised set of patches by tonight that fix the @Test issue. I also think that mocking was not at all the point of this patch. The only way that this makes mocking easier is that we now have a (slightly) more distinctive string to search for but as you point out, that isn't the real issue anyway. Mocking static functions is becoming a more common capability, but I would rather do it right when we do it.
          Hide
          Patrick Hunt added a comment -

          @Ted my comment has nothing to do with Henry's suggestion re refactoring - I was pointing out that there is a similar JIRA to this that should probably be closed out - please see ZOOKEEPER-366

          Show
          Patrick Hunt added a comment - @Ted my comment has nothing to do with Henry's suggestion re refactoring - I was pointing out that there is a similar JIRA to this that should probably be closed out - please see ZOOKEEPER-366
          Hide
          Patrick Hunt added a comment -

          I have similar concerns as Mahadev re stability - this is a big change to introduce into a fix release. I'd rather see it go into trunk.

          Show
          Patrick Hunt added a comment - I have similar concerns as Mahadev re stability - this is a big change to introduce into a fix release. I'd rather see it go into trunk.
          Hide
          Ted Dunning added a comment -

          @Pat,

          ZOOKEEPER-366 is essentially the same issue except that the suggested solution is weaker since it doesn't handle the problem of time jumping backwards, nor suggest a method to detect the time jumps.

          The numerological coincidence on the issue number is quite striking.

          Show
          Ted Dunning added a comment - @Pat, ZOOKEEPER-366 is essentially the same issue except that the suggested solution is weaker since it doesn't handle the problem of time jumping backwards, nor suggest a method to detect the time jumps. The numerological coincidence on the issue number is quite striking.
          Hide
          Patrick Hunt added a comment -

          Ted - ha I thought the same thing (re #)

          Show
          Patrick Hunt added a comment - Ted - ha I thought the same thing (re #)
          Hide
          Benjamin Reed added a comment -

          once the @Test annotation is in, i think this is good to go. right? did we decide if this was going to be applied to any branches?

          Show
          Benjamin Reed added a comment - once the @Test annotation is in, i think this is good to go. right? did we decide if this was going to be applied to any branches?
          Hide
          Patrick Hunt added a comment -

          I don't think so Ben (on both counts).

          Has this been reviewed fully? I'm looking at it now for the first time and I notice a pretty serious issue right off the bat.

          Given the docs say the following for nanotime

          This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time.

          the following is going to be a major headache:

                           cnxn.updateStatsForResponse(request.cxid, request.zxid, lastOp,
          -                        request.createTime, System.currentTimeMillis());
          +                        request.createTime, Time.currentElapsedTime());
          

          We shouldn't be using nanotime in any place where we display a date/time to a user. (ie ONLY use it to measure elapsed time)

          I think we should just put this into trunk only.

          Show
          Patrick Hunt added a comment - I don't think so Ben (on both counts). Has this been reviewed fully? I'm looking at it now for the first time and I notice a pretty serious issue right off the bat. Given the docs say the following for nanotime This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time. the following is going to be a major headache: cnxn.updateStatsForResponse(request.cxid, request.zxid, lastOp, - request.createTime, System.currentTimeMillis()); + request.createTime, Time.currentElapsedTime()); We shouldn't be using nanotime in any place where we display a date/time to a user. (ie ONLY use it to measure elapsed time) I think we should just put this into trunk only.
          Hide
          Ted Dunning added a comment -

          Pat is right that this should not be used as a substitute for elapsed time.

          I think that the right ways to proceed are to either

          • mark all of the changes with TODO's that can be removed progressively (risk of not finishing)
          • do a comprehensive review now

          Option 1 should only be committed to trunk and only back ported when done. Option 2 delays the commit for an important operational issue.

          Show
          Ted Dunning added a comment - Pat is right that this should not be used as a substitute for elapsed time. I think that the right ways to proceed are to either mark all of the changes with TODO's that can be removed progressively (risk of not finishing) do a comprehensive review now Option 1 should only be committed to trunk and only back ported when done. Option 2 delays the commit for an important operational issue.
          Hide
          Patrick Hunt added a comment -

          This issue has been in the code base since day one, a few weeks or even months spent getting it right aren't going to hurt anything. Also given the "devil you know" aspect of this, the fact it's so cross-cutting, i'd push this into trunk and only consider it for backporting into a fix release once it's been well proven.

          Show
          Patrick Hunt added a comment - This issue has been in the code base since day one, a few weeks or even months spent getting it right aren't going to hurt anything. Also given the "devil you know" aspect of this, the fact it's so cross-cutting, i'd push this into trunk and only consider it for backporting into a fix release once it's been well proven.
          Hide
          Benjamin Reed added a comment -

          i don't think we should commit with TODO's. it shouldn't be hard to go through them before we commit.

          in the line that pat identified, it is passing the correct thing operationally, but it does mess up the last response time. do we want to add a System.currentTimeMillis() just to get the correct last response time? we could actually convert the time in getLastResponseTime() using System.currentTimeMillis() - (Time.currentElapsedTime()-lastResponseTime), so that the extra call to currentTimeMillis would only happen on getLastResponseTime, which happens very infrequently.

          Show
          Benjamin Reed added a comment - i don't think we should commit with TODO's. it shouldn't be hard to go through them before we commit. in the line that pat identified, it is passing the correct thing operationally, but it does mess up the last response time. do we want to add a System.currentTimeMillis() just to get the correct last response time? we could actually convert the time in getLastResponseTime() using System.currentTimeMillis() - (Time.currentElapsedTime()-lastResponseTime), so that the extra call to currentTimeMillis would only happen on getLastResponseTime, which happens very infrequently.
          Hide
          Patrick Hunt added a comment -

          It seems like Timer/elapsedTime should be wrapped with some abstraction that keeps this problem from happening w/o making it explicit. Rather than passing a long around say, it would instead have a "TimerStart startElapsedTime()" pass TimerStart around, then eventually "timerStart.elapsed()" which returns a long. This would keep us from making the same mistake in other places (well, w/o making it explicit).

          Show
          Patrick Hunt added a comment - It seems like Timer/elapsedTime should be wrapped with some abstraction that keeps this problem from happening w/o making it explicit. Rather than passing a long around say, it would instead have a "TimerStart startElapsedTime()" pass TimerStart around, then eventually "timerStart.elapsed()" which returns a long. This would keep us from making the same mistake in other places (well, w/o making it explicit).
          Hide
          Benjamin Reed added a comment -

          i don't like that idea. if we have to go that direction i would much rather use Date in cases where it will be human readable, but i'm not really a fan of that either. there are very few places that we need System.currentTimeMillis() lets just fix those. did we just put getLastResponseTime in the connection statistics because we happened to have it or because someone requested it?

          Show
          Benjamin Reed added a comment - i don't like that idea. if we have to go that direction i would much rather use Date in cases where it will be human readable, but i'm not really a fan of that either. there are very few places that we need System.currentTimeMillis() lets just fix those. did we just put getLastResponseTime in the connection statistics because we happened to have it or because someone requested it?
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3648/
          -----------------------------------------------------------

          Review request for zookeeper.

          Summary
          -------

          ZOOKEEPER-1366. Zookeeper should be tolerant of clock adjustments

          This addresses bug ZOOKEEPER-1366.
          https://issues.apache.org/jira/browse/ZOOKEEPER-1366

          Diffs


          src/java/main/org/apache/zookeeper/ClientCnxn.java 6c25e40
          src/java/main/org/apache/zookeeper/ClientCnxnSocket.java 269f8e8
          src/java/main/org/apache/zookeeper/Login.java 6d2a38c
          src/java/main/org/apache/zookeeper/Shell.java f2c899a
          src/java/main/org/apache/zookeeper/ZKUtil.java 4713a08
          src/java/main/org/apache/zookeeper/common/Time.java PRE-CREATION
          src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java 9721f7c
          src/java/main/org/apache/zookeeper/server/Request.java c6a2249
          src/java/main/org/apache/zookeeper/server/ServerStats.java acbc97c
          src/java/main/org/apache/zookeeper/server/SessionTrackerImpl.java cbae57a
          src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 14f8c8f
          src/java/main/org/apache/zookeeper/server/quorum/AuthFastLeaderElection.java 8ba12c5
          src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java 473d01e
          src/java/main/org/apache/zookeeper/server/quorum/Follower.java 29ef3f9
          src/java/main/org/apache/zookeeper/server/quorum/Leader.java 27a5d20
          src/java/systest/org/apache/zookeeper/test/system/GenerateLoad.java 57d0dcb
          src/java/systest/org/apache/zookeeper/test/system/InstanceManager.java 809fa48
          src/java/systest/org/apache/zookeeper/test/system/SimpleSysTest.java 9cdf4d9
          src/java/test/org/apache/zookeeper/common/TimeTest.java PRE-CREATION
          src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java c2cfd7a
          src/java/test/org/apache/zookeeper/test/ClientBase.java 02638a1
          src/java/test/org/apache/zookeeper/test/ClientHammerTest.java b807dbb
          src/java/test/org/apache/zookeeper/test/CnxManagerTest.java fd59403
          src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 8e0b0eb
          src/java/test/org/apache/zookeeper/test/ReadOnlyModeTest.java c705a7b
          src/java/test/org/apache/zookeeper/test/StaticHostProviderTest.java 1fed88a
          src/java/test/org/apache/zookeeper/test/TestHammer.java 09a678b
          src/java/test/org/apache/zookeeper/test/ZooKeeperTestClient.java cc82c87
          src/recipes/queue/src/c/tests/Util.cc 26a9a09

          Diff: https://reviews.apache.org/r/3648/diff

          Testing
          -------

          Thanks,

          Benjamin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3648/ ----------------------------------------------------------- Review request for zookeeper. Summary ------- ZOOKEEPER-1366 . Zookeeper should be tolerant of clock adjustments This addresses bug ZOOKEEPER-1366 . https://issues.apache.org/jira/browse/ZOOKEEPER-1366 Diffs src/java/main/org/apache/zookeeper/ClientCnxn.java 6c25e40 src/java/main/org/apache/zookeeper/ClientCnxnSocket.java 269f8e8 src/java/main/org/apache/zookeeper/Login.java 6d2a38c src/java/main/org/apache/zookeeper/Shell.java f2c899a src/java/main/org/apache/zookeeper/ZKUtil.java 4713a08 src/java/main/org/apache/zookeeper/common/Time.java PRE-CREATION src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java 9721f7c src/java/main/org/apache/zookeeper/server/Request.java c6a2249 src/java/main/org/apache/zookeeper/server/ServerStats.java acbc97c src/java/main/org/apache/zookeeper/server/SessionTrackerImpl.java cbae57a src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 14f8c8f src/java/main/org/apache/zookeeper/server/quorum/AuthFastLeaderElection.java 8ba12c5 src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java 473d01e src/java/main/org/apache/zookeeper/server/quorum/Follower.java 29ef3f9 src/java/main/org/apache/zookeeper/server/quorum/Leader.java 27a5d20 src/java/systest/org/apache/zookeeper/test/system/GenerateLoad.java 57d0dcb src/java/systest/org/apache/zookeeper/test/system/InstanceManager.java 809fa48 src/java/systest/org/apache/zookeeper/test/system/SimpleSysTest.java 9cdf4d9 src/java/test/org/apache/zookeeper/common/TimeTest.java PRE-CREATION src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java c2cfd7a src/java/test/org/apache/zookeeper/test/ClientBase.java 02638a1 src/java/test/org/apache/zookeeper/test/ClientHammerTest.java b807dbb src/java/test/org/apache/zookeeper/test/CnxManagerTest.java fd59403 src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 8e0b0eb src/java/test/org/apache/zookeeper/test/ReadOnlyModeTest.java c705a7b src/java/test/org/apache/zookeeper/test/StaticHostProviderTest.java 1fed88a src/java/test/org/apache/zookeeper/test/TestHammer.java 09a678b src/java/test/org/apache/zookeeper/test/ZooKeeperTestClient.java cc82c87 src/recipes/queue/src/c/tests/Util.cc 26a9a09 Diff: https://reviews.apache.org/r/3648/diff Testing ------- Thanks, Benjamin
          Hide
          Benjamin Reed added a comment -

          for some reason the review board comments didn't come through, but they are on review board.

          i think Login.java should all be based on wall time since i think that is what kerberos is based on. perhaps eugene can comment on this.

          in ZooKeeperServer.java getTime() should return the wall time since that is used for the mtime.

          in GenerateLoad.java when we print the timestamp we probably should be doing wall time, but since we don't format it, it probably doesn't matter much.

          in LoadFromLogTest.java when we do datatree ops, usually the second to the last parameter is the wall time.

          you missed one change to elapsed time in the ReadOnlyModeTest.java

          one file that isn't in the patch that we need to fix is ServerCnxn.java. i think lastResponseTime should have a comment to say that it is the system time and we can make getLastResponseTime() return the wall time using "return System.currentMillis() - (Time.currentElapsedTime() - lastResponseTime);"

          after crawling through everything, i think it would be nice if Time also had Time.currentWallTime() that did System.currentMillis(). that way if we saw an instance of System.currentMillis() in the code it would be a flag that the use should be reviewed.

          oh and we should probably create a jira to fix the C client.

          Show
          Benjamin Reed added a comment - for some reason the review board comments didn't come through, but they are on review board. i think Login.java should all be based on wall time since i think that is what kerberos is based on. perhaps eugene can comment on this. in ZooKeeperServer.java getTime() should return the wall time since that is used for the mtime. in GenerateLoad.java when we print the timestamp we probably should be doing wall time, but since we don't format it, it probably doesn't matter much. in LoadFromLogTest.java when we do datatree ops, usually the second to the last parameter is the wall time. you missed one change to elapsed time in the ReadOnlyModeTest.java one file that isn't in the patch that we need to fix is ServerCnxn.java. i think lastResponseTime should have a comment to say that it is the system time and we can make getLastResponseTime() return the wall time using "return System.currentMillis() - (Time.currentElapsedTime() - lastResponseTime);" after crawling through everything, i think it would be nice if Time also had Time.currentWallTime() that did System.currentMillis(). that way if we saw an instance of System.currentMillis() in the code it would be a flag that the use should be reviewed. oh and we should probably create a jira to fix the C client.
          Hide
          Ted Dunning added a comment -

          I can put another method in Time that is the same as currentTimeMillis, but which does the time skew thing (sounds like Rocky Horror to me)

          Show
          Ted Dunning added a comment - I can put another method in Time that is the same as currentTimeMillis, but which does the time skew thing (sounds like Rocky Horror to me)
          Hide
          Patrick Hunt added a comment -

          Hi Ben:

          i don't like that idea.

          why not? it would make it explicit and keep this from happening again. (one would hope) Similar to Date, except in this case we're capturing the concept of elapsed time. It seems like it might also make things more testable (the impl could be mucked with, mocked, etc...).

          Show
          Patrick Hunt added a comment - Hi Ben: i don't like that idea. why not? it would make it explicit and keep this from happening again. (one would hope) Similar to Date, except in this case we're capturing the concept of elapsed time. It seems like it might also make things more testable (the impl could be mucked with, mocked, etc...).
          Hide
          Patrick Hunt added a comment -

          Here's an example from Hadoop, specifically to enable what i'm talking about - testability: org/apache/hadoop/mapred/Clock.java

          /**
           * A clock class - can be mocked out for testing.
           */
          class Clock {
          
          Show
          Patrick Hunt added a comment - Here's an example from Hadoop, specifically to enable what i'm talking about - testability: org/apache/hadoop/mapred/Clock.java /** * A clock class - can be mocked out for testing. */ class Clock {
          Hide
          Mahadev konar added a comment -

          Pat/Ben,
          I think the issue here is the api with Clock. The static api is one that ruins mocking in it. In Hadoop we make sure we pass around the same clock object when creating all the sub sequent objects (the constructs in MR next gen are more DI compliant). We could try doing that here but again I think its a bit of an effort (should be manual work). But as Henry/Camille mentioned we could do that in another jira. I think thats the right solution instead of creating another layer which hides the longs (as Pat suggested).

          Show
          Mahadev konar added a comment - Pat/Ben, I think the issue here is the api with Clock. The static api is one that ruins mocking in it. In Hadoop we make sure we pass around the same clock object when creating all the sub sequent objects (the constructs in MR next gen are more DI compliant). We could try doing that here but again I think its a bit of an effort (should be manual work). But as Henry/Camille mentioned we could do that in another jira. I think thats the right solution instead of creating another layer which hides the longs (as Pat suggested).
          Hide
          Benjamin Reed added a comment -

          agreed. i think also that you can still mock a static api if you provide a method and interface to allow the implementation to get the time from a user provided object. but again that is a different issue.

          for now i would really really like to get this in asap. it's a shame that it missed the next release since this is a real problem that we have a fix for. even without the C client, the big problem is what happens at the server with the session timeouts and this patch fixes that problem.

          Show
          Benjamin Reed added a comment - agreed. i think also that you can still mock a static api if you provide a method and interface to allow the implementation to get the time from a user provided object. but again that is a different issue. for now i would really really like to get this in asap. it's a shame that it missed the next release since this is a real problem that we have a fix for. even without the C client, the big problem is what happens at the server with the session timeouts and this patch fixes that problem.
          Hide
          Ted Dunning added a comment -

          On the off topic of mocking, I was just able to mock System.nanoTime directly using jmockit. Very easy to do.

          See https://github.com/tdunning/storm-counts/blob/master/src/test/java/com/mapr/storm/Fake.java especially Fake.clock near the bottom. Fake.clock() returns an object that adjusts the value that is returned by System.nanoTime(). I haven't verified that the effect goes away at the end of the test, but jmockit usually handles that correctly.

          Show
          Ted Dunning added a comment - On the off topic of mocking, I was just able to mock System.nanoTime directly using jmockit. Very easy to do. See https://github.com/tdunning/storm-counts/blob/master/src/test/java/com/mapr/storm/Fake.java especially Fake.clock near the bottom. Fake.clock() returns an object that adjusts the value that is returned by System.nanoTime(). I haven't verified that the effect goes away at the end of the test, but jmockit usually handles that correctly.
          Hide
          Ted Dunning added a comment -

          This patch never got committed. It should have been committed long ago.

          I am happy to resurrect it. Which branches should I generate patches for?

          Show
          Ted Dunning added a comment - This patch never got committed. It should have been committed long ago. I am happy to resurrect it. Which branches should I generate patches for?
          Hide
          Marshall McMullen added a comment -

          I think there are changes in the C client that should be made as well to make it more resilent to clock adjustments as well. Doesn't look like any of those were accounted for under this patch.

          Show
          Marshall McMullen added a comment - I think there are changes in the C client that should be made as well to make it more resilent to clock adjustments as well. Doesn't look like any of those were accounted for under this patch.
          Hide
          Colin Patrick McCabe added a comment -

          I think there are changes in the C client that should be made as well to make it more resilent to clock adjustments as well. Doesn't look like any of those were accounted for under this patch.

          I filed ZOOKEEPER-1626 to deal with the C client changes.

          Show
          Colin Patrick McCabe added a comment - I think there are changes in the C client that should be made as well to make it more resilent to clock adjustments as well. Doesn't look like any of those were accounted for under this patch. I filed ZOOKEEPER-1626 to deal with the C client changes.
          Hide
          Benjamin Reed added a comment -

          i was just reviewing this issue and reading my comments that i had recently made. and they sounded like me but i couldn't remember making them last month. then i realized that i made them in 2012!!! we really need to get this patch in. is someone working on it?

          Show
          Benjamin Reed added a comment - i was just reviewing this issue and reading my comments that i had recently made. and they sounded like me but i couldn't remember making them last month. then i realized that i made them in 2012!!! we really need to get this patch in. is someone working on it?
          Hide
          Marshall McMullen added a comment -

          This has become a blocker issues for us, so I'm resurrecting this patch so it applies cleanly to the tip of trunk (r1448845).

          Three issues:

          (1) A few unit tests are failing (on my local machine at least) and I can't readily tell what's wrong. I would really like Ted (or someone else) to look at these failures.

          (2) I did NOT go through and find other uses of System.currentTimeMillis to see if they should be converted to Time.currentElapsedTime(). I think someone more knowledgeable should look these over with a fine tooth comb.

          (3) Ted's original patch made changes to SessionTrackerImpl.java but those parts of that file that were originally patched don't seem to be present anymore. It looks to me like that code was moved into ExpiryQueue.java. I'd really like some to take a close look at that as well.

          Any assistance with these three issues is extremely appreciated!!

          Show
          Marshall McMullen added a comment - This has become a blocker issues for us, so I'm resurrecting this patch so it applies cleanly to the tip of trunk (r1448845). Three issues: (1) A few unit tests are failing (on my local machine at least) and I can't readily tell what's wrong. I would really like Ted (or someone else) to look at these failures. (2) I did NOT go through and find other uses of System.currentTimeMillis to see if they should be converted to Time.currentElapsedTime(). I think someone more knowledgeable should look these over with a fine tooth comb. (3) Ted's original patch made changes to SessionTrackerImpl.java but those parts of that file that were originally patched don't seem to be present anymore. It looks to me like that code was moved into ExpiryQueue.java. I'd really like some to take a close look at that as well. Any assistance with these three issues is extremely appreciated!!
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12570423/ZOOKEEPER-1366.patch
          against trunk revision 1448007.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 49 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1403//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1403//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1403//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12570423/ZOOKEEPER-1366.patch against trunk revision 1448007. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 49 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1403//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1403//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1403//console This message is automatically generated.
          Hide
          Marshall McMullen added a comment -

          This should fix the failing TimeTest that was missing @Test.

          Show
          Marshall McMullen added a comment - This should fix the failing TimeTest that was missing @Test.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12570427/ZOOKEEPER-1366.patch
          against trunk revision 1448007.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 49 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause tar ant target to fail.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1404//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1404//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1404//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12570427/ZOOKEEPER-1366.patch against trunk revision 1448007. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 49 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1404//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1404//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1404//console This message is automatically generated.
          Hide
          Marshall McMullen added a comment -

          Fix compile failure in TestTime

          Show
          Marshall McMullen added a comment - Fix compile failure in TestTime
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12570508/ZOOKEEPER-1366.patch
          against trunk revision 1448007.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 49 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1405//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1405//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1405//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12570508/ZOOKEEPER-1366.patch against trunk revision 1448007. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 49 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1405//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1405//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1405//console This message is automatically generated.
          Hide
          Marshall McMullen added a comment -

          My latest patch fixes all the unit test failures.

          My prior two statements still apply:

          Show
          Marshall McMullen added a comment - My latest patch fixes all the unit test failures. My prior two statements still apply:
          Hide
          Marshall McMullen added a comment -

          My prior two caveats still apply:

          (2) I did NOT go through and find other uses of System.currentTimeMillis to see if they should be converted to Time.currentElapsedTime(). I think someone more knowledgeable should look these over with a fine tooth comb.

          (3) Ted's original patch made changes to SessionTrackerImpl.java but those parts of that file that were originally patched don't seem to be present anymore. It looks to me like that code was moved into ExpiryQueue.java. I'd really like some to take a close look at that as well.
          Any assistance with these three issues is extremely appreciated!!

          Show
          Marshall McMullen added a comment - My prior two caveats still apply: (2) I did NOT go through and find other uses of System.currentTimeMillis to see if they should be converted to Time.currentElapsedTime(). I think someone more knowledgeable should look these over with a fine tooth comb. (3) Ted's original patch made changes to SessionTrackerImpl.java but those parts of that file that were originally patched don't seem to be present anymore. It looks to me like that code was moved into ExpiryQueue.java. I'd really like some to take a close look at that as well. Any assistance with these three issues is extremely appreciated!!
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12570508/ZOOKEEPER-1366.patch
          against trunk revision 1448007.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 49 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1408//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1408//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1408//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12570508/ZOOKEEPER-1366.patch against trunk revision 1448007. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 49 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1408//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1408//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1408//console This message is automatically generated.
          Hide
          Hangjun Ye added a comment -

          Not committed yet?

          Show
          Hangjun Ye added a comment - Not committed yet?
          Hide
          Chris Seawood added a comment -

          This issue has hit critical mass for us as well and we need the fix applied against a current stable release tree. I took the patch from 2013-02-22-10:31 and applied it to the 3.4.5 tree.

          I looked at the remaining System.currentTimeMillis() calls and fixed a couple of instances. The remaining calls are made in:

          • the MergedLogSource() console output
          • the LeaderElectionSupportTest.java test which uses the value as part of a unique identifier
          • the TimeTest.java test which makes sure that Time.currentElapsedTime() works as advertised.

          So far, I've only done the simple [date -s "+1hour"] test and that passed. I'll let you know if we hit any other issues.

          Show
          Chris Seawood added a comment - This issue has hit critical mass for us as well and we need the fix applied against a current stable release tree. I took the patch from 2013-02-22-10:31 and applied it to the 3.4.5 tree. I looked at the remaining System.currentTimeMillis() calls and fixed a couple of instances. The remaining calls are made in: the MergedLogSource() console output the LeaderElectionSupportTest.java test which uses the value as part of a unique identifier the TimeTest.java test which makes sure that Time.currentElapsedTime() works as advertised. So far, I've only done the simple [date -s "+1hour"] test and that passed. I'll let you know if we hit any other issues.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12596213/zookeeper-3.4.5-ZK1366-SC01.patch
          against trunk revision 1503101.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 52 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1527//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12596213/zookeeper-3.4.5-ZK1366-SC01.patch against trunk revision 1503101. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 52 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1527//console This message is automatically generated.
          Hide
          Asad Saeed added a comment -

          These patches will break Kerberos login. It mixes monotonic and actual time.

          Show
          Asad Saeed added a comment - These patches will break Kerberos login. It mixes monotonic and actual time.
          Hide
          Ted Dunning added a comment -

          Why do you say this?

          What scenario would cause a breakage?

          Show
          Ted Dunning added a comment - Why do you say this? What scenario would cause a breakage?
          Hide
          Asad Saeed added a comment -

          A few issues with this patch. It is much broader than it should be.

          In Login.java: 2 timesources are being utilized and incorrectly compared:
          Time.currentElapsedTime is not compatible with the (KerberosTicket).getEndTime which will return the expiry based on the system time. The 2 time values are compared in the following block


          if (nextRefresh > expiry) {
          LOG.error("next refresh: " + nextRefreshDate + " is later than expiry " + expiryDate
          + ". This may indicate a clock skew problem. Check that this host and the KDC's "
          + "hosts' clocks are in sync. Exiting refresh thread.");
          return;

          ZookeeperServer::getTime: Utilization of Time.currentElapsedTime here will lead to unusable timestamps for the mtime and ctime for nodes. DataTree::processTxn uses the TxnHeader.time to populate the Stat structure. TxnHeader.time is assigned by PrepRequestProcessor::pRequest2Txn using ZookeeperServer.getTime.

          Show
          Asad Saeed added a comment - A few issues with this patch. It is much broader than it should be. In Login.java: 2 timesources are being utilized and incorrectly compared: Time.currentElapsedTime is not compatible with the (KerberosTicket).getEndTime which will return the expiry based on the system time. The 2 time values are compared in the following block – if (nextRefresh > expiry) { LOG.error("next refresh: " + nextRefreshDate + " is later than expiry " + expiryDate + ". This may indicate a clock skew problem. Check that this host and the KDC's " + "hosts' clocks are in sync. Exiting refresh thread."); return; – ZookeeperServer::getTime: Utilization of Time.currentElapsedTime here will lead to unusable timestamps for the mtime and ctime for nodes. DataTree::processTxn uses the TxnHeader.time to populate the Stat structure. TxnHeader.time is assigned by PrepRequestProcessor::pRequest2Txn using ZookeeperServer.getTime.
          Hide
          Michi Mutsuzaki added a comment -

          Updated the patch so that it applies cleanly, and addressed Asad's comment.

          Show
          Michi Mutsuzaki added a comment - Updated the patch so that it applies cleanly, and addressed Asad's comment.
          Hide
          Michi Mutsuzaki added a comment -

          review: https://reviews.apache.org/r/20635/

          We should fix this in 3.5.

          Show
          Michi Mutsuzaki added a comment - review: https://reviews.apache.org/r/20635/ We should fix this in 3.5.
          Hide
          Raul Gutierrez Segales added a comment -

          Michi Mutsuzaki: I think the RB is missing some new files (i.e.: Time.java), no?

          Show
          Raul Gutierrez Segales added a comment - Michi Mutsuzaki : I think the RB is missing some new files (i.e.: Time.java), no?
          Hide
          Michi Mutsuzaki added a comment -

          ... that's what I get for not using 'svn patch' command. Thanks for catching Raul. I'll re-upload the patch to reviewboard.

          Show
          Michi Mutsuzaki added a comment - ... that's what I get for not using 'svn patch' command. Thanks for catching Raul. I'll re-upload the patch to reviewboard.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12641585/ZOOKEEPER-1366.patch
          against trunk revision 1588584.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 48 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2056//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2056//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2056//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12641585/ZOOKEEPER-1366.patch against trunk revision 1588584. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 48 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2056//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2056//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2056//console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12641585/ZOOKEEPER-1366.patch
          against trunk revision 1588584.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 48 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2057//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2057//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2057//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12641585/ZOOKEEPER-1366.patch against trunk revision 1588584. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 48 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2057//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2057//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2057//console This message is automatically generated.

            People

            • Assignee:
              Ted Dunning
              Reporter:
              Ted Dunning
            • Votes:
              6 Vote for this issue
              Watchers:
              22 Start watching this issue

              Dates

              • Created:
                Updated:

                Development