ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-829

Add /zookeeper/sessions/* to allow inspection/manipulation of client sessions

    Details

    • Type: New Feature New Feature
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: server
    • Labels:
      None

      Description

      For some use cases in HBase (HBASE-1316 in particular) we'd like the ability to forcible expire someone else's ZK session. Patrick and I discussed on IRC and came up with an idea of creating nodes in /zookeeper/sessions/<session id> that can be read in order to get basic stats about a session, and written in order to manipulate one. The manipulation we need in HBase is the ability to write a command like "kill", but others might be useful as well.

      1. 829.diff
        8 kB
        Michael Xu
      2. case-120.patch
        8 kB
        Marshall McMullen

        Issue Links

          Activity

          Hide
          Patrick Hunt added a comment -

          Implementing as part of /zookeeper (a znode) allows us to easily add this feature (vs changing the wire protocol). Integration with existing zk clients/libs/bindings/etc... will be simplified. Also we get ACLs in the bargain.

          Note that this also solves a lot of problems wrt client side testing - session expiration and disconnection will both be easy features to provide.

          I suggest we use commands like "expire" and "disconnect" rather than "kill"

          Another idea comes to mind - we might have /zookeeper/sessions/<id> and also /zookeeper/<serverid>/sessions/<id>, this second option would allow for a "supervisory" process to monitor load on the servers and forcibly disconnect clients, thereby redistributing the load. This has been a bane for us (equitable load distribution on the servers) and a feature like this seems like a great solution.

          Show
          Patrick Hunt added a comment - Implementing as part of /zookeeper (a znode) allows us to easily add this feature (vs changing the wire protocol). Integration with existing zk clients/libs/bindings/etc... will be simplified. Also we get ACLs in the bargain. Note that this also solves a lot of problems wrt client side testing - session expiration and disconnection will both be easy features to provide. I suggest we use commands like "expire" and "disconnect" rather than "kill" Another idea comes to mind - we might have /zookeeper/sessions/<id> and also /zookeeper/<serverid>/sessions/<id>, this second option would allow for a "supervisory" process to monitor load on the servers and forcibly disconnect clients, thereby redistributing the load. This has been a bane for us (equitable load distribution on the servers) and a feature like this seems like a great solution.
          Hide
          Patrick Hunt added a comment -

          btw, a getData on one of these znodes should return useful information about the session - similar to "cons" 4letter word I would think. (but not limited to)

          Show
          Patrick Hunt added a comment - btw, a getData on one of these znodes should return useful information about the session - similar to "cons" 4letter word I would think. (but not limited to)
          Hide
          Benjamin Reed added a comment -

          should we kill the session immediately or wait until the sessionTimeout. killing it immediate seems like it is violating a contract.

          Show
          Benjamin Reed added a comment - should we kill the session immediately or wait until the sessionTimeout. killing it immediate seems like it is violating a contract.
          Hide
          Todd Lipcon added a comment -

          The feature we need in HBase is the ability to kill it immediately (and hence fence it from making any further actions in ZK)

          Show
          Todd Lipcon added a comment - The feature we need in HBase is the ability to kill it immediately (and hence fence it from making any further actions in ZK)
          Hide
          Patrick Hunt added a comment -

          Perhaps we should think of it as close or disconnect rather than expire/disconnect. We already have this feature in JMX btw (just not as easily accessible) and you can also do it by connecting a new session with the same id/pass and then closing that...

          Show
          Patrick Hunt added a comment - Perhaps we should think of it as close or disconnect rather than expire/disconnect. We already have this feature in JMX btw (just not as easily accessible) and you can also do it by connecting a new session with the same id/pass and then closing that...
          Hide
          Michael Xu added a comment -

          hi everyone,

          i plan on taking this on my plate.

          patrick and i spoke over irc, there are issues with the current method of logging in and closing client id's for the purposes of explicitly expiring sessions.

          being able to have explicit control over sessions is a very desirable feature for those doing testing with zookeeper.

          i'll put some code in SessionTracker.java, that adds to DataTree.java on session creation on the cluster leader to create a /zookeeper/sessions/<id> and set a watch on it. clients may write a "close" to a zknode, which the server will acknowledge by removing the session.

          Show
          Michael Xu added a comment - hi everyone, i plan on taking this on my plate. patrick and i spoke over irc, there are issues with the current method of logging in and closing client id's for the purposes of explicitly expiring sessions. being able to have explicit control over sessions is a very desirable feature for those doing testing with zookeeper. i'll put some code in SessionTracker.java, that adds to DataTree.java on session creation on the cluster leader to create a /zookeeper/sessions/<id> and set a watch on it. clients may write a "close" to a zknode, which the server will acknowledge by removing the session.
          Hide
          Michael Xu added a comment -

          hi everyone,

          here is a patch for this issue.

          the server creates zknodes at /zookeeper/sessions with the sessionID as the name (NOT hex-encoded, uses base-10 representation of the sessionID ).

          Any client writing anything to a session node causes the session to be expired.

          I will write unit tests when I get a chance, but I'd like feedback on design-wise what may need to be changed in order to include it officially.

          Likely, the next step will be making a test mode specifically for zookeeperserver so clients can use this functionality of managing sessions themselves.

          michael

          Show
          Michael Xu added a comment - hi everyone, here is a patch for this issue. the server creates zknodes at /zookeeper/sessions with the sessionID as the name (NOT hex-encoded, uses base-10 representation of the sessionID ). Any client writing anything to a session node causes the session to be expired. I will write unit tests when I get a chance, but I'd like feedback on design-wise what may need to be changed in order to include it officially. Likely, the next step will be making a test mode specifically for zookeeperserver so clients can use this functionality of managing sessions themselves. michael
          Hide
          Michael Xu added a comment -

          typo correction to patch above

          Show
          Michael Xu added a comment - typo correction to patch above
          Hide
          Patrick Hunt added a comment -

          Looks pretty good to me, some nits but in general ok. Ben/Mahadev?

          1) use spaces not tabs
          2) methods start with lower
          3) use Long.toHexString to convert sessionid to hex (see other examples in the code, esp logging a sessionid)
          4) you probably want to support both disconnect and close/expire to enable testing
          5) add some tests, both to verify this code, but also to provide a "blueprint" for other users
          5.1) as part of 5 provide some reuseable code to testForceClose(sessionid) and such.

          Show
          Patrick Hunt added a comment - Looks pretty good to me, some nits but in general ok. Ben/Mahadev? 1) use spaces not tabs 2) methods start with lower 3) use Long.toHexString to convert sessionid to hex (see other examples in the code, esp logging a sessionid) 4) you probably want to support both disconnect and close/expire to enable testing 5) add some tests, both to verify this code, but also to provide a "blueprint" for other users 5.1) as part of 5 provide some reuseable code to testForceClose(sessionid) and such.
          Hide
          Patrick Hunt added a comment -

          Oh yea, the "close" feature should be disabled unless explicitly turned on the by the ensemble admin - we want ppl to only use this particular aspect for testing.

          Show
          Patrick Hunt added a comment - Oh yea, the "close" feature should be disabled unless explicitly turned on the by the ensemble admin - we want ppl to only use this particular aspect for testing.
          Hide
          Mark Miller added a comment -

          Would love to see this one get in soon.

          Show
          Mark Miller added a comment - Would love to see this one get in soon.
          Hide
          Marshall McMullen added a comment -

          This patch worked for me in 3.3.3, but doesn't seem to work correctly in 3.4.0 anymore. I haven't really dug into it to find out why yet. Wondered if anyone else has tried using this in 3.4.0 yet before I dive into it.... thanks.

          Show
          Marshall McMullen added a comment - This patch worked for me in 3.3.3, but doesn't seem to work correctly in 3.4.0 anymore. I haven't really dug into it to find out why yet. Wondered if anyone else has tried using this in 3.4.0 yet before I dive into it.... thanks.
          Hide
          Marshall McMullen added a comment -

          This patch doesn't apply cleanly to 3.4.3 anymore. I will upload a new patch once I get it modified.

          Show
          Marshall McMullen added a comment - This patch doesn't apply cleanly to 3.4.3 anymore. I will upload a new patch once I get it modified.
          Hide
          Marshall McMullen added a comment -

          Revised patch to work with zookeeper 3.4.3.

          Show
          Marshall McMullen added a comment - Revised patch to work with zookeeper 3.4.3.
          Hide
          Patrick Hunt added a comment -

          Marshall are you ready for review on this? If so you want to click on "submit patch". Note this is unlikely to go into 3.4 as it's a new feature. This would go into trunk instead (3.5.0).
          also see: https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute

          Show
          Patrick Hunt added a comment - Marshall are you ready for review on this? If so you want to click on "submit patch". Note this is unlikely to go into 3.4 as it's a new feature. This would go into trunk instead (3.5.0). also see: https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute
          Hide
          Marshall McMullen added a comment -

          Patrick, I never addressed any of the issues you brought up with the original submission by Michael Xu back on 15 Sep... All I did was update the patch to work with 3.4.3. I don't have time right now to address the concerns you had with the original patch at this time. Maybe in a few weeks....

          Show
          Marshall McMullen added a comment - Patrick, I never addressed any of the issues you brought up with the original submission by Michael Xu back on 15 Sep... All I did was update the patch to work with 3.4.3. I don't have time right now to address the concerns you had with the original patch at this time. Maybe in a few weeks....
          Hide
          Patrick Hunt added a comment -

          Ok, no worries. In that case my main point is that you should target trunk. Thanks!

          Show
          Patrick Hunt added a comment - Ok, no worries. In that case my main point is that you should target trunk. Thanks!

            People

            • Assignee:
              Marshall McMullen
              Reporter:
              Todd Lipcon
            • Votes:
              1 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:

                Development