Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.0
    • Component/s: None
    • Labels: None

      Description

      ZooKeeper =~ Chubby. This means that we could take advantage of a distributed lock manager to coordinate things like failover masters, regionservers staying online when the master is dead, atomic region -> regionserver assignments, etc. There are a lot of opportunities for improvement here. Please add discussions of particular features in comments or sub-tasks.

      Attachments

      1. DistributedLockInterface.java
        2 kB
        Jean-Daniel Cryans
      2. HBASE-546.patch
        24 kB
        Nitay Joffe
      3. HBASE-546-v2.patch
        44 kB
        Nitay Joffe
      4. hbase-546-v3.patch
        47 kB
        Nitay Joffe
      5. zookeeper-config.patch
        11 kB
        Jean-Daniel Cryans

          Activity

          Nitay Joffe added a comment -

          All of the subtasks have been fixed. Closing this.

          Nitay Joffe added a comment -

          See https://issues.apache.org/jira/browse/HBASE-1144 for replies to comments made by JD/Stack.

          Jean-Daniel Cryans added a comment -

           > When the single-instance zk that has been started by hbase runs, where does it write its logs? Can that be configurable?

           Look in the scripts-v2 patch; it writes the logs in the same place as all other HBase components.

           > For the sake of performance and reliability, HBase should start a small ZK cluster itself (not necessarily every HRS, but could be a list of nodes in conf/hbase-zk).
           >
           > But I agree with what stack listed in Nov 11 above, esp. #5. We can start with a single-instance ZK then change it if necessary. For example, if people mainly deal with HBase, they shouldn't be bothered to get a ZK cluster up and running by themselves.

           Stack's comment was in fact a group decision, so we will mainly stick to it. The way the scripts are currently written (see my patch), HBase starts one ZK instance on localhost by default. You can add other machines to conf/zookeepers so that a ZK cluster is started, BUT you still need to add a file yourself on each machine so that they know they are in a quorum (see the ZK docs). You can also disable HBase's management of ZK by simply emptying the conf/zookeepers file.

           Regarding the part where people shouldn't have to bother with ZK: if you keep the default behavior this will create a not-very-available cluster, so I'd say that a new user won't be bothered by ZK, but someone who wants real availability needs to get their hands dirty.

          Rong-En Fan added a comment -

           For the sake of performance and reliability, HBase should start a small ZK cluster itself (not necessarily every HRS, but could be a list of nodes in conf/hbase-zk).

           But I agree with what stack listed in Nov 11 above, esp. #5. We can start with a single-instance ZK then change it if necessary. For example, if people mainly deal with HBase, they shouldn't be bothered to get a ZK cluster up and running by themselves.

          stack added a comment -

          For zookeeper.servers, you specify a quorum by listing the quorum members?

           When the single-instance zk that has been started by hbase runs, where does it write its logs? Can that be configurable?

          What if I specify a full-path for zookeeper.znode.rootserver, will that write root region location outside of zookeeper.znode.parent?

           (Regards a question you asked on IRC a few days ago) If things like DEFAULT_ZOOKEEPER_SERVERS are only used in one place, I'd say they don't need to be in HConstants... just do the define in the place it's used. Defining things in HConstants is good if they are used more than once AND by more than one class (otherwise, do the define inside that class).

           Should the below be synchronized, Nitay?

          + private ZooKeeperWrapper getZooKeeperWrapper() throws IOException {

          Is there danger that two threads could be asking for it at about same time?

          This is an interesting change Nitay: - public HServerAddress findRootRegion();... removing it from the Master. I like it. ZK rules now!

          Below....
          + * Copyright 2008 The Apache Software Foundation
          ... should be 2009

          Below has trailing '\'

          + * Wraps a ZooKeeper instance and adds HBase specific functionality.\

          Can these be final in ZKWrapper?

          + private ZooKeeper zooKeeper;
          + private WatcherWrapper watcher;

          ... same in HRS:

          + private ZooKeeperWrapper zooKeeperWrapper;

          In below...

          + rootRegionZNode = parentZNode + "/" + rootServerZNodeName;

          ... does ZK have a define for path separator?

           We don't throw an exception if we fail to get root:

          +    try {
          +      data = zooKeeper.getData(rootRegionZNode, false, null);
          +    } catch (InterruptedException e) {
          +      return null;
          +    } catch (KeeperException e) {
          +      return null;
          +    }
          

           ... is that good? Does the caller handle null?

           What's the story w/ data in ZK? It's byte arrays? Should they be UTF-8? If so, Bytes.toBytes over in hbase util might help. E.g. rather than + String addressString = new String(data); ... which could give different answers if the client is in a different locale than the original writer, be explicit that it's UTF-8 and do Bytes.toBytes(...) when writing and Bytes.toString(...) when reading?
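
           A minimal sketch of the two review points above -- the synchronized lazy getter and being explicit about UTF-8 via Bytes -- assuming hypothetical class and field names rather than the patch's own:

           import java.io.IOException;

           import org.apache.hadoop.hbase.util.Bytes;
           import org.apache.zookeeper.WatchedEvent;
           import org.apache.zookeeper.Watcher;
           import org.apache.zookeeper.ZooKeeper;

           // Illustrative only: holds a lazily created ZooKeeper handle.
           public class LazyZooKeeperHolder implements Watcher {
             private final String quorum; // e.g. "host1:2181,host2:2181"
             private ZooKeeper zooKeeper;

             public LazyZooKeeperHolder(String quorum) {
               this.quorum = quorum;
             }

             public void process(WatchedEvent event) {
               // watcher callback, ignored in this sketch
             }

             // synchronized so two threads asking for the handle at about the
             // same time cannot both construct one
             public synchronized ZooKeeper getZooKeeper() throws IOException {
               if (zooKeeper == null) {
                 zooKeeper = new ZooKeeper(quorum, 10 * 1000, this);
               }
               return zooKeeper;
             }

             // be explicit that znode data is UTF-8 rather than relying on the
             // platform default charset of new String(data)
             public static byte[] encodeAddress(String address) {
               return Bytes.toBytes(address);
             }

             public static String decodeAddress(byte[] data) {
               return Bytes.toString(data);
             }
           }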

          Jean-Daniel Cryans added a comment -

          Nitay,

           The config in hbase-default.xml should be expressed as numbers and not operations: e.g. zookeeper.pause should be 2000 and not 2 * 1000. Also, since zookeeper.servers is required to be set for a fully distributed HBase, you should modify the src/java/overview.html file.
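
           The reason is that values in hbase-default.xml are read as strings and parsed as literal numbers; they are not evaluated as expressions. A tiny sketch of the distinction (zookeeper.pause is the property named above; the class here is just for illustration):

           // Config values are parsed, not evaluated.
           public class ConfigValueExample {
             public static void main(String[] args) {
               // "2000" parses fine...
               System.out.println(Integer.parseInt("2000"));
               // ...but "2 * 1000" is not an integer literal and is rejected.
               try {
                 Integer.parseInt("2 * 1000");
               } catch (NumberFormatException e) {
                 System.out.println("not a literal number: " + e.getMessage());
               }
             }
           }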

          Nitay Joffe added a comment -

          Added config option defaults in conf/hbase-default.xml

          Nitay Joffe added a comment -

          The first version of the patch, renamed.

          Nitay Joffe added a comment -

          Fixes to the issues discussed above. Also added a test that ensures master writes root region location to ZooKeeper once scanning is complete.

          Jean-Daniel Cryans added a comment -

          Had this chat with Nitay on IRC:

          [19:42] <nitay> got your updates, gonna make the write do some retries and then throw all the way up, which should shutdown the HMaster?
          [19:43] <nitay> also, about the safe mode, so i saw that i removed that but i thought that was safe b/c the client will just keep trying to read the root region location?
          [19:43] <jdcryans> the retries should be done in RegionManager
          [19:43] <nitay> jdcryans: oh ok i can move them there
          [19:43] <jdcryans> yeah but the root region will be assigned first, but that doesn't mean that the meta regions are all assigned
          [19:44] <jdcryans> so the safe mode should really work like in hdfs
          [19:44] <nitay> how does it work in hdfs?
          [19:44] <jdcryans> you can't do anything until safe mode is off
          [19:45] <nitay> ok, i can make that a direct rpc then instead of having it be injected through the getRootRegion
          [19:45] <jdcryans> It's one way to do it
          [19:45] <jdcryans> another one would be to store that value in ZK
          [19:45] <nitay> better suggestions?
          [19:45] <jdcryans> so that we don't rely on the master for that info
          [19:46] <nitay> ah ok so just some empty ephemeral file?
          [19:46] <jdcryans> for example, during normal operations the master fails and a new client tries to instantiate a HCM
          [19:46] <nitay> when master comes back up its in safe mode again?
          [19:47] <jdcryans> was thinking about exactly the same
          [19:47] <jdcryans> but I don't know...
          [19:47] <jdcryans> we should try it first in a busy cluster
          [19:47] <jdcryans> for the mo, I would make it an ephemeral file
          [19:48] <jdcryans> the less the client is coupled to HMaster the better
          [19:48] <nitay> k
          [19:48] <nitay> so the RM owned by HMaster would start up in safe mode
          [19:49] <jdcryans> yep
          [19:50] <nitay> ok, ephemeral it is, makes sense
          [19:50] <jdcryans> I think it's a safe design
          [19:51] <nitay> oh i also found another interesting thing
          [19:52] <nitay> when we get a regionServerStartup
          [19:53] <nitay> if the old data from that server tells us it was serving the root region we clear it out by calling setRootRegionLocation(null)
          [19:53] <nitay> right now that would NPE in ZKW
          [19:54] <jdcryans> so that would be a delete?
          [19:54] <nitay> so im making it so that if u pass in null it deletes the file
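
           A minimal sketch of what came out of that chat, written against the stock ZooKeeper client API; the znode paths and method names below are illustrative assumptions, not necessarily what the patch uses:

           import org.apache.zookeeper.CreateMode;
           import org.apache.zookeeper.KeeperException;
           import org.apache.zookeeper.ZooDefs;
           import org.apache.zookeeper.ZooKeeper;

           public class SafeModeSketch {
             private final ZooKeeper zk;

             public SafeModeSketch(ZooKeeper zk) {
               this.zk = zk;
             }

             // Safe mode as an ephemeral znode: it disappears automatically if
             // the master's ZK session dies, so clients never read a stale flag.
             public void enterSafeMode() throws KeeperException, InterruptedException {
               zk.create("/hbase/safe-mode", new byte[0],
                   ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
             }

             public void leaveSafeMode() throws KeeperException, InterruptedException {
               zk.delete("/hbase/safe-mode", -1);
             }

             public boolean inSafeMode() throws KeeperException, InterruptedException {
               return zk.exists("/hbase/safe-mode", false) != null;
             }

             // setRootRegionLocation(null) maps to deleting the znode rather
             // than writing null data (which would NPE).
             public void clearRootRegionLocation() throws KeeperException, InterruptedException {
               zk.delete("/hbase/root-region-server", -1); // -1 = any version
             }
           }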

          Jean-Daniel Cryans added a comment -

          Some comments on the patch.

          • Lines should be capped at 80 chars, ZKW is not very compliant.
           • Currently how readRootRegionLocation handles exceptions is OK since we do retries. How we handle writeRootRegionLocation is something else... either do retries or shut down HBase, since we can't run without ZK (see the sketch after this list).
           • I tried the PE sequentialWrite on my laptop; it went fine. It also restarted fine.
          • Something important that this patch does is that it removes the check on HMaster's safe mode. Maybe we should verify if we are in safe mode when starting new clients? (client starts, verifies state of HMaster in HCM).
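
           A hypothetical sketch of the retry-then-give-up approach for writing the root region location; the class name, retry counts and pauses are illustrative only:

           import java.io.IOException;

           import org.apache.zookeeper.CreateMode;
           import org.apache.zookeeper.KeeperException;
           import org.apache.zookeeper.ZooDefs;
           import org.apache.zookeeper.ZooKeeper;

           public class RootLocationWriter {
             private final ZooKeeper zk;
             private final int retries;
             private final long pauseMs;

             public RootLocationWriter(ZooKeeper zk, int retries, long pauseMs) {
               this.zk = zk;
               this.retries = retries;
               this.pauseMs = pauseMs;
             }

             public void writeRootRegionLocation(String znode, byte[] address)
                 throws IOException {
               for (int attempt = 0; attempt < retries; attempt++) {
                 try {
                   if (zk.exists(znode, false) == null) {
                     zk.create(znode, address, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                         CreateMode.PERSISTENT);
                   } else {
                     zk.setData(znode, address, -1);
                   }
                   return;
                 } catch (KeeperException e) {
                   // transient ZK trouble: fall through, pause, and retry
                 } catch (InterruptedException e) {
                   Thread.currentThread().interrupt();
                   throw new IOException("Interrupted writing " + znode);
                 }
                 try {
                   Thread.sleep(pauseMs);
                 } catch (InterruptedException e) {
                   Thread.currentThread().interrupt();
                   throw new IOException("Interrupted writing " + znode);
                 }
               }
               // out of retries: surface the failure so the caller (e.g. the
               // master) can shut down, since HBase cannot run without ZK
               throw new IOException("Failed to write " + znode + " after " +
                   retries + " retries");
             }
           }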
          Nitay Joffe added a comment -

          Version 2 attached with fixes from IRC conversation

          Jean-Daniel Cryans added a comment -

          Review of that patch:

          <jdcryans> nitay: little reminder, lines must be < than 80 chars
          <nitay> k
          <jdcryans> nitay: in ZookeeperWrapper (let's call it ZKW), at line 76 there's a weird char
          <nitay> hehe ye u'r eright, no idea how that got in there
          <jdcryans> yeah, was giving a javadoc warning
          <jdcryans> in RegionManager, should we initializeZookeeper before reassignRootRegion?
          <jdcryans> nitay: what do you think?
          <nitay> looking
          <jdcryans> also in RegionManager, I think writeRootRegionLocationToZooKeeper begs to belong to ZKW
          <nitay> ye i was gonna put it there only reason i didnt was cause i didnt want everyone to have access to it
          <nitay> and yes it makes sense to me that it be before reassignRootRegion
          <jdcryans> yeah prevents race conditions IMO
          <nitay> oh, explain please, im not seeing it
          <nitay> what has a race condition
          <jdcryans> if for some reason it takes a long time to connect to ZK, the connection may not be ready when we assign the root region
          <nitay> that's what waitForConnection is for
          <nitay> ZooKeeper zooKeeper = zooKeeperWrapper.waitForZooKeeperConnection();
          <nitay> now mind u, if we can't reach ZK, RM will block on this, inside its constructor, right now
          <nitay> i was seeing cases where ZK wasn't ready yet
          <nitay> (particulaly b/c im on your old set of patches which run it last)
          <jdcryans> hehe
          <jdcryans> ok, but I would still put it before as a safety
          <nitay> sure sounds good
          <jdcryans> in the case we don't want to provide an access to writeRootRegionLocationToZooKeeper (but in a way even if you don't have the method, you can still do it the same way it is done in that method), I think you should provide a nicer method in ZKW to write into ZK
          <jdcryans> or we would have that same code everywhere in hbase soon
          <nitay> jdcryans: ok, that's a good point that the clients have raw access to the ZK object, so yeah i'll move it into ZKW
          <jdcryans> also it's more coherent with readRootRegionLocation
          <jdcryans> but maybe we will still want a generic method to write into ZK.... we'll see later when we add other features
          <jdcryans> nitay: you will also be able to clean those imports
          <nitay> right
          <jdcryans> nitay: ah but there is still that initializeZookeeper with lots of ZK stuff in it...
          <nitay> jdcryans: yeah i can make write do that stuff, its basically just making sure the parent node exists
          <nitay> which really shouldn't need to be done till the first write
          <jdcryans> you mean adding a generic "write" method in ZKW?
          <nitay> no i meant make writeRootRegionLocation create the parent node if it doesnt exist
          <nitay> rather than having the RM constructor make it
          <jdcryans> ah ok good
          <jdcryans> nitay: my last nitpick, why lazy instantiating ZKW in HCM? (wondering)
          <nitay> ah hehe
          <nitay> constructor doesn't throw
          <nitay> so i figured instead of making it throw something new that all the users of it would now have to handle, or making it do something sensible, i can just add on to the method that uses it which already throws IOException
          nitay gives really long explanation for laziness?
          <jdcryans> lol
          <jdcryans> maybe just add that explanation in your code
          <nitay> ok
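
           A rough sketch of the "create the parent node if it doesn't exist" idea from the chat, using the stock ZooKeeper API; the helper name is made up and the patch's ZKW may spell this differently:

           import org.apache.zookeeper.CreateMode;
           import org.apache.zookeeper.KeeperException;
           import org.apache.zookeeper.ZooDefs;
           import org.apache.zookeeper.ZooKeeper;

           public final class ZNodes {
             private ZNodes() {}

             // Called lazily from the first write instead of from the RM
             // constructor, as discussed above.
             public static void ensureExists(ZooKeeper zk, String znode)
                 throws KeeperException, InterruptedException {
               if (zk.exists(znode, false) != null) {
                 return;
               }
               try {
                 zk.create(znode, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
                     CreateMode.PERSISTENT);
               } catch (KeeperException.NodeExistsException e) {
                 // another process created it between exists() and create(); fine
               }
             }
           }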

          Nitay Joffe added a comment -

           Initial stab at using ZooKeeper to store the root region location. Missing tests, and it probably has a few issues within the code that warrant some discussion.

          Jean-Daniel Cryans added a comment -

           Little cleanup; made it so that ZK starts first and stops last. Requires that ZooKeeper is in the libs.

          Jean-Daniel Cryans added a comment -

           Rough cut for the scripts. Running bin/start-all.sh starts a ZK server on the server specified in conf/zookeepers. More work is obviously needed because ZK is very different from what we see in Hadoop/HBase.

          stack added a comment -

           A bunch of us had a chat about ZK integration for 0.20.0 up on IRC. Below was the result:

          19:01 < st^Ack_> So, let me try summarize.
          19:01 < st^Ack_> 1. Zk all the time
          19:01 < st^Ack_> 2. We manage a single instance as default
          19:01 < st^Ack_> 3. Unless, user supplies URL to a ZK cluster, then we stop starting/stopping zk instance as part of cluster start/stop
          19:02 < st^Ack_> 4. Look at having our single instance log to hdfs, can be done later
          ...
          19:11 < jdcryans> 5. Look at managing a bigger ZK cluster later
          
          Jean-Daniel Cryans added a comment -

          The ZK interface won't contain any method to store the schema for the moment. See HBASE-451.

          Jean-Daniel Cryans added a comment -

          First try at an Interface for the distributed lock system.

          Jim Kellerman added a comment -

           ZK integration is post-0.2, as we are trying to wrap up enough of the nasty issues so that we can release something that works with Hadoop 0.17.

          Andrew Purtell added a comment -

           Will ZK integration be targeted for 0.2, or beyond? What say ye? I have some work pending for HBASE-42 that would not be that profitable to do if ZK goes into 0.2. In that case it would be better to work on HTableDescriptor hosting on ZK now.

          Jean-Daniel Cryans added a comment -

           As Jim said on the mailing list, storing the schema in ZK would solve HBASE-451 instead of having another catalog table named TABLE.

          Jean-Daniel Cryans added a comment -

           The Bigtable paper, page 4, describes exactly what Chubby is used for:

           • Ensure there is at most one active master at any time (a minimal sketch of this one follows below)
           • Store the bootstrap location
           • Discover tablet servers and finalize tablet server death
           • Store the schema information
           • Store access control lists (not sure what that is)

           Ze big question: do we want to implement all of these or just some parts? For example, what would be the benefit of storing all schema information in ZK instead of what we do right now?
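
           For the first item, a hypothetical sketch of how "at most one active master" falls out of ZooKeeper's ephemeral znodes (the znode path and class are illustrative):

           import org.apache.zookeeper.CreateMode;
           import org.apache.zookeeper.KeeperException;
           import org.apache.zookeeper.ZooDefs;
           import org.apache.zookeeper.ZooKeeper;

           public class MasterElectionSketch {
             // Whoever manages to create the ephemeral znode is the active
             // master; the znode vanishes with the owner's session, letting a
             // standby take over.
             public static boolean tryToBecomeMaster(ZooKeeper zk, String znode,
                 byte[] myAddress) throws KeeperException, InterruptedException {
               try {
                 zk.create(znode, myAddress, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                     CreateMode.EPHEMERAL);
                 return true;  // we are the active master
               } catch (KeeperException.NodeExistsException e) {
                 return false; // someone else already is
               }
             }
           }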

          Jean-Daniel Cryans added a comment -

           Completely overlooked something. This solution misses the fact that if we want to start the processes with start-hbase.sh, we need a file with the zk servers in it (like st^ack was saying on IRC; I just woke up). So it means having to duplicate the information. But if the ZK cluster is for more than just HBase usage, leaving the zkservers file blank would enable this really easily.

          Jean-Daniel Cryans added a comment -

           This is the kind of patch I was talking about on IRC. Works on 2.2.1.

           With this, we would have to distribute ZooKeeper with HBase. The processes would be started using start-hbase.sh like any other process. In HBase, we read the ZK server config (which would be in hbase-site.xml) and there you go, you know where to connect.

          stack added a comment -

           (After chatting w/ J-D on IRC) I wonder if we should just treat ZK as we currently treat HDFS – it's just an address in the hbase-*.xml? Something like:

           <property>
             <name>hbase.zookeeper</name>
             <value>ZK_HOST:ZK_PORT/ZK_DATADIR</value>
           </property>
          

           Would be grand if, when no zookeeper is configured, we fell back on an hbase-only implementation of the ZK interface (would we need to have the master and regionservers listening on two ports to do this?).

           It would mean the user has to configure the ZK cluster independently of hbase. They would have to do the cleanup independently of hbase too, just as you do for hdfs currently.

           Alternatively, we'd have a zookeepers file with zookeeper configurations and a list of ZK cluster members with a demarcation of the leader. On startup, we'd parse the ZK settings, write out a ZK configuration on all members of the ZK cluster – probably in shell – and start up all members as part of bringing up the ZK cluster. The downside to the latter is that it would take a bit more work, and what to do if the ZK cluster is for more than just hbase usage? The upside: the user wouldn't have to worry about ZK.

          Jean-Daniel Cryans added a comment -

           Some comments:

           • The integration of HBase and ZK will have to be done really well, since many people already seem to find it hard to do. The best-case scenario would be to not mess with the ZK config at all, i.e. it is somehow handled by HBase, but the difficulty will be setting the server addresses, dataDir, and leaderServes.
           • To connect to ZK, you have to specify an address and a port. Since we will do some integration, it will be best to know all available servers internally and handle server failure.
           • Suppose we format HDFS to start with a clean system; for the moment that cleans everything linked to HBase. With ZK, files will be left behind too, so the ROOT address will still be there. We will have to do something really nice to make sure nobody has to do two cleanups (HDFS and /var/data/zookeeper) every time. Maybe something like "read ROOT address; if nonexistent, flush ZK data" (sketched below).
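
           A hypothetical sketch of that last idea ("read ROOT address; if nonexistent, flush ZK data"); the paths are made up and the cleanup is one level deep for simplicity:

           import java.util.List;

           import org.apache.zookeeper.KeeperException;
           import org.apache.zookeeper.ZooKeeper;

           public class ZkCleanupSketch {
             public static void cleanIfNoRoot(ZooKeeper zk, String parent,
                 String rootZnode) throws KeeperException, InterruptedException {
               if (zk.exists(rootZnode, false) != null) {
                 return; // ROOT address present, nothing to do
               }
               // ROOT address missing (e.g. HDFS was reformatted), so drop the
               // stale HBase znodes under the parent
               List<String> children = zk.getChildren(parent, false);
               for (String child : children) {
                 zk.delete(parent + "/" + child, -1);
               }
             }
           }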

            People

            • Assignee: Jean-Daniel Cryans
            • Reporter: Bryan Duxbury
            • Votes: 1
            • Watchers: 10
