Uploaded image for project: 'Apache Blur'
  1. Apache Blur
  2. BLUR-259 Add Full Controller list to ZookeeperClusterStatus
  3. BLUR-277

Create an API to fetch the list of total Controller Servers in the Controller Cluster.

Details

    • Sub-task
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.3.0
    • 0.3.0, 0.2.1
    • Blur
    • None

    Description

      Right now we have an API to fetch the list of online controller servers. The method says getControllerServerList() which in turn gets only the online list.
      Suppose the Blur Shell needs a list of both the online and the offline controller list we have no way of getting it today. This API gets the total controller list and with the existing method to get the online controller list, the clients can do a Set difference and get the offline list.

      Attachments

        1. BLUR-259-Subtask_1-MASTER.patch
          7 kB
          Vikrant Navalgund

        Activity

          amccurry Aaron McCurry added a comment -

          Thanks Vikrant! The patch looks good!

          Aaron

          amccurry Aaron McCurry added a comment - Thanks Vikrant! The patch looks good! Aaron
          amccurry Aaron McCurry added a comment - https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=commit;h=8691ee015c14917b0ee3eacfb563065c3391211e https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=commit;h=6536d48e1bea0ac7a73c42250eed93450c94e70b

          Hi Aaron,
          You are right, its better to be defensive when it comes to zk logical operations.

          I have replaced the old patch with the new one.
          Please review it and let me know.

          Regards,
          Vikrant

          nvikrant Vikrant Navalgund added a comment - Hi Aaron, You are right, its better to be defensive when it comes to zk logical operations. I have replaced the old patch with the new one. Please review it and let me know. Regards, Vikrant
          amccurry Aaron McCurry added a comment -

          If at all possible it is best to avoid exceptions as logical execution paths, especially with ZooKeeper. While this code will work, the exception is logged in the ZooKeeper server logs as well. Overall this makes the logs noisy and hard to read.

          Also since the registering of a controller is a one time thing (the PERSISTENT node) and because it's only done at process startup, I think is would be better to check and create if missing. In most cases since controllers should have unique names (unless there is a config problem), by checking and then creating an exception should never be thrown.

          Also if there is a strange race condition between controller servers where they are both starting at the same time and they are named the same thing (bad config). Both could pass the looping exists check and both go to register the EPHEMERAL nodes at the same time. One wold create the EPHEMERAL node and the other would throw an exception. A NodeExistsException exception and then that rogue process would continue on because it ate the exception and not blow up like it should. While I realize this is an edge case, but if it can happen it will likely happen to someone and take a long time to debug the problem.

          This is how I would rework the method:

          private void registerMyself() {
          String version = BlurUtil.getVersion();
          String controllerPath = ZookeeperPathConstants.getControllersPath() + "/" + _nodeName;
          try {
          if (_zookeeper.exists(controllerPath, false) == null)

          { // Don't set the version in the registered controllers, only in the online _zookeeper.create(controllerPath, null, Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT); }

          } catch (KeeperException e)

          { throw new RuntimeException(e); } catch (InterruptedException e) { throw new RuntimeException(e); }

          try {
          String onlineControllerPath = ZookeeperPathConstants.getOnlineControllersPath() + "/" + _nodeName;
          while (_zookeeper.exists(onlineControllerPath, false) != null) {
          LOG.info("Node [

          {0}

          ] already registered, waiting for path [

          {1}

          ] to be released", _nodeName,
          onlineControllerPath);
          Thread.sleep(3000);
          }
          _zookeeper.create(onlineControllerPath, version.getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
          } catch (KeeperException e)

          { throw new RuntimeException(e); } catch (InterruptedException e) { throw new RuntimeException(e); }

          }

          Thanks!

          Aaron

          amccurry Aaron McCurry added a comment - If at all possible it is best to avoid exceptions as logical execution paths, especially with ZooKeeper. While this code will work, the exception is logged in the ZooKeeper server logs as well. Overall this makes the logs noisy and hard to read. Also since the registering of a controller is a one time thing (the PERSISTENT node) and because it's only done at process startup, I think is would be better to check and create if missing. In most cases since controllers should have unique names (unless there is a config problem), by checking and then creating an exception should never be thrown. Also if there is a strange race condition between controller servers where they are both starting at the same time and they are named the same thing (bad config). Both could pass the looping exists check and both go to register the EPHEMERAL nodes at the same time. One wold create the EPHEMERAL node and the other would throw an exception. A NodeExistsException exception and then that rogue process would continue on because it ate the exception and not blow up like it should. While I realize this is an edge case, but if it can happen it will likely happen to someone and take a long time to debug the problem. This is how I would rework the method: private void registerMyself() { String version = BlurUtil.getVersion(); String controllerPath = ZookeeperPathConstants.getControllersPath() + "/" + _nodeName; try { if (_zookeeper.exists(controllerPath, false) == null) { // Don't set the version in the registered controllers, only in the online _zookeeper.create(controllerPath, null, Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT); } } catch (KeeperException e) { throw new RuntimeException(e); } catch (InterruptedException e) { throw new RuntimeException(e); } try { String onlineControllerPath = ZookeeperPathConstants.getOnlineControllersPath() + "/" + _nodeName; while (_zookeeper.exists(onlineControllerPath, false) != null) { LOG.info("Node [ {0} ] already registered, waiting for path [ {1} ] to be released", _nodeName, onlineControllerPath); Thread.sleep(3000); } _zookeeper.create(onlineControllerPath, version.getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL); } catch (KeeperException e) { throw new RuntimeException(e); } catch (InterruptedException e) { throw new RuntimeException(e); } } Thanks! Aaron

          Hi Aaron,
          I suppose I can do that. Here is my reasoning for the changes I made. Please correct me if I am wrong.
          In BlurControllerServer.java:286, we are already looping on the zk.exists("Ephemeral node") and we caught a KeeperException for this.

          With my changes:
          _zookeeper.create(controllerPath, version.getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
          } catch (KeeperException.NodeExistsException e) {
          LOG.info("Node [

          {0}

          ] already registered, skipping znode creation for controller path", _nodeName);
          } catch (KeeperException e) {

          I am tying the specific exception of NodeExistsException to only the persistent node, because the above loop at line:286 would not throw a NodeExistsException, does it ?
          Also, since we are doing an optimistic node creation for a persistent node, is it not better to go ahead and create or eat the NodeExistsException instead of one query and one set in the worst case.

          I can have each of the node creation in a try/catch block, I suppose that would be a much cleaner code.
          Please let me know your thoughts.

          Regards,
          Vikrant

          nvikrant Vikrant Navalgund added a comment - Hi Aaron, I suppose I can do that. Here is my reasoning for the changes I made. Please correct me if I am wrong. In BlurControllerServer.java:286, we are already looping on the zk.exists("Ephemeral node") and we caught a KeeperException for this. With my changes: _zookeeper.create(controllerPath, version.getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT); } catch (KeeperException.NodeExistsException e) { LOG.info("Node [ {0} ] already registered, skipping znode creation for controller path", _nodeName); } catch (KeeperException e) { I am tying the specific exception of NodeExistsException to only the persistent node, because the above loop at line:286 would not throw a NodeExistsException, does it ? Also, since we are doing an optimistic node creation for a persistent node, is it not better to go ahead and create or eat the NodeExistsException instead of one query and one set in the worst case. I can have each of the node creation in a try/catch block, I suppose that would be a much cleaner code. Please let me know your thoughts. Regards, Vikrant
          amccurry Aaron McCurry added a comment -

          The patch looks very good, there is one thing I would like to see changed.

          In the BlurControllerServer on the line 288+a couple of lines.
          "+ _zookeeper.create(controllerPath, version.getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);"

          Since the registering of the controller will only happen once and the normal condition is to not create a new node (since the controller is already registered). I think we should do something like:

          Stat stat = _zookeeper.exists(controllerPath);
          if (stat == null) {
          _zookeeper.create(controllerPath, version.getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
          } else {
          LOG.info("Controller [

          {0}

          ] is already registered", nodeName);
          }

          Also we should put the EPHEMERAL creation and the PERSISTENT creation in separate try/catch blocks so we know which one may have caused the error.

          Thanks,

          Aaron

          amccurry Aaron McCurry added a comment - The patch looks very good, there is one thing I would like to see changed. In the BlurControllerServer on the line 288+a couple of lines. "+ _zookeeper.create(controllerPath, version.getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);" Since the registering of the controller will only happen once and the normal condition is to not create a new node (since the controller is already registered). I think we should do something like: Stat stat = _zookeeper.exists(controllerPath); if (stat == null) { _zookeeper.create(controllerPath, version.getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT); } else { LOG.info("Controller [ {0} ] is already registered", nodeName); } Also we should put the EPHEMERAL creation and the PERSISTENT creation in separate try/catch blocks so we know which one may have caused the error. Thanks, Aaron

          This provides the API to fetch the total no. of controllers. With this the clients can fetch the list of online controllers and the total list of controllers, do a set difference and fetch the offline list of Controllers.

          nvikrant Vikrant Navalgund added a comment - This provides the API to fetch the total no. of controllers. With this the clients can fetch the list of online controllers and the total list of controllers, do a set difference and fetch the offline list of Controllers.

          People

            Unassigned Unassigned
            nvikrant Vikrant Navalgund
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - 6h
                6h
                Remaining:
                Remaining Estimate - 6h
                6h
                Logged:
                Time Spent - Not Specified
                Not Specified