Flume
  1. Flume
  2. FLUME-1491

Dynamic configuration from Zookeeper watcher

    Details

    • Type: Improvement Improvement
    • Status: Patch Available
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: v1.2.0
    • Fix Version/s: None
    • Component/s: Configuration
    • Labels:
    • Release Note:
      Hide
      - Added the ability to specify custom configuration provider to flume node (use new optional '-confprovider' option to specify class name
      - Added new configuration provider (org.apache.flume.conf.zookeeper.ZooKeeperConfigurationProvider) to load configuration from ZooKeeper node(s) (need to use option '-confprovider' and specify System property 'zookeeper.connectionstring' with the ZooKeeper connection string when starting the flume-node)
      Show
      - Added the ability to specify custom configuration provider to flume node (use new optional '-confprovider' option to specify class name - Added new configuration provider (org.apache.flume.conf.zookeeper.ZooKeeperConfigurationProvider) to load configuration from ZooKeeper node(s) (need to use option '-confprovider' and specify System property 'zookeeper.connectionstring' with the ZooKeeper connection string when starting the flume-node)

      Description

      Currently, Flume only support file-level dynamic configuration. Another frequent usage in practical environment, we would like to manage configuration with Zookeeper, and modify configuration from Web UI to stored file in Zookeeper.

      Flume should support this method with Zookeeper watcher.

      1. FLUME-1491-2.patch
        62 kB
        Christopher Nagy
      2. FLUME-1491-3.patch
        67 kB
        Christopher Nagy
      3. FLUME-1491-4.patch
        55 kB
        Christopher Nagy
      4. FLUME-1491-5.patch
        17 kB
        Ashish Paliwal
      5. FLUME-1491-6.patch
        20 kB
        Ashish Paliwal
      6. FLUME-1491-7.patch
        22 kB
        Ashish Paliwal

        Issue Links

          Activity

          Hide
          Denny Ye added a comment -

          I have puzzled from structure of 'AbstractFileConfigurationProvider' and 'PropertiesFileConfigurationProvider'. In my opinion, subclass should check periodically from different destination, local or remote file, ... The abstract class can take responsibility for loading configuration that it obtains notification from sub-class. Original structure of that two classes have reversed design with my thought. Could someone give me your suggestion for that where should I expand Zookeeper file as the destination?

          Show
          Denny Ye added a comment - I have puzzled from structure of 'AbstractFileConfigurationProvider' and 'PropertiesFileConfigurationProvider'. In my opinion, subclass should check periodically from different destination, local or remote file, ... The abstract class can take responsibility for loading configuration that it obtains notification from sub-class. Original structure of that two classes have reversed design with my thought. Could someone give me your suggestion for that where should I expand Zookeeper file as the destination?
          Hide
          Christopher Nagy added a comment - - edited

          It looks like PropertiesFileConfigurationProvider does a lot of work besides loading the configuration from the property file. Probably the idea was to have other config file formats.
          My idea for Zookeeper would be to add another ConfigurationProvider at the top level (and have it switchable in the Application). So the class hierarchy would be:

          ConfigurationProvider
          |
          |- AbstractConfigurationProvider
             |
             |- AbstractFileConfigurationProvider
             |  |
             |  |- PropertiesFileConfigurationProvider
             |
             |- ZookeeperConfigurationProvider
          

          AbstractConfigurationProvider would contain all the non-file related methods in AbstractFileConfigurationProvider and PropertiesFileConfigurationProvider. ZookeeperConfigurationProvider would create a FlumeConfiguration from data stored in Zookeeper and listen for changes on Zookeeper.

          Show
          Christopher Nagy added a comment - - edited It looks like PropertiesFileConfigurationProvider does a lot of work besides loading the configuration from the property file. Probably the idea was to have other config file formats. My idea for Zookeeper would be to add another ConfigurationProvider at the top level (and have it switchable in the Application). So the class hierarchy would be: ConfigurationProvider | |- AbstractConfigurationProvider | |- AbstractFileConfigurationProvider | | | |- PropertiesFileConfigurationProvider | |- ZookeeperConfigurationProvider AbstractConfigurationProvider would contain all the non-file related methods in AbstractFileConfigurationProvider and PropertiesFileConfigurationProvider. ZookeeperConfigurationProvider would create a FlumeConfiguration from data stored in Zookeeper and listen for changes on Zookeeper.
          Hide
          Jarek Jarcec Cecho added a comment -

          Idea to have configuration files in zookeeper is something that is sitting in my queue for quite some time, so thank you very much for picking this issue up.

          Would you mind putting together some prototype patch for your suggested class hierarchy and functionality?

          Jarcec

          Show
          Jarek Jarcec Cecho added a comment - Idea to have configuration files in zookeeper is something that is sitting in my queue for quite some time, so thank you very much for picking this issue up. Would you mind putting together some prototype patch for your suggested class hierarchy and functionality? Jarcec
          Hide
          Christopher Nagy added a comment -

          Prototype for ZooKeeper configuration provider. Summary of changes:

          • Moved from flume-ng-node to flume-ng-core (SimpleNodeConfiguration, ConfigurationProvider, NodeConfiguration, NodeConfigurationAware)
          • Changed package of SimpleNodeConfiguration to org.apache.flume.node
          • Created org.apache.flume.node.AbstractConfigurationProvider and moved most getters/setters from AbstractFileConfigurationProvider and loadChannels, loadSources and loadSinks from PropertiesFileConfigurationProvider
          • Created flume-ng-configuration-zookeeper with ZooKeeperConfigurationProvider (and unit test). It requires a system property called 'zookeeper.connectionstring' which is the connection string used to connect to ZooKeeper
          • Changed Application to load the custom config provider when the 'confprovider' is specified (with the classname)
          Show
          Christopher Nagy added a comment - Prototype for ZooKeeper configuration provider. Summary of changes: Moved from flume-ng-node to flume-ng-core (SimpleNodeConfiguration, ConfigurationProvider, NodeConfiguration, NodeConfigurationAware) Changed package of SimpleNodeConfiguration to org.apache.flume.node Created org.apache.flume.node.AbstractConfigurationProvider and moved most getters/setters from AbstractFileConfigurationProvider and loadChannels, loadSources and loadSinks from PropertiesFileConfigurationProvider Created flume-ng-configuration-zookeeper with ZooKeeperConfigurationProvider (and unit test). It requires a system property called 'zookeeper.connectionstring' which is the connection string used to connect to ZooKeeper Changed Application to load the custom config provider when the 'confprovider' is specified (with the classname)
          Hide
          Denny Ye added a comment -

          hi Christopher, thanks for your patch. I agree with most of your changes. Only two points I would like to explain my thought.
          1. I reviewed patch and found that you chose configuration tree at Zookeeper. In my opinion, it's not easy to maintain and management. Original design of mine is only storing whole configuration file at Zookeeper, moving from local file. We can talk about tradeoff of both choices.
          2. If we are using the simple structure talked above. Class hierarchy of ConfigurationProvider might be

          ConfigurationProvider
          |
          |- AbstractFileConfigurationProvider
             |
             |- PropertiesFileConfigurationProvider
             |
             |- ZookeeperConfigurationProvider
          

          The difference between configuration provider is configuration file source, local file, Zookeeper or remote network,...

          Wish to discuss with you

          -Denny Ye

          Show
          Denny Ye added a comment - hi Christopher, thanks for your patch. I agree with most of your changes. Only two points I would like to explain my thought. 1. I reviewed patch and found that you chose configuration tree at Zookeeper. In my opinion, it's not easy to maintain and management. Original design of mine is only storing whole configuration file at Zookeeper, moving from local file. We can talk about tradeoff of both choices. 2. If we are using the simple structure talked above. Class hierarchy of ConfigurationProvider might be ConfigurationProvider | |- AbstractFileConfigurationProvider | |- PropertiesFileConfigurationProvider | |- ZookeeperConfigurationProvider The difference between configuration provider is configuration file source, local file, Zookeeper or remote network,... Wish to discuss with you -Denny Ye
          Hide
          Christopher Nagy added a comment -

          Hi Danny,

          Do you mean that the whole properties string would be stored as a node value in ZooKeeper? If so, the ZooKeeperConfigurationProvider's reload method could be changed to read the value and call Properties#load (and remove the ZooKeeperConfigurationProvider#loadProperties method).
          I'm not sure that putting the ZooKeeperConfigurationProvider under AbstractFileConfigurationProvider is a good idea. The only functionality that AbstractFileConfigurationProvider provides (in the patch) is monitoring a local file for changes, while ZooKeeperConfigurationProvider needs to monitor a node (or nodes).
          I guess it also comes down to how you want to specify which configuration provider to use. In the patch, it would be a new option (and a System property to specify the connection string). Maybe we could specify the connection string in the conf parameter (as a uri or something similar) and use a factory to decide which configuration provider.

          Show
          Christopher Nagy added a comment - Hi Danny, Do you mean that the whole properties string would be stored as a node value in ZooKeeper? If so, the ZooKeeperConfigurationProvider's reload method could be changed to read the value and call Properties#load (and remove the ZooKeeperConfigurationProvider#loadProperties method). I'm not sure that putting the ZooKeeperConfigurationProvider under AbstractFileConfigurationProvider is a good idea. The only functionality that AbstractFileConfigurationProvider provides (in the patch) is monitoring a local file for changes, while ZooKeeperConfigurationProvider needs to monitor a node (or nodes). I guess it also comes down to how you want to specify which configuration provider to use. In the patch, it would be a new option (and a System property to specify the connection string). Maybe we could specify the connection string in the conf parameter (as a uri or something similar) and use a factory to decide which configuration provider.
          Hide
          Denny Ye added a comment -

          hi Christopher,
          Well, I would like to choose unified Zookeeper file to store the whole properties. It's nice for us to reuse the 'loadProperties' method that doing same operation with local file.

          Even if we have similar loading process for both Zookeeper and local file, it's better to merge or recognize that we want to obtain whole file from anywhere. Thus, they can be treated with 'FileConfiguration', can use common abstract class legally.

          I agree with you about the initial configuration for Zookeeper. It can be set into Agent startup process. What's your mind?

          Show
          Denny Ye added a comment - hi Christopher, Well, I would like to choose unified Zookeeper file to store the whole properties. It's nice for us to reuse the 'loadProperties' method that doing same operation with local file. Even if we have similar loading process for both Zookeeper and local file, it's better to merge or recognize that we want to obtain whole file from anywhere. Thus, they can be treated with 'FileConfiguration', can use common abstract class legally. I agree with you about the initial configuration for Zookeeper. It can be set into Agent startup process. What's your mind?
          Hide
          Christopher Nagy added a comment -

          Hi Danny,

          I'll modify the ZooKeeperConfigurationProvider to detect if the root node has the contents of the whole flume configuration. Since FlumeConfiguration only accepts a Properties object, I don't see the point of having a FileConfiguration object. Also, all the code (methods) that will be reused by PropertiesFileConfigurationProvider and ZooKeeperConfigurationProvider is now in the AbstractConfigurationProvider.

          Show
          Christopher Nagy added a comment - Hi Danny, I'll modify the ZooKeeperConfigurationProvider to detect if the root node has the contents of the whole flume configuration. Since FlumeConfiguration only accepts a Properties object, I don't see the point of having a FileConfiguration object. Also, all the code (methods) that will be reused by PropertiesFileConfigurationProvider and ZooKeeperConfigurationProvider is now in the AbstractConfigurationProvider.
          Hide
          Christopher Nagy added a comment -

          Same as FLUME-1491-2.patch, but added the capability to read the contents of a property file from the root flume node

          Show
          Christopher Nagy added a comment - Same as FLUME-1491 -2.patch, but added the capability to read the contents of a property file from the root flume node
          Hide
          Hari Shreedharan added a comment -

          Good work on the patch, some comments before we start a full review:

          • AbstractConfigurationProvider and related classes should not be in flume-ng-core. Either leave them in the current package or move them to flume-ng-configuration.
          • Ideally flume-ng-configuration should not have dependencies on any other flume modules. The original idea was to allow this to be used for validating individual components' configuration. Eventually, I'd like to complete that work.
          • The Zookeeper provider can simply be a separate package and not a submodule right? Do we need it to be a separate module - I am ok with either, just asking.

          When you feel the patch is ready for review/commit, can you submit this for review on reviewboard?

          Show
          Hari Shreedharan added a comment - Good work on the patch, some comments before we start a full review: AbstractConfigurationProvider and related classes should not be in flume-ng-core. Either leave them in the current package or move them to flume-ng-configuration. Ideally flume-ng-configuration should not have dependencies on any other flume modules. The original idea was to allow this to be used for validating individual components' configuration. Eventually, I'd like to complete that work. The Zookeeper provider can simply be a separate package and not a submodule right? Do we need it to be a separate module - I am ok with either, just asking. When you feel the patch is ready for review/commit, can you submit this for review on reviewboard?
          Hide
          Hari Shreedharan added a comment -

          Assigning to Christopher since he submitted the first 2 patches.

          Show
          Hari Shreedharan added a comment - Assigning to Christopher since he submitted the first 2 patches.
          Hide
          Christopher Nagy added a comment -

          If it's alright with flume-ng-node now depending on ZooKeeper than I'll move all the code back into flume-ng-node

          Show
          Christopher Nagy added a comment - If it's alright with flume-ng-node now depending on ZooKeeper than I'll move all the code back into flume-ng-node
          Hide
          Christopher Nagy added a comment -

          Move changed code into flume-ng-node.

          Show
          Christopher Nagy added a comment - Move changed code into flume-ng-node.
          Hide
          Christopher Nagy added a comment -

          Refactored code for configuration provider so that common functions to start source, channel and sink are in an abstract base class that is used by the PropertiesFileConfigurationProvider and ZooKeeperConfigurationProvider.
          The implementation of those classes now just load the configuration and detect when changes occur so that the flume node can update it's sources, channels and sinks.

          Show
          Christopher Nagy added a comment - Refactored code for configuration provider so that common functions to start source, channel and sink are in an abstract base class that is used by the PropertiesFileConfigurationProvider and ZooKeeperConfigurationProvider. The implementation of those classes now just load the configuration and detect when changes occur so that the flume node can update it's sources, channels and sinks.
          Hide
          Mike Percy added a comment -

          This one needs a lot more discussion and agreement, will not make it into v1.4.0. Clearing fixVersion.

          Show
          Mike Percy added a comment - This one needs a lot more discussion and agreement, will not make it into v1.4.0. Clearing fixVersion.
          Hide
          Ashish Paliwal added a comment -

          This would be a nice addition for configuration option. IMHO, reading complete file from root would be preferred, since it makes it easy to upload for end users, else we need to work on a script that would parse and upload in the Node tree format. Also, we can use Curator for Zookeeper interaction. If this work out, we can think of a Flume Agents current state in Zk, probably to aid some kind of Monitoring UI's.

          Show
          Ashish Paliwal added a comment - This would be a nice addition for configuration option. IMHO, reading complete file from root would be preferred, since it makes it easy to upload for end users, else we need to work on a script that would parse and upload in the Node tree format. Also, we can use Curator for Zookeeper interaction. If this work out, we can think of a Flume Agents current state in Zk, probably to aid some kind of Monitoring UI's.
          Hide
          Ashish Paliwal added a comment -

          Taking over the JIRA. Have implemented a simple implementation, which expects complete configuration file in the zookeeper node. Working on test cases, shall upload the patch by tomorrow.

          Show
          Ashish Paliwal added a comment - Taking over the JIRA. Have implemented a simple implementation, which expects complete configuration file in the zookeeper node. Working on test cases, shall upload the patch by tomorrow.
          Hide
          Hari Shreedharan added a comment -

          You'd also need to pass in a command line option (or add it to flume-env.sh) to let the Application calls know which config provider to use.

          Show
          Hari Shreedharan added a comment - You'd also need to pass in a command line option (or add it to flume-env.sh) to let the Application calls know which config provider to use.
          Hide
          Ashish Paliwal added a comment -

          Yup, The changes are done, yet to test them. I shall upload the patch before updating User Guide. Used Curator, so code is small. I will run a real example before uploading for review. Hopefully, shall be done over the weekend. Zookeeper configuration provider has Unit tests completed.

          Show
          Ashish Paliwal added a comment - Yup, The changes are done, yet to test them. I shall upload the patch before updating User Guide. Used Curator, so code is small. I will run a real example before uploading for review. Hopefully, shall be done over the weekend. Zookeeper configuration provider has Unit tests completed.
          Hide
          Ashish Paliwal added a comment -

          Initial beta version. Need more discussion on how to expect params form user. Right now one can run with following command
          bin/flume-ng agent --name a1 --conf ./conf/ -z localhost:2181 -Dflume.root.logger=INFO,console

          --z or --zkConnString specify the zookeeper ensemble, base path is also supported. If the params are present, the configuration switches to Zookeeper mode. One has to upload Agent config file under /basepath/agentname

          /basepath defaults to flume

          I have started one agent using the param. The implementation needs a little more refinement on logging and error handling.

          In TestZookeeperConfigurationProvider, there is main method which can help in uploading file for testing. This won't be present in final patch. Would move this to a util class.

          Feedback/suggestions welcome.

          Show
          Ashish Paliwal added a comment - Initial beta version. Need more discussion on how to expect params form user. Right now one can run with following command bin/flume-ng agent --name a1 --conf ./conf/ -z localhost:2181 -Dflume.root.logger=INFO,console --z or --zkConnString specify the zookeeper ensemble, base path is also supported. If the params are present, the configuration switches to Zookeeper mode. One has to upload Agent config file under /basepath/agentname /basepath defaults to flume I have started one agent using the param. The implementation needs a little more refinement on logging and error handling. In TestZookeeperConfigurationProvider, there is main method which can help in uploading file for testing. This won't be present in final patch. Would move this to a util class. Feedback/suggestions welcome.
          Hide
          Ashish Paliwal added a comment -

          Hari ShreedharanBrock NolandMike Percy Folks any feedback from User perspective on the options used. I plan to upload a it for review over the weekend.

          Show
          Ashish Paliwal added a comment - Hari Shreedharan Brock Noland Mike Percy Folks any feedback from User perspective on the options used. I plan to upload a it for review over the weekend.
          Hide
          Bruno Mahé added a comment -
          • flume-ng-node/pom.xml versions of curator should be properties. This makes it easier to override them at build time without extra patches
          • Is automatic reload of the configuration planned?

          Nitpicks:

          • "!Strings.isNullOrEmpty(zkConnString)" why not using "StringUtils.isNotBlank(zkConnString)" ?
          Show
          Bruno Mahé added a comment - flume-ng-node/pom.xml versions of curator should be properties. This makes it easier to override them at build time without extra patches Is automatic reload of the configuration planned? Nitpicks: "!Strings.isNullOrEmpty(zkConnString)" why not using "StringUtils.isNotBlank(zkConnString)" ?
          Hide
          Ashish Paliwal added a comment -

          I shall fix as per the comments.

          About auto-reload, I am not sure. File based implementation has one mode where the config file change would initiate a reload.
          If its needed we can add a watch on the node and initiate a reload on the change notification. IMHO, we can get this simple implementation in, and then can refine it.

          Thanks for the review.

          Show
          Ashish Paliwal added a comment - I shall fix as per the comments. About auto-reload, I am not sure. File based implementation has one mode where the config file change would initiate a reload. If its needed we can add a watch on the node and initiate a reload on the change notification. IMHO, we can get this simple implementation in, and then can refine it. Thanks for the review.
          Hide
          Brock Noland added a comment -

          Hi Ashish,

          Great work!! Thank you very much for contributing to Flume! I have two concerns with the current approach:

          1) I think the big use-case for pulling configuration from ZK would be automatic reconfiguration but this patch doesn't implement that.
          2) The patch stores the entire contents of the file in a single znode which has a 1MB size limit by default.

          Show
          Brock Noland added a comment - Hi Ashish, Great work!! Thank you very much for contributing to Flume! I have two concerns with the current approach: 1) I think the big use-case for pulling configuration from ZK would be automatic reconfiguration but this patch doesn't implement that. 2) The patch stores the entire contents of the file in a single znode which has a 1MB size limit by default.
          Hide
          Ashish Paliwal added a comment -

          Hi Brock,

          Thanks for the review

          For #1 I haven't started on it, was waiting for the feedback. Shall start working on it.
          For #2 The file is store per agent, AFAIK, even with a lot of configurations options, the file may not even breach 100KB.

          I may not have complete details for #2, so my assumption may be incorrect. My day job is not related to Flume, so don't have production experience. Let me know, I will try find other options.

          Show
          Ashish Paliwal added a comment - Hi Brock, Thanks for the review For #1 I haven't started on it, was waiting for the feedback. Shall start working on it. For #2 The file is store per agent, AFAIK, even with a lot of configurations options, the file may not even breach 100KB. I may not have complete details for #2, so my assumption may be incorrect. My day job is not related to Flume, so don't have production experience. Let me know, I will try find other options.
          Hide
          Ashish Paliwal added a comment -

          Beta patch

          Added support for dynamic reconfiguration, and updated as per review comments.

          It needs a little more work on test cases side, and would do a little refactoring to remove duplicate code and code branches.
          Please expect a patch over the weekend

          Show
          Ashish Paliwal added a comment - Beta patch Added support for dynamic reconfiguration, and updated as per review comments. It needs a little more work on test cases side, and would do a little refactoring to remove duplicate code and code branches. Please expect a patch over the weekend
          Hide
          Ashish Paliwal added a comment -

          Updated patch.

          Docs changes are missing. Not sure which sections to fit them in.

          Show
          Ashish Paliwal added a comment - Updated patch. Docs changes are missing. Not sure which sections to fit them in.

            People

            • Assignee:
              Ashish Paliwal
              Reporter:
              Denny Ye
            • Votes:
              2 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

              • Created:
                Updated:

                Development