Hadoop HDFS / HDFS-13234

Remove the renewed Configuration instance in ConfiguredFailoverProxyProvider and reduce the client memory footprint

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: fs, ha, hdfs-client
    • Labels: None

    Description

      The memory footprint of DFSClient can be considerable in some scenarios, because many Configuration instances are created and each occupies significant memory (in an extreme case we hit under HDFS Federation + HA with QJM and dozens of NameNodes, org.apache.hadoop.conf.Configuration occupied over 600MB). I think some of these new Configuration instances are unnecessary, for example the one created during ConfiguredFailoverProxyProvider initialization:

        public ConfiguredFailoverProxyProvider(Configuration conf, URI uri,
            Class<T> xface, HAProxyFactory<T> factory) {
          this.xface = xface;
          // Defensive copy: duplicates the caller's entire Configuration
          // for every provider instance.
          this.conf = new Configuration(conf);
          ......
        }
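
      One possible direction (a hypothetical sketch, not the attached patch): since the thread below notes that only two failover retry settings are read from the cloned conf, the provider could capture those values into plain fields and share the caller's Configuration instead of copying it.

        import org.apache.hadoop.conf.Configuration;

        // Hypothetical sketch: capture only the values the provider needs,
        // instead of cloning the caller's entire Configuration.
        class FailoverProxySettings {
          final int maxRetries;
          final int maxRetriesOnSocketTimeouts;

          FailoverProxySettings(Configuration conf) {
            // The two per-provider overrides discussed in this thread;
            // the default values here are placeholders.
            this.maxRetries = conf.getInt(
                "dfs.client.failover.connection.retries", 0);
            this.maxRetriesOnSocketTimeouts = conf.getInt(
                "dfs.client.failover.connection.retries.on.timeouts", 0);
          }
        }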
      

      Attachments

        1. HDFS-13234.001.patch
          1.0 kB
          Xiaoqiao He

        Activity

          hexiaoqiao Xiaoqiao He added a comment -

          Uploaded patch v1 for trunk; pending Jenkins.


          elgoiri Íñigo Goiri added a comment -

          kihwal, you had some good points on HDFS-13195 about a somewhat related topic.
          Do you mind chiming in here?

          kihwal Kihwal Lee added a comment -

           Configuration occupies over 600MB

          How big is a single instance in your use case? Bloated conf in dfs client is obviously a serious issue, but it can create bigger issues in apps/jobs. Sometimes a conf can get embedded in another conf. Avoiding unnecessarily duplicated confs is a good thing, but looking into what is causing the bloat and fixing that will also be important.

          kihwal Kihwal Lee added a comment -

          New conf objects are created to prevent unintended conf update propagation. If we have an overlay config feature, we could achieve the same thing without duplicating the entire conf object. Configuration has something overlay-like, but I was told it does not work the way we want.
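
          For illustration (an editor's sketch, not from the discussion): the propagation problem that the defensive copy guards against looks like this.

            import org.apache.hadoop.conf.Configuration;

            public class SharedConfHazard {
              public static void main(String[] args) {
                Configuration base = new Configuration(false); // skip default resources
                base.set("dfs.client.failover.connection.retries", "3");

                // A component that keeps a reference instead of copying...
                Configuration sharedByProvider = base;

                // ...silently sees any later mutation by the caller.
                base.set("dfs.client.failover.connection.retries", "0");
                System.out.println(sharedByProvider.get(
                    "dfs.client.failover.connection.retries")); // prints "0"
              }
            }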

          hexiaoqiao Xiaoqiao He added a comment -

          Thanks kihwal, elgoiri for your comments.

          How big is a single instance in your use case? Bloated conf in dfs client is obviously a serious issue, but it can create bigger issues in apps/jobs.

          Actually this is the YARN log upload service. The size of a single Configuration instance on the NodeManager is about 120KB, but it balloons to 600MB across all Configuration instances due to two factors:
          a. HDFS Federation + HA with QJM with dozens of nameservices (~20); the client creates a ConfiguredFailoverProxyProvider instance for each nameservice, which doubles the number of Configuration instances;
          b. up to 150 threads upload YARN logs to HDFS;
          so, in the extreme case, the Configuration instances occupy about ~20 * 2 * 150 * 120KB.
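          (That product is 20 * 2 * 150 * 120KB = 720,000KB, roughly 700MB, consistent with the 600MB+ observed.)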

          New conf objects are created to prevent unintended conf update propagation.

          It is true that the copy prevents unintended conf update propagation, but I think there are other ways to avoid cloning the whole conf for only two parameters in ConfiguredFailoverProxyProvider and IPFailoverProxyProvider, which probably wastes huge amounts of memory as you mentioned. Are there any suggestions, kihwal?

          Thanks again.

          kihwal Kihwal Lee added a comment -

          jlowe and I discussed the conf issue a bit this morning. Configuration has both performance and memory footprint issues, but coming up with a single generic solution for all use cases is difficult, if not impossible. That's one of the roadblocks many previous improvement attempts have met. For use cases that do not require refreshing, we can have a single mutable instance load/reload all resources, instead of duplicating them for each config instance. Each new conf can have its own "overlay" map internally to keep track of locally set keys/values. For keys not found in this map, it will look them up in the base instance. The look-ups get a bit more expensive, but it avoids the problems of multiple resource reloads and object duplication. Since this might not work well with refreshable configs, it would be better to make it a new feature (i.e. a new version of the ctor) and offer it on an opt-in basis. I think most client-side code will be able to take advantage of this.
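
          A minimal sketch of that overlay idea (hypothetical; OverlayConf is not an existing Hadoop class):

            import java.util.HashMap;
            import java.util.Map;
            import org.apache.hadoop.conf.Configuration;

            // Records locally set keys and falls back to a shared,
            // read-only base Configuration for everything else.
            class OverlayConf {
              private final Configuration base;               // loaded once, shared
              private final Map<String, String> overlay = new HashMap<>();

              OverlayConf(Configuration base) {
                this.base = base;
              }

              void set(String key, String value) {
                overlay.put(key, value);  // local writes never touch the base
              }

              String get(String key) {
                // Slightly more expensive lookup: overlay first, then base.
                String v = overlay.get(key);
                return (v != null) ? v : base.get(key);
              }
            }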

          Related: HADOOP-11223 and HADOOP-9570

          We can start a design/feasibility discussion, if there is enough interest.

          hexiaoqiao Xiaoqiao He added a comment -

          Thanks kihwal for your detailed comments.
          HADOOP-11223 and HADOOP-9570 are interesting issues for resolving duplicated Configuration instances, but I am not sure they are a complete solution for the huge memory footprint in the case mentioned above. Besides HADOOP-11223 and HADOOP-9570, I think it is necessary to maintain the incremental changes to a Configuration: Configuration::getDefault() plus the incremental changes could form the complete configuration, with no unintended conf update propagation, while also reducing the memory footprint. Please correct me if I am wrong.
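
          In terms of the hypothetical OverlayConf sketched above, that proposal would look roughly like this:

            // One shared, fully loaded Configuration per process (the "default").
            Configuration shared = new Configuration();

            // Each client keeps only its incremental changes on top of it.
            OverlayConf clientConf = new OverlayConf(shared);
            clientConf.set("dfs.client.failover.connection.retries", "3");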
          Thanks again.


          People

            Assignee: Unassigned
            Reporter: Xiaoqiao He
            Votes: 0
            Watchers: 3
