Hadoop HDFS / HDFS-13234

Remove the renewed Configuration instance in ConfiguredFailoverProxyProvider and reduce the client memory footprint

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: fs, ha, hdfs-client
    • Labels: None

    Description

      The memory footprint of DFSClient can be considerable in some scenarios, because many Configuration instances are created and each occupies significant memory (in an extreme case we hit under HDFS Federation + HA with QJM and dozens of NameNodes, org.apache.hadoop.conf.Configuration occupied over 600MB). I think some of these new Configuration instances are unnecessary, for example the one created during ConfiguredFailoverProxyProvider initialization:

        public ConfiguredFailoverProxyProvider(Configuration conf, URI uri,
            Class<T> xface, HAProxyFactory<T> factory) {
          this.xface = xface;
          // Defensive copy: duplicates the caller's entire Configuration
          // for every provider instance.
          this.conf = new Configuration(conf);
          ......
        }
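
      One possible direction (a hypothetical sketch, not the attached patch): since the thread below notes that only two failover retry settings are read from the cloned conf, the provider could capture those values into plain fields and share the caller's Configuration instead of copying it.

        import org.apache.hadoop.conf.Configuration;

        // Hypothetical sketch: capture only the values the provider needs,
        // instead of cloning the caller's entire Configuration.
        class FailoverProxySettings {
          final int maxRetries;
          final int maxRetriesOnSocketTimeouts;

          FailoverProxySettings(Configuration conf) {
            // The two per-provider overrides discussed in this thread;
            // the default values here are placeholders.
            this.maxRetries = conf.getInt(
                "dfs.client.failover.connection.retries", 0);
            this.maxRetriesOnSocketTimeouts = conf.getInt(
                "dfs.client.failover.connection.retries.on.timeouts", 0);
          }
        }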
      

      Attachments

        1. HDFS-13234.001.patch
          1.0 kB
          Xiaoqiao He

        Activity

          hexiaoqiao Xiaoqiao He added a comment -

          Uploaded patch v1 for trunk; pending Jenkins.


          elgoiri Íñigo Goiri added a comment -

          kihwal, you had some good points on HDFS-13195 about a somewhat related topic.
          Do you mind chiming in here?

          kihwal Kihwal Lee added a comment -

           Configuration occupies over 600MB

          How big is a single instance in your use case? Bloated conf in dfs client is obviously a serious issue, but it can create bigger issues in apps/jobs. Sometimes a conf can get embedded in another conf. Avoiding unnecessarily duplicated confs is a good thing, but looking into what is causing the bloat and fixing that will also be important.

          kihwal Kihwal Lee added a comment -

          New conf objects are created to prevent unintended conf update propagation. If we have an overlay config feature, we could achieve the same thing without duplicating the entire conf object. Configuration has something overlay-like, but I was told it does not work the way we want.
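
          For illustration (an editor's sketch, not from the discussion): the propagation problem that the defensive copy guards against looks like this.

            import org.apache.hadoop.conf.Configuration;

            public class SharedConfHazard {
              public static void main(String[] args) {
                Configuration base = new Configuration(false); // skip default resources
                base.set("dfs.client.failover.connection.retries", "3");

                // A component that keeps a reference instead of copying...
                Configuration sharedByProvider = base;

                // ...silently sees any later mutation by the caller.
                base.set("dfs.client.failover.connection.retries", "0");
                System.out.println(sharedByProvider.get(
                    "dfs.client.failover.connection.retries")); // prints "0"
              }
            }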

          hexiaoqiao Xiaoqiao He added a comment -

          Thanks kihwal, elgoiri for your comments.

          How big is a single instance in your use case? Bloated conf in dfs client is obviously a serious issue, but it can create bigger issues in apps/jobs.

          Actually this is the YARN log upload service. The size of a single Configuration instance on the NodeManager is about 120KB, but it balloons to 600MB across all Configuration instances due to two factors:
          a. HDFS Federation + HA with QJM with dozens of nameservices (~20); the client creates a ConfiguredFailoverProxyProvider instance for each nameservice, which doubles the number of Configuration instances;
          b. up to 150 threads upload YARN logs to HDFS;
          so, in the extreme case, the Configuration instances occupy about ~20 * 2 * 150 * 120KB.
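          (That product is 20 * 2 * 150 * 120KB = 720,000KB, roughly 700MB, consistent with the 600MB+ observed.)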

          New conf objects are created to prevent unintended conf update propagation.

          It is true that the copy prevents unintended conf update propagation, but I think there are other ways to avoid cloning the whole conf for only two parameters in ConfiguredFailoverProxyProvider and IPFailoverProxyProvider, which probably wastes huge amounts of memory as you mentioned. Are there any suggestions, kihwal?

          Thanks again.

          kihwal Kihwal Lee added a comment -

          jlowe and I discussed the conf issue a bit this morning. Configuration has both performance and memory footprint issues, but coming up with a single generic solution for all use cases is difficult, if not impossible. That's one of the roadblocks many previous improvement attempts have met. For use cases that do not require refreshing, we can have a single mutable instance load/reload all resources, instead of duplicating them for each config instance. Each new conf can have its own "overlay" map internally to keep track of locally set keys/values. For keys not found in this map, it will look them up in the base instance. The look-ups get a bit more expensive, but it avoids the problems of multiple resource reloads and object duplication. Since this might not work well with refreshable configs, it would be better to make it a new feature (i.e. a new version of the ctor) and offer it on an opt-in basis. I think most client-side code will be able to take advantage of this.
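
          A minimal sketch of that overlay idea (hypothetical; OverlayConf is not an existing Hadoop class):

            import java.util.HashMap;
            import java.util.Map;
            import org.apache.hadoop.conf.Configuration;

            // Records locally set keys and falls back to a shared,
            // read-only base Configuration for everything else.
            class OverlayConf {
              private final Configuration base;               // loaded once, shared
              private final Map<String, String> overlay = new HashMap<>();

              OverlayConf(Configuration base) {
                this.base = base;
              }

              void set(String key, String value) {
                overlay.put(key, value);  // local writes never touch the base
              }

              String get(String key) {
                // Slightly more expensive lookup: overlay first, then base.
                String v = overlay.get(key);
                return (v != null) ? v : base.get(key);
              }
            }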

          Related: HADOOP-11223 and HADOOP-9570

          We can start a design/feasibility discussion, if there is enough interest.

          hexiaoqiao Xiaoqiao He added a comment -

          Thanks kihwal for your detailed comments.
          HADOOP-11223 and HADOOP-9570 are interesting issues for resolving duplicated Configuration instances, but I am not sure they are a complete solution for the huge memory footprint in the case mentioned above. Besides HADOOP-11223 and HADOOP-9570, I think it is necessary to maintain the incremental changes to a Configuration: Configuration::getDefault() plus the incremental changes could form the complete configuration, with no unintended conf update propagation, while also reducing the memory footprint. Please correct me if I am wrong.
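
          In terms of the hypothetical OverlayConf sketched above, that proposal would look roughly like this:

            // One shared, fully loaded Configuration per process (the "default").
            Configuration shared = new Configuration();

            // Each client keeps only its incremental changes on top of it.
            OverlayConf clientConf = new OverlayConf(shared);
            clientConf.set("dfs.client.failover.connection.retries", "3");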
          Thanks again.


          People

            Assignee: Unassigned
            Reporter: Xiaoqiao He
            Votes: 0
            Watchers: 3
