[HIVE-20760] Reducing memory overhead due to multiple HiveConfs - ASF JIRA

Details

Type: Improvement
Status: Patch Available
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Configuration
Labels: None

Description

The issue is that every Hive task has to load its own version of HiveConf. When running with a large number of cores per executor (HoS), there is a significant (~10%) amount of memory wasted due to this duplication.

I looked into the problem and found a way to reduce the overhead caused by the multiple HiveConf objects.

I've created an implementation of Properties, somewhat similar to CopyOnFirstWriteProperties. CopyOnFirstWriteProperties can't be used to solve this problem, because it drops the interned Properties right after we add a new property.

So my implementation looks like this:

When we create a new HiveConf from an existing one (copy constructor), we replace the properties object stored by HiveConf with the new Properties implementation (HiveConfProperties). There are two possible ways to do this: either change the visibility of the properties field in the ancestor class (Configuration, which comes from Hadoop) to protected, or, more simply, swap the object using reflection.
HiveConfProperties immediately interns the given properties. After this, every new property added to HiveConf goes into an additional Properties object. This way, if we create multiple HiveConfs with the same base properties, they share the same interned Properties object, while each session/task can still add its own unique properties.
Getting a property from HiveConfProperties would look like this (the non-interned properties are stored in the superclass):

String property = super.getProperty(key);
if (property == null) property = interned.getProperty(key);
return property;
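
To make the lookup order above concrete, here is a minimal, self-contained sketch of the idea. It is not the code from the attached patches: the class name LayeredProperties, the static POOL map, and the internedBase() accessor are hypothetical names chosen for illustration, and the interning strategy (a content-keyed map of read-only snapshots) is only one assumed way to share the base properties.

```java
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a two-layer Properties: a shared, interned base
// plus per-instance additions kept in the superclass table.
class LayeredProperties extends Properties {
    // Assumed interning pool: one shared snapshot per distinct base content.
    // The snapshots are never mutated after insertion, so using them as keys is safe.
    private static final ConcurrentHashMap<Properties, Properties> POOL =
            new ConcurrentHashMap<>();

    // Shared base layer; treated as read-only by this class.
    private final Properties interned;

    LayeredProperties(Properties base) {
        // Copy the base entries so later changes to 'base' cannot leak in.
        Properties snapshot = new Properties();
        snapshot.putAll(base);
        // Reuse an existing snapshot with equal content if one is pooled.
        Properties existing = POOL.putIfAbsent(snapshot, snapshot);
        this.interned = (existing != null) ? existing : snapshot;
    }

    @Override
    public String getProperty(String key) {
        // Per-instance (non-interned) properties take precedence.
        String property = super.getProperty(key);
        if (property == null) property = interned.getProperty(key);
        return property;
    }

    // Exposed only so the sharing can be observed in this sketch.
    public Properties internedBase() {
        return interned;
    }
}
```

With this sketch, two instances built from equal base properties share one interned object, while setProperty on one instance stays local to it and can shadow a base value without affecting the other. Only getProperty is layered here; methods such as stringPropertyNames() or size() would also need overriding in a real implementation.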

Running some tests showed that the interning works (with 50 connections to HiveServer2; heap dumps were taken after the sessions were created for queries):

Overall memory:

original: 34,599K interned: 20,582K

Retained memory of HiveConfs:

original: 16,366K interned: 10,804K

I have attached the JXray reports for the heap dumps.

What are your thoughts about this solution?

Attachments

HIVE-20760.13.patch, 04/Jan/19 18:38, 31 kB, Barnabas Maidics
HIVE-20760.12.patch, 18/Dec/18 09:33, 31 kB, Barnabas Maidics
HIVE-20760.11.patch, 14/Dec/18 11:11, 29 kB, Barnabas Maidics
HIVE-20760.10.patch, 11/Dec/18 11:05, 29 kB, Barnabas Maidics
HIVE-20760.9.patch, 10/Dec/18 15:54, 29 kB, Barnabas Maidics
HIVE-20760.8.patch, 26/Nov/18 09:07, 27 kB, Barnabas Maidics
HIVE-20760.7.patch, 19/Nov/18 15:50, 27 kB, Barnabas Maidics
HIVE-20760.6.patch, 19/Nov/18 09:55, 28 kB, Barnabas Maidics
HIVE-20760.5.patch, 13/Nov/18 10:38, 28 kB, Barnabas Maidics
HIVE-20760.4.patch, 06/Nov/18 11:18, 23 kB, Barnabas Maidics
HIVE-20760-3.patch, 30/Oct/18 15:30, 23 kB, Barnabas Maidics
HIVE-20760-2.patch, 25/Oct/18 08:23, 23 kB, Barnabas Maidics
HIVE-20760-1.patch, 19/Oct/18 13:43, 23 kB, Barnabas Maidics
HIVE-20760.patch, 17/Oct/18 14:22, 19 kB, Barnabas Maidics
hiveconf_original.html, 17/Oct/18 14:11, 2.56 MB, Barnabas Maidics
hiveconf_interned.html, 17/Oct/18 14:11, 2.91 MB, Barnabas Maidics

People

Assignee: Barnabas Maidics

Reporter: Barnabas Maidics

Votes: 0

Watchers: 5

Dates

Created: 17/Oct/18 14:13

Updated: 04/Jan/19 20:46