[HIVE-15882] HS2 generating high memory pressure with many partitions and concurrent queries - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.3.0
Component/s: HiveServer2
Labels: None

Description

I've created a Hive table with 2000 partitions, each backed by two files, with one row in each file. When I execute a number of concurrent queries against this table, for example:

for i in `seq 1 50`; do beeline -u jdbc:hive2://localhost:10000 -n admin -p admin -e "select count(i_f_1) from misha_table;" & done

it results in a large memory spike. 20 concurrent queries caused an OOM in an HS2 server running with -Xmx200m, and 50 queries caused one in a server running with -Xmx500m.

I am attaching the results of a jxray (www.jxray.com) analysis of a heap dump generated in the 50 queries / 500m heap scenario. It suggests several opportunities to reduce memory pressure with fairly non-invasive code changes:

1. 24.5% of memory is wasted by duplicate strings (see section 6). With String.intern() calls added in the ~10 relevant places in the code, this overhead can be greatly reduced.

2. Almost 20% of memory is wasted due to various suboptimally used collections (see section 8). There are many maps and lists that are either empty or have just 1 element. By modifying the code that creates and populates these collections, we could likely save 5-10% of memory.

3. Almost 20% of memory is used by instances of java.util.Properties. These objects appear to be heavily duplicated, since each concurrently running query creates its own copy of Partition, PartitionDesc and Properties for every partition. Thus we have nearly 100,000 (50 queries * 2,000 partitions) Properties objects in memory. By interning/deduplicating these objects we may be able to save perhaps 15% of memory.
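Items 1 and 3 can be illustrated with a minimal Java sketch. It is not taken from the attached patches: the class DedupSketch and its method names are hypothetical, and the approach shown (String.intern() on keys and values, plus a content-keyed canonical map) is only one possible way to deduplicate. It assumes the shared Properties instances are treated as read-only once they have been deduplicated.

```java
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration only; not a Hive class.
final class DedupSketch {
    // Canonical instances keyed by content. Properties.equals/hashCode compare
    // entries, so two Properties with the same key/value pairs map to one key.
    // Assumption: canonical instances are never mutated after insertion.
    private static final ConcurrentHashMap<Properties, Properties> CANONICAL =
            new ConcurrentHashMap<>();

    // Returns a copy whose keys and values are interned, so equal strings
    // across many Properties objects share a single String instance.
    static Properties internStrings(Properties in) {
        Properties out = new Properties();
        for (String name : in.stringPropertyNames()) {
            out.setProperty(name.intern(), in.getProperty(name).intern());
        }
        return out;
    }

    // Returns one shared instance for all Properties with equal content.
    static Properties dedup(Properties p) {
        Properties candidate = internStrings(p);
        Properties previous = CANONICAL.putIfAbsent(candidate, candidate);
        return previous != null ? previous : candidate;
    }

    public static void main(String[] args) {
        Properties a = new Properties();
        a.setProperty("columns", "i_f_1");
        Properties b = new Properties();
        b.setProperty("columns", "i_f_1");
        // Both calls resolve to the same canonical object.
        System.out.println(dedup(a) == dedup(b));
    }
}
```

In this sketch the canonical map grows without bound and holds strong references, so a real change would also need an eviction or weak-reference strategy, and a way to keep callers from modifying a shared instance.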

So overall, I think there is a good chance to reduce HS2 memory consumption in this scenario by ~40%.
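For item 2, one way to avoid paying for empty or single-element collections is to share an immutable empty list, switch to a singleton list for the first element, and allocate a real ArrayList only when a second element arrives. The sketch below illustrates that idea under the assumption that elements are only ever appended; the class name LazyListHolder is hypothetical and this is not code from Hive.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder for illustration only; not a Hive class.
final class LazyListHolder {
    // Starts as the shared immutable empty list, so an unused holder
    // allocates no backing array at all.
    private List<String> values = Collections.emptyList();

    void add(String value) {
        if (values.isEmpty()) {
            // First element: a compact immutable single-element list.
            values = Collections.singletonList(value);
        } else if (values.size() == 1) {
            // Second element: only now allocate a growable list.
            List<String> grown = new ArrayList<>(4);
            grown.add(values.get(0));
            grown.add(value);
            values = grown;
        } else {
            // Already an ArrayList (size >= 2), so append directly.
            values.add(value);
        }
    }

    // Read-only view for callers.
    List<String> view() {
        return Collections.unmodifiableList(values);
    }
}
```

The trade-off is a slightly more complex add path in exchange for not allocating a default-capacity backing array for lists that stay empty or hold a single element.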

Attachments

hs2-crash-2000p-500m-50q.txt, 11/Feb/17 04:20, 65 kB, Misha Dmitriev
HIVE-15882.01.patch, 14/Feb/17 22:15, 33 kB, Misha Dmitriev
HIVE-15882.02.patch, 24/Feb/17 23:36, 33 kB, Misha Dmitriev
HIVE-15882.03.patch, 27/Feb/17 19:44, 33 kB, Misha Dmitriev
HIVE-15882.04.patch, 28/Feb/17 20:16, 34 kB, Misha Dmitriev

Issue Links

is related to

HIVE-16166 HS2 may still waste up to 15% of memory on duplicate strings (Resolved)

HIVE-16079 HS2: high memory pressure due to duplicate Properties objects (Closed)

People

Assignee: Misha Dmitriev

Reporter: Misha Dmitriev

Votes: 0

Watchers: 10

Dates

Created: 11/Feb/17 04:19

Updated: 11/Oct/17 13:31

Resolved: 02/Mar/17 05:29