[HIVE-19847] Create Separate getInputSummary Service - ASF JIRA

Details

Type: Improvement
Status: Patch Available
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.0.0, 4.0.0
Fix Version/s: None
Component/s: HiveServer2
Labels: None

Description

The Hive org.apache.hadoop.hive.ql.exec.Utilities.java file has taken on a life of its own. We should consider separating out the various components into their own classes. For this ticket, I propose separating out the getInputSummary functionality into its own class.

There are several issues with the current implementation:

- It is synchronized. Only one query can get the file input summary at a time. For a query that deals with a large data set with a large number of files, this can block other queries for a long period of time. This is especially painful when most queries use a small data set, but a large data set is submitted on occasion.
- For each query, time is spent setting up and tearing down a ThreadPool.
- It uses deprecated code.
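The first two issues can be illustrated with a simplified sketch. This is not the actual Hive code: the class name `PerCallPoolSummary`, the single static lock, the pool size of 4, and the `sizeLookup` function (a stand-in for a file system call) are all assumptions made for illustration.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.ToLongFunction;

/** Hypothetical illustration of the pattern described above, not Hive's code. */
final class PerCallPoolSummary {
    // One lock for the whole process: callers are served strictly one at a time.
    private static final Object LOCK = new Object();

    static long totalLength(List<String> paths, ToLongFunction<String> sizeLookup) {
        synchronized (LOCK) {
            // A fresh pool is built for every call (pool size is an assumption).
            ExecutorService pool = Executors.newFixedThreadPool(4);
            AtomicLong total = new AtomicLong();
            for (String path : paths) {
                // Failures inside sizeLookup are not propagated in this sketch.
                pool.execute(() -> total.addAndGet(sizeLookup.applyAsLong(path)));
            }
            // ...and torn down again before the lock is released.
            pool.shutdown();
            try {
                if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                    throw new IllegalStateException("Timed out waiting for summary tasks");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while summarizing input", e);
            }
            return total.get();
        }
    }
}
```

While one caller holds `LOCK`, every other caller waits, no matter how small its own list of paths is.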

I propose breaking it out into its own class and creating a single thread pool that all queries pull from. This way, the bottleneck will be the number of available threads, not a single query: if a big query is running and a small query is also submitted, the smaller query will be able to proceed.
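A minimal sketch of what such a class could look like follows. It is not taken from the attached patches: the name `SharedInputSummaryService`, the constructor parameters, and the `sizeLookup` function (standing in for a file system lookup) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

/** Hypothetical sketch: one pool shared by every query, with no global lock. */
final class SharedInputSummaryService {
    // Created once and reused by all callers.
    private final ExecutorService pool;
    // Stand-in for a lookup such as "length of the file at this path".
    private final ToLongFunction<String> sizeLookup;

    SharedInputSummaryService(int threads, ToLongFunction<String> sizeLookup) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.sizeLookup = sizeLookup;
    }

    /** Not synchronized: concurrent callers compete only for pool threads. */
    long totalLength(List<String> paths) {
        List<Future<Long>> pending = new ArrayList<>();
        for (String path : paths) {
            Callable<Long> task = () -> sizeLookup.applyAsLong(path);
            pending.add(pool.submit(task));
        }
        long total = 0;
        for (Future<Long> result : pending) {
            try {
                total += result.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while summarizing input", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("Input summary task failed", e.getCause());
            }
        }
        return total;
    }

    /** Called once, when the owning server shuts down. */
    void shutdown() {
        pool.shutdown();
    }
}
```

A caller would create one instance for the lifetime of the server and have every query call `totalLength` on it. Tasks from different queries then interleave in the pool's queue instead of waiting on a lock.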

With regard to setup/teardown: if a query uses 15 threads to perform this summary action and then finishes, it tears down those threads, and the next query may immediately create 15 new threads for processing. With a single pool, the threads are created once and reused, so no query pays for setup or teardown.
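The difference in thread churn can be made concrete by counting thread creations. The sketch below is illustrative only: `ThreadChurnDemo` and its methods are hypothetical names, and the "queries" are no-op tasks rather than real summary work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo: counts threads created by per-query pools vs. one shared pool. */
final class ThreadChurnDemo {
    /** Thread factory that records how many threads it was asked to create. */
    static final class CountingFactory implements ThreadFactory {
        final AtomicInteger created = new AtomicInteger();

        @Override
        public Thread newThread(Runnable r) {
            created.incrementAndGet();
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        }
    }

    /** Simulates one query: submits no-op tasks and waits for all of them. */
    static void runQuery(ExecutorService pool, int tasks) {
        List<Future<?>> pending = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            pending.add(pool.submit(() -> { }));
        }
        for (Future<?> f : pending) {
            try {
                f.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
    }

    /** Each query builds and tears down its own pool. */
    static int threadsCreatedWithPerQueryPools(int queries, int threads) {
        CountingFactory factory = new CountingFactory();
        for (int q = 0; q < queries; q++) {
            ExecutorService pool = Executors.newFixedThreadPool(threads, factory);
            runQuery(pool, threads);
            pool.shutdown();
        }
        return factory.created.get();
    }

    /** All queries reuse the same pool. */
    static int threadsCreatedWithSharedPool(int queries, int threads) {
        CountingFactory factory = new CountingFactory();
        ExecutorService pool = Executors.newFixedThreadPool(threads, factory);
        for (int q = 0; q < queries; q++) {
            runQuery(pool, threads);
        }
        pool.shutdown();
        return factory.created.get();
    }
}
```

With 3 queries of 15 tasks each, the per-query variant asks the factory for 45 threads, while the shared variant asks for 15.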

Attachments

HIVE-19847.1.patch, 10/Jun/18 18:26, 54 kB, David Mollitor
HIVE-19847.2.patch, 11/Jun/18 17:28, 56 kB, David Mollitor
HIVE-19847.3.patch, 11/Sep/18 02:49, 50 kB, David Mollitor
HIVE-19847.4.patch, 11/Sep/18 13:50, 50 kB, David Mollitor
HIVE-19847.5.patch, 12/Sep/18 11:35, 50 kB, David Mollitor
HIVE-19847.6.patch, 13/Sep/18 17:12, 50 kB, David Mollitor

Issue Links

relates to: HIVE-21071 Improve getInputSummary (Closed)

supersedes: HIVE-20395 Parallelize files move in the ql.metadata.Hive#replaceFiles (Resolved)

People

Assignee: David Mollitor

Reporter: David Mollitor

Votes: 1

Watchers: 5

Dates

Created: 10/Jun/18 18:26

Updated: 27/Dec/18 05:36