[HIVE-16014] HiveMetastoreChecker should use hive.metastore.fshandler.threads instead of hive.mv.files.thread for pool size - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.3.0
Component/s: None
Labels: None

Description

HiveMetastoreChecker uses the hive.mv.files.thread configuration value to determine its thread pool size, as shown below:

private void checkPartitionDirs(Path basePath, Set<Path> allDirs, int maxDepth) throws IOException, HiveException {
    ConcurrentLinkedQueue<Path> basePaths = new ConcurrentLinkedQueue<>();
    basePaths.add(basePath);
    Set<Path> dirSet = Collections.newSetFromMap(new ConcurrentHashMap<Path, Boolean>());
    // Here we just reuse the THREAD_COUNT configuration for
    // HIVE_MOVE_FILES_THREAD_COUNT
    int poolSize = conf.getInt(ConfVars.HIVE_MOVE_FILES_THREAD_COUNT.varname, 15);

    // Check if too low config is provided for move files. 2x CPU is reasonable max count.
    poolSize = poolSize == 0 ? poolSize : Math.max(poolSize,
        Runtime.getRuntime().availableProcessors() * 2);

msck is commonly used to add partitions that exist on the filesystem but are missing from the table in the metastore. In that case, different pool sizes for HMSHandler and HiveMetastoreChecker can hurt performance. For example, if hive.metastore.fshandler.threads is set to a low value such as 15 and hive.mv.files.thread is much higher, such as 100, or vice versa, the smaller pool becomes the bottleneck. It would be better to size the HiveMetastoreChecker pool from hive.metastore.fshandler.threads, since the number of missing partitions found and the number of partitions to be added will most likely be the same. The query should then perform best when both pools are the same size.

Because the two configs can be tuned independently, they are likely to differ. But since the amount of work done by HiveMetastoreChecker correlates strongly with the work done by the HiveMetastore.add_partitions call, it would be a good idea to use hive.metastore.fshandler.threads for the pool size instead of hive.mv.files.thread.
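
The proposal can be illustrated with a minimal sketch. It is not the attached patch: a plain Map stands in for HiveConf, and the class name PoolSizeSketch, the method resolvePoolSize, and the fallback default of 15 (copied from the snippet above) are assumptions made for illustration. The adjustment step is kept exactly as in the existing code, and only the configuration key that is read changes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the pool sizing in HiveMetastoreChecker.
final class PoolSizeSketch {
    // Key named in this issue as the preferred source for the pool size.
    static final String FS_HANDLER_THREADS = "hive.metastore.fshandler.threads";
    // Assumption: same fallback value as the getInt(..., 15) call quoted above.
    static final int ASSUMED_DEFAULT = 15;

    // Reads the pool size from the fshandler key and ignores hive.mv.files.thread.
    // availableProcessors is a parameter so the result is easy to check.
    static int resolvePoolSize(Map<String, String> conf, int availableProcessors) {
        String raw = conf.get(FS_HANDLER_THREADS);
        int poolSize = (raw == null) ? ASSUMED_DEFAULT : Integer.parseInt(raw.trim());
        // Same adjustment as the quoted code: 0 is passed through unchanged,
        // any other value is raised to at least 2x the processor count.
        return poolSize == 0 ? poolSize : Math.max(poolSize, availableProcessors * 2);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hive.mv.files.thread", "100");   // no longer consulted
        conf.put(FS_HANDLER_THREADS, "15");
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("checker pool size: " + resolvePoolSize(conf, cpus));
    }
}
```

With this change, tuning hive.mv.files.thread no longer affects the checker, so the checker pool and the HMSHandler pool are driven by the same setting.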

Attachments

HIVE-16014.01.patch, 23/Feb/17 18:27, 1 kB, Vihang Karajgaonkar
HIVE-16014.02.patch, 23/Feb/17 20:22, 2 kB, Vihang Karajgaonkar
HIVE-16014.03.patch, 28/Feb/17 18:35, 2 kB, Vihang Karajgaonkar

Issue Links

is related to: HIVE-16090 Addendum to HIVE-16014 (Resolved)

People

Assignee: Vihang Karajgaonkar

Reporter: Vihang Karajgaonkar

Votes: 0

Watchers: 4

Dates

Created: 22/Feb/17 22:53

Updated: 21/Jul/17 18:46

Resolved: 01/Mar/17 22:11