Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
Description
HIVE-20661 added an improvement in loadDynamicPartitions() api in Hive.java to not add partitions one by one in HMS. As part of that improvement, following code was introduced:
// fetch all the partitions matching the part spec using the partition iterable // this way the maximum batch size configuration parameter is considered PartitionIterable partitionIterable = new PartitionIterable(Hive.get(), tbl, partSpec, conf.getInt(MetastoreConf.ConfVars.BATCH_RETRIEVE_MAX.getVarname(), 300)); Iterator<Partition> iterator = partitionIterable.iterator(); // Match valid partition path to partitions while (iterator.hasNext()) { Partition partition = iterator.next(); partitionDetailsMap.entrySet().stream() .filter(entry -> entry.getValue().fullSpec.equals(partition.getSpec())) .findAny().ifPresent(entry -> { entry.getValue().partition = partition; entry.getValue().hasOldPartition = true; }); }
The above code fetches all the existing partitions for a table from HMS and compare that dynamic partitions list to decide old and new partitions to be added to HMS (in batches). The call to fetch all partitions has introduced a performance regression for tables with large number of partitions (of the order of 100K).
This is fixed for external tables in https://issues.apache.org/jira/browse/HIVE-25178. However for ACID tables there is an open Jira(HIVE-25187). Until we have an appropriate fix in HIVE-25187, we can apply the following:
Skip fetching all partitions. Instead, in the threadPool which loads each partition individually, call get_partition() to check if the partition already exists in HMS or not.
This will introduce additional getPartition() call for every partition to be loaded dynamically but removes fetching all existing partitions for a table.
I believe this is fine since for tables with small number of existing partitions in HMS - getPartitions() won't add too much overhead but for tables with large number of existing partitions, it will certainly avoid getting all partitions from HMS
Attachments
Issue Links
- is related to
-
HIVE-25187 Reduce number of getPartition calls during loadDynamicPartitions for Managed Tables
- Open
- links to