Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Impala 2.5.0
None
Description
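While running COMPUTE STATS on a table with more than 500K partitions, I found that the catalog spends a lot of time in toLowerCase(). The cause appears to be the lookup below, which scans every partition and lowercases its key values on each call, so a single lookup is linear in the number of partitions.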
The code below should use a HashSet.
// Search through all the partitions and check if their partition key values
// match the values being searched for.
for (HdfsPartition partition: partitionMap_.values()) {
  if (partition.isDefaultPartition()) continue;
  List<LiteralExpr> partitionValues = partition.getPartitionValues();
  Preconditions.checkState(partitionValues.size() == targetValues.size());
  boolean matchFound = true;
  for (int i = 0; i < targetValues.size(); ++i) {
    String value;
    if (partitionValues.get(i) instanceof NullLiteral) {
      value = getNullPartitionKeyValue();
    } else {
      value = partitionValues.get(i).getStringValue();
      Preconditions.checkNotNull(value);
      // See IMPALA-252: we deliberately map empty strings on to
      // NULL when they're in partition columns. This is for
      // backwards compatibility with Hive, and is clearly broken.
      if (value.isEmpty()) value = getNullPartitionKeyValue();
    }
    if (!targetValues.get(i).equals(value.toLowerCase())) {
      matchFound = false;
      break;
    }
  }
  if (matchFound) {
    return partition;
  }
}
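One possible shape for the hash-based lookup is sketched below. It is not Impala's actual code: the PartitionIndex class, its method names, and the use of plain strings in place of LiteralExpr are all hypothetical. It uses a HashMap keyed by the normalized partition values rather than a HashSet, because the lookup has to return the matching partition. The key values are normalized once at insert time (null or empty mapped to the null partition key value, then lowercased), so each lookup is a single hash probe with no per-call toLowerCase(). Locale.ROOT is used here for stable lowercasing, whereas the original snippet relies on the default locale.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a hash-based partition lookup; not Impala's code.
class PartitionIndex<P> {
  // Value substituted for NULL or empty partition keys (assumed to be
  // whatever getNullPartitionKeyValue() returns for the table).
  private final String nullKeyValue;

  // Maps normalized (lowercased) partition key values to the partition.
  private final Map<List<String>, P> byKey = new HashMap<>();

  PartitionIndex(String nullKeyValue) {
    this.nullKeyValue = nullKeyValue;
  }

  // Normalizes the key values once, mirroring the loop in the report:
  // null or empty becomes the null key value, then everything is lowercased.
  private List<String> normalize(List<String> rawValues) {
    List<String> key = new ArrayList<>(rawValues.size());
    for (String v : rawValues) {
      String value = (v == null || v.isEmpty()) ? nullKeyValue : v;
      key.add(value.toLowerCase(Locale.ROOT));
    }
    return key;
  }

  // Registers a partition under its normalized key values.
  void add(List<String> partitionValues, P partition) {
    byKey.put(normalize(partitionValues), partition);
  }

  // Looks up a partition. As in the original loop, targetValues are
  // assumed to be lowercase already. Returns null if nothing matches.
  P get(List<String> targetValues) {
    return byKey.get(targetValues);
  }
}
```

With such an index, adding a partition with values ("2016", "US") and then calling get with ("2016", "us") returns that partition in expected constant time. The index would have to be kept in sync whenever partitions are added to or dropped from the table.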