Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Description
When there are a lot of files and a lot of pools, CombineHiveFileInputFormat is pretty slow.
One of the culprits is the "new URI" call in the following function, which parses every key in pathToPartitionInfo again on each lookup. We should try to get rid of it.
  protected static PartitionDesc getPartitionDescFromPath(
      Map<String, PartitionDesc> pathToPartitionInfo, Path dir) throws IOException {
    // The format of the keys in pathToPartitionInfo sometimes contains a port
    // and sometimes doesn't, so we just compare paths.
    for (Map.Entry<String, PartitionDesc> entry : pathToPartitionInfo
        .entrySet()) {
      try {
        if (new URI(entry.getKey()).getPath().equals(dir.toUri().getPath())) {
          return entry.getValue();
        }
      } catch (URISyntaxException e2) {
      }
    }
    throw new IOException("cannot find dir = " + dir.toString()
        + " in partToPartitionInfo!");
  }
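
One possible way to avoid the repeated parsing, sketched here only as an illustration and not as the agreed fix: parse each key once, store the path-only form in a second map, and turn later lookups into a hash lookup. The class name PathOnlyIndex and its methods are hypothetical and not part of Hive. The value type is left generic so the sketch compiles without Hadoop or Hive on the classpath; in the function above it would be PartitionDesc, and the lookup argument would be dir.toUri().getPath().

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of Hive): indexes values by the path component of each key. */
final class PathOnlyIndex<V> {
    private final Map<String, V> byPath = new HashMap<>();

    /** Parses every key exactly once, up front, instead of on every lookup. */
    PathOnlyIndex(Map<String, V> pathToInfo) {
        for (Map.Entry<String, V> entry : pathToInfo.entrySet()) {
            try {
                String path = new URI(entry.getKey()).getPath();
                if (path != null) {
                    // Keep the first entry seen for a given path.
                    byPath.putIfAbsent(path, entry.getValue());
                }
            } catch (URISyntaxException e) {
                // Skip keys that are not valid URIs, as the original loop does.
            }
        }
    }

    /** Returns the value registered for this path-only string, or null if there is none. */
    V lookup(String dirPath) {
        return byPath.get(dirPath);
    }
}
```

With an index like this built once per pathToPartitionInfo map, keys with and without a port resolve to the same entry, and the caller would still throw the IOException when lookup returns null. Whether the index can be safely cached depends on how often the map changes, which this sketch does not address.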