Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
3.2.0
None
Description
One of the dominant workloads for external metadata services is listing partition directories. This workload can become bottlenecked on round-trip time (RTT) when partition directories contain only a few files. That situation is fairly common, because query engines rely on fine-grained partitioning for partition pruning.
A batched listing API that accepts multiple paths in a single call amortizes the RTT cost across many directories. Initial benchmarks show a 10-20x improvement in metadata loading performance.
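To make the amortization concrete, here is a minimal Java sketch of a round-trip cost model. It is purely illustrative: the class, the method names, the directory layout, and the batch size of 100 are hypothetical and are not part of the Hadoop API. The model counts only client-to-server round trips and ignores per-directory server-side work and larger response sizes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical cost model for directory listing (NOT the Hadoop API).
 * It counts only client-to-server round trips and ignores server-side work.
 */
public class BatchedListingModel {

    /** One RPC per directory: the round-trip count equals the directory count. */
    static int roundTripsUnbatched(int numDirs) {
        return numDirs;
    }

    /** Up to batchSize directories per RPC: ceil(numDirs / batchSize) round trips. */
    static int roundTripsBatched(int numDirs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (numDirs + batchSize - 1) / batchSize; // integer ceiling division
    }

    /** Groups paths into consecutive batches of at most batchSize entries each. */
    static List<List<String>> toBatches(List<String> paths, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < paths.size(); i += batchSize) {
            int end = Math.min(i + batchSize, paths.size());
            batches.add(new ArrayList<>(paths.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Hypothetical table with 1,000 small partition directories.
        List<String> partitions = new ArrayList<>();
        for (int day = 0; day < 1000; day++) {
            partitions.add("/warehouse/events/day=" + day);
        }
        int batchSize = 100; // hypothetical number of paths per batched RPC

        int unbatched = roundTripsUnbatched(partitions.size());
        int batched = toBatches(partitions, batchSize).size();
        System.out.println("unbatched round trips: " + unbatched); // 1000
        System.out.println("batched round trips:   " + batched);   // 10
    }
}
```

Under this pure-RTT assumption, listing 1,000 directories drops from 1,000 round trips to 10. That ratio is a best case rather than a prediction, since the server still has to do work for every directory in a batch.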
Issue Links
- is related to
  - HADOOP-16898 Batch listing of multiple directories to be an unstable interface (Resolved)
- relates to
  - HDFS-14233 Implement DistributedFileSystem#listStatus(Path[]) by adding a batching listStatus RPC call to NameNode (Open)
- links to
  1. RBF: Supporting batched listing | Open | Unassigned
  2. Batch listing: gracefully fallback to use non-batched listing when NameNode doesn't support the feature | Patch Available | Qi Zhu