Details
- New Feature
- Status: Resolved
- Major
- Resolution: Fixed
- 3.2.0
- None
- None
Description
One of the dominant workloads for external metadata services is listing partition directories. This can end up bottlenecked on round-trip time (RTT) when partition directories contain a small number of files, because each listing call then costs about one round trip but returns very little data. This is fairly common, since query engines rely on fine-grained partitioning for partition pruning.
A batched listing API that takes multiple paths amortizes the RTT cost. Initial benchmarks show a 10-20x improvement in metadata loading performance.
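The amortization argument can be illustrated with a small, self-contained sketch. It does not call any Hadoop API: `FakeMetadataService`, its `list` and `listBatch` methods, and the `/warehouse/t/part=N` paths are hypothetical names invented for this example, and the assumption that a whole batch fits in a single round trip is a simplification. The sketch only counts simulated round trips; it does not measure latency and says nothing about the benchmark figures above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchedListingSketch {

    /** Hypothetical in-memory stand-in for a remote metadata service. */
    public static final class FakeMetadataService {
        private final Map<String, List<String>> dirs = new LinkedHashMap<>();
        private int roundTrips = 0;

        public void addDirectory(String dir, List<String> files) {
            dirs.put(dir, new ArrayList<>(files));
        }

        /** Unbatched call: one simulated round trip per directory. */
        public List<String> list(String dir) {
            roundTrips++;
            List<String> files = dirs.getOrDefault(dir, Collections.emptyList());
            return new ArrayList<>(files);
        }

        /** Batched call: one simulated round trip for all paths (simplifying assumption). */
        public Map<String, List<String>> listBatch(List<String> paths) {
            roundTrips++;
            Map<String, List<String>> result = new LinkedHashMap<>();
            for (String p : paths) {
                List<String> files = dirs.getOrDefault(p, Collections.emptyList());
                result.put(p, new ArrayList<>(files));
            }
            return result;
        }

        public List<String> directories() {
            return new ArrayList<>(dirs.keySet());
        }

        public int roundTrips() {
            return roundTrips;
        }
    }

    /** Builds a service with the given number of partitions, one small file each. */
    public static FakeMetadataService sampleService(int partitions) {
        FakeMetadataService svc = new FakeMetadataService();
        for (int i = 0; i < partitions; i++) {
            String dir = "/warehouse/t/part=" + i; // hypothetical partition path
            svc.addDirectory(dir, Collections.singletonList(dir + "/data-0"));
        }
        return svc;
    }

    /** Lists every path individually and returns the round trips used. */
    public static int roundTripsUnbatched(FakeMetadataService svc, List<String> paths) {
        int before = svc.roundTrips();
        for (String p : paths) {
            svc.list(p);
        }
        return svc.roundTrips() - before;
    }

    /** Lists all paths in one batched call and returns the round trips used. */
    public static int roundTripsBatched(FakeMetadataService svc, List<String> paths) {
        int before = svc.roundTrips();
        svc.listBatch(paths);
        return svc.roundTrips() - before;
    }

    public static void main(String[] args) {
        FakeMetadataService svc = sampleService(100);
        List<String> paths = svc.directories();
        System.out.println("unbatched round trips: " + roundTripsUnbatched(svc, paths));
        System.out.println("batched round trips:   " + roundTripsBatched(svc, paths));
    }
}
```

With many small partition directories, the unbatched loop pays one round trip per directory, while the batched call pays a fixed cost regardless of how many paths it carries. A real service would likely cap the batch size and return results in several responses, so the saving depends on that limit.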
Attachments
Issue Links
- is related to
  - HADOOP-16898 Batch listing of multiple directories to be an unstable interface (Resolved)
- relates to
  - HDFS-14233 Implement DistributedFileSystem#listStatus(Path[]) by adding a batching listStatus RPC call to NameNode (Open)
- links to
  1. RBF: Supporting batched listing | Open | Unassigned
  2. Batch listing: gracefully fallback to use non-batched listing when NameNode doesn't support the feature | Patch Available | Qi Zhu