- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 3.2.0
- Fix Version/s: 3.3.0
- Component/s: None
- Labels: None
One of the dominant workloads for external metadata services is listing partition directories. This can end up bottlenecked on round-trip time (RTT) when partition directories contain a small number of files, because each listing call then returns very little data per round trip. This is fairly common, since query engines rely on fine-grained partitioning for partition pruning.
A batched listing API that takes multiple paths amortizes the RTT cost. Initial benchmarks show a 10-20x improvement in metadata loading performance.
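To make the amortization argument concrete, here is a minimal sketch of the round-trip arithmetic. It is a toy in-memory model, not the HDFS client API: `FakeNameService`, `list_status`, `batched_list_status`, the batch size of 100, and the example paths are all hypothetical names chosen for illustration. Each method call stands in for one RPC, and therefore one RTT.

```python
from typing import Dict, List


class FakeNameService:
    """Hypothetical stand-in for a metadata service; counts round trips."""

    def __init__(self, tree: Dict[str, List[str]]) -> None:
        self.tree = tree
        self.rpc_count = 0

    def list_status(self, path: str) -> List[str]:
        # Non-batched listing: one RPC (one round trip) per directory.
        self.rpc_count += 1
        return list(self.tree[path])

    def batched_list_status(self, paths: List[str]) -> Dict[str, List[str]]:
        # Batched listing: one RPC covers every path in the batch.
        self.rpc_count += 1
        return {p: list(self.tree[p]) for p in paths}


def list_one_by_one(svc: FakeNameService, paths: List[str]) -> Dict[str, List[str]]:
    return {p: svc.list_status(p) for p in paths}


def list_batched(svc: FakeNameService, paths: List[str],
                 batch_size: int = 100) -> Dict[str, List[str]]:
    # Split the paths into fixed-size batches (the size is an assumption).
    result: Dict[str, List[str]] = {}
    for start in range(0, len(paths), batch_size):
        result.update(svc.batched_list_status(paths[start:start + batch_size]))
    return result


# 250 small partition directories with two files each.
tree = {
    f"/warehouse/t/day={d:03d}": [f"part-{i}.parquet" for i in range(2)]
    for d in range(250)
}
paths = sorted(tree)

svc_a = FakeNameService(tree)
unbatched = list_one_by_one(svc_a, paths)

svc_b = FakeNameService(tree)
batched = list_batched(svc_b, paths, batch_size=100)

print(svc_a.rpc_count, svc_b.rpc_count)  # 250 3
```

Both approaches return the same listing, but the round-trip count drops from one per directory to one per batch. The model ignores server-side work and response-size limits, so it does not reproduce the benchmark figure quoted above; it only shows where the saving comes from.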
- is related to
  - HADOOP-16898 Batch listing of multiple directories to be an unstable interface (Resolved)
- relates to
  - HDFS-14233 Implement DistributedFileSystem#listStatus(Path[]) by adding a batching listStatus RPC call to NameNode (Open)
- links to
  1. RBF: Supporting batched listing | Open | Unassigned
  2. Batch listing: gracefully fallback to use non-batched listing when NameNode doesn't support the feature | Open | Unassigned