Details
Sub-task
Status: Closed
Major
Resolution: Fixed
1.4.0, 2.0.0
None
Description
First, I assume there are no reference files or links in the region family's directory.
Without the patch, computeHDFSBlocksDistribution for a region family makes 1 + 2*N RPC calls, where N is the number of HFiles. The first call is DistributedFileSystem#listStatus to list the HFiles; then, for every HFile, there are two calls: DistributedFileSystem#getFileStatus(path) followed by DistributedFileSystem#getFileBlockLocations(status, start, length).
With the patch, computeHDFSBlocksDistribution for a region family makes 2 RPC calls: DistributedFileSystem#getFileStatus(path) and DistributedFileSystem#listLocatedStatus(final Path p, final PathFilter filter).
So if there is at least one HFile (N = 1 gives 3 calls versus 2), the patch results in fewer RPC calls.
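
To make the comparison concrete, here is a small sketch of the arithmetic only. It does not touch HDFS or HBase; the class and method names (RpcCallCount, callsWithoutPatch, callsWithPatch) are made up for illustration, and it relies on the same assumption as above: no reference files or links in the family directory.

```java
// Illustrative arithmetic only: counts the RPC calls described in this issue.
// Class and method names are hypothetical and are not part of HBase or Hadoop.
public class RpcCallCount {

    // Without the patch: 1 listStatus call, plus getFileStatus and
    // getFileBlockLocations for each of the N HFiles.
    static long callsWithoutPatch(int hfileCount) {
        return 1L + 2L * hfileCount;
    }

    // With the patch: 1 getFileStatus call plus 1 listLocatedStatus call,
    // independent of the number of HFiles.
    static long callsWithPatch(int hfileCount) {
        return 2L;
    }

    public static void main(String[] args) {
        int[] sampleCounts = {0, 1, 10, 100};
        for (int n : sampleCounts) {
            System.out.println("N=" + n
                + " without patch=" + callsWithoutPatch(n)
                + " with patch=" + callsWithPatch(n));
        }
    }
}
```

Note that for an empty family directory (N = 0) the unpatched path is one call cheaper (1 versus 2); the saving starts at N = 1 and grows linearly with the number of HFiles.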