Details
Type: Improvement
Status: Patch Available
Priority: Minor
Resolution: Unresolved
1.4.0
None
None
Description
Although this step is called computeHDFSBlocksDistribution, it runs regardless of which file system holds the snapshot. For example, we observed significant slowness with a snapshot stored in S3 (~26k regions, 5 column families, 2 files per column family): getSplits takes ~40 min because of the S3 calls that list the files in order to determine the best locations.
Parallelizing this operation can reduce the overall setup time. The thread pool size should be configurable; a good choice could be the "hbase.snapshot.thread.pool.max" property, which is also used in RestoreSnapshotHelper.
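
To illustrate the proposal, here is a minimal sketch of the fan-out pattern, not the actual patch. The class `ParallelLocalitySketch`, the method `computeAll`, and the `perRegion` function are hypothetical stand-ins: `perRegion` represents whatever per-region work lists the store files and computes the block distribution, and `maxThreads` is assumed to be read from the "hbase.snapshot.thread.pool.max" setting by the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelLocalitySketch {

    /**
     * Hypothetical helper: runs perRegion for every region on a fixed-size
     * thread pool and returns the results in the same order as the input.
     * maxThreads is assumed to come from a configurable setting.
     */
    public static <R> List<R> computeAll(List<String> regions,
                                         Function<String, R> perRegion,
                                         int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, maxThreads));
        try {
            // Submit one task per region so the slow listing calls overlap.
            List<Future<R>> futures = new ArrayList<>(regions.size());
            for (String region : regions) {
                Callable<R> task = () -> perRegion.apply(region);
                futures.add(pool.submit(task));
            }
            // Collect in submission order so results line up with the input.
            List<R> results = new ArrayList<>(futures.size());
            for (Future<R> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while computing locality", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Per-region computation failed", e.getCause());
        } finally {
            // Stop the pool; tasks still queued after a failure are discarded.
            pool.shutdownNow();
        }
    }
}
```

Because the per-region work is dominated by remote listing calls rather than CPU, a pool larger than the number of cores may still help; the right size depends on the file system and is the reason the setting should stay configurable.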