When importing large scale data to HBase, we always generate the hfiles with other Hadoop cluster, use the Distcp tool to copy the data to the HBase cluster, and bulkload data to HBase table. However, the data locality is rather low which may result in high query latency. After taking a compaction it will recover. Therefore, we can increase the data locality by specifying the favoredNodes in Distcp.
Could I submit a pull request to optimize it?