  1. Hadoop Common
  2. HADOOP-18629

Hadoop DistCp supports specifying favoredNodes for data copying

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.4
    • Fix Version/s: None
    • Component/s: tools/distcp

    Description

      When importing large-scale data into HBase, we generate the HFiles on another Hadoop cluster, copy them to the HBase cluster with the DistCp tool, and then bulk load them into the HBase table. However, the data locality of the copied files is rather low, which may result in high query latency. Locality recovers only after a compaction. We can therefore improve data locality by allowing favoredNodes to be specified in DistCp.

      May I submit a pull request to add this?
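      As a rough illustration of the idea, the sketch below parses a comma-separated list of DataNode addresses into the InetSocketAddress[] form that the favored-nodes overload of HDFS's DistributedFileSystem.create accepts. The class name FavoredNodesParser, the "host:port,host:port" format, and the example hosts and ports are assumptions made for illustration. They are not an existing DistCp option or API.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of DistCp): turns a "host:port,host:port"
 * string into the address array used as HDFS favored nodes.
 */
final class FavoredNodesParser {

    static InetSocketAddress[] parse(String spec) {
        if (spec == null || spec.trim().isEmpty()) {
            return new InetSocketAddress[0]; // no favored nodes requested
        }
        List<InetSocketAddress> nodes = new ArrayList<>();
        for (String entry : spec.split(",")) {
            String hostPort = entry.trim();
            if (hostPort.isEmpty()) {
                continue; // tolerate stray commas
            }
            int colon = hostPort.lastIndexOf(':');
            if (colon <= 0 || colon == hostPort.length() - 1) {
                throw new IllegalArgumentException(
                        "Expected host:port but got: " + hostPort);
            }
            String host = hostPort.substring(0, colon);
            int port = Integer.parseInt(hostPort.substring(colon + 1));
            // Unresolved addresses keep this sketch runnable without DNS.
            nodes.add(InetSocketAddress.createUnresolved(host, port));
        }
        return nodes.toArray(new InetSocketAddress[0]);
    }

    public static void main(String[] args) {
        // Example DataNode addresses; both hosts and ports are placeholders.
        InetSocketAddress[] favored =
                parse("dn1.example.com:9866,dn2.example.com:9866");
        for (InetSocketAddress node : favored) {
            System.out.println(node.getHostString() + ":" + node.getPort());
        }
        // In a copy task, the array could then be passed as the last argument
        // of DistributedFileSystem.create(path, permission, overwrite,
        // bufferSize, replication, blockSize, progress, favored).
    }
}
```

      Favored nodes are only a placement hint to the NameNode, so a copy that uses them should still succeed when the requested DataNodes are unavailable.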

            People

              Assignee: Unassigned
              Reporter: zhuyaogai
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated: