Hive / HIVE-14841 Replication - Phase 2 / HIVE-16901

Distcp optimization - One distcp per ReplCopyTask

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 3.0.0
    • Component/s: Hive, repl
    • Hadoop Flags: Reviewed

    Description

      Currently, when a ReplCopyTask is created to copy a list of files, distcp is invoked separately for each file. Instead, the whole list of source files should be passed to a single distcp invocation. Because distcp copies its inputs in parallel across map tasks, this should give a significant performance gain.
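      A minimal sketch of the difference, assuming the `hadoop distcp` command line, which accepts several source paths followed by a single destination directory. The class `DistcpBatching`, its methods and the HDFS paths are hypothetical and only build argument lists; this is not the code of ReplCopyTask or of the Hadoop DistCp Java API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration: one distcp command per file versus a single
 * batched command for the whole file list. It only builds argument lists for
 * the "hadoop distcp" CLI and does not launch any process.
 */
public final class DistcpBatching {

    private DistcpBatching() {}

    /** Per-file behaviour: one "hadoop distcp <src> <dst>" command per source file. */
    public static List<List<String>> perFileCommands(List<String> sources, String target) {
        List<List<String>> commands = new ArrayList<>();
        for (String src : sources) {
            commands.add(new ArrayList<>(List.of("hadoop", "distcp", src, target)));
        }
        return commands;
    }

    /** Batched behaviour: a single command listing every source, then the target. */
    public static List<String> batchedCommand(List<String> sources, String target) {
        if (sources.isEmpty()) {
            throw new IllegalArgumentException("at least one source path is required");
        }
        List<String> cmd = new ArrayList<>(List.of("hadoop", "distcp"));
        cmd.addAll(sources);
        cmd.add(target); // distcp treats the last path argument as the destination
        return cmd;
    }

    public static void main(String[] args) {
        // Hypothetical paths, for illustration only.
        List<String> files = List.of(
                "hdfs://src/warehouse/t1/f1",
                "hdfs://src/warehouse/t1/f2",
                "hdfs://src/warehouse/t1/f3");
        String target = "hdfs://dst/warehouse/t1";

        System.out.println(perFileCommands(files, target).size() + " per-file invocations");
        System.out.println("1 batched invocation: " + String.join(" ", batchedCommand(files, target)));
    }
}
```

      With N files, the per-file approach pays the job-submission overhead N times and copies the files sequentially. The batched command submits one job whose map tasks share the whole file list. For very long lists, the distcp `-f` option (a file that lists the source paths) avoids an oversized command line.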

      Attachments

        1. HIVE-16901.01.patch (19 kB, Sankar Hariappan)
        2. HIVE-16901.02.patch (20 kB, Sankar Hariappan)
        3. HIVE-16901.03.patch (20 kB, Sankar Hariappan)
        4. HIVE-16901.04.patch (20 kB, Sankar Hariappan)


            People

              Assignee: Sankar Hariappan (sankarh)
              Reporter: Sankar Hariappan (sankarh)
              Votes: 0
              Watchers: 6
