  HIVE-16901 (project: Hive; parent task: HIVE-14841 Replication - Phase 2)

Distcp optimization - One distcp per ReplCopyTask


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • 2.1.0
    • 3.0.0
    • Hive, repl
    • Reviewed

    Description

      Currently, when a ReplCopyTask is created to copy a list of files, distcp is invoked separately for each file. Instead, the whole list of source files should be passed to a single distcp invocation. The distcp tool copies the files in parallel within one job, which yields a significant performance gain.
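      The change can be illustrated by comparing how many distcp jobs each approach launches. The sketch below is not Hive's ReplCopyTask code: the helper names (distcp_command, per_file_commands, batched_commands) and the example paths are hypothetical. It only relies on the documented `hadoop distcp <src> ... <src> <dst>` form, which accepts several source paths followed by one target directory. The commands are built, not executed.

```python
from typing import List


def distcp_command(sources: List[str], target_dir: str) -> List[str]:
    """Build one `hadoop distcp` command line (hypothetical helper).

    distcp accepts several source paths followed by a single target, and
    spreads the copy work over the map tasks of one job.
    """
    if not sources:
        raise ValueError("at least one source path is required")
    return ["hadoop", "distcp", *sources, target_dir]


def per_file_commands(sources: List[str], target_dir: str) -> List[List[str]]:
    """Behaviour before the change: one distcp job per file."""
    return [distcp_command([src], target_dir) for src in sources]


def batched_commands(sources: List[str], target_dir: str) -> List[List[str]]:
    """Behaviour after the change: one distcp job for the whole file list."""
    return [distcp_command(sources, target_dir)] if sources else []


if __name__ == "__main__":
    # Hypothetical file list produced for one ReplCopyTask.
    files = [
        "hdfs://src-nn/warehouse/t/000000_0",
        "hdfs://src-nn/warehouse/t/000001_0",
        "hdfs://src-nn/warehouse/t/000002_0",
    ]
    target = "hdfs://dst-nn/warehouse/t"
    print("per-file jobs:", len(per_file_commands(files, target)))
    print("batched jobs:", len(batched_commands(files, target)))
    # Prints "per-file jobs: 3" and then "batched jobs: 1".
```

      With N files, the per-file approach pays the job submission and setup overhead N times and copies the files one after another; the batched approach pays it once and lets distcp copy the files concurrently.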

      Attachments

        1. HIVE-16901.04.patch
          20 kB
          Sankar Hariappan
        2. HIVE-16901.03.patch
          20 kB
          Sankar Hariappan
        3. HIVE-16901.02.patch
          20 kB
          Sankar Hariappan
        4. HIVE-16901.01.patch
          19 kB
          Sankar Hariappan

          People

            Assignee: Sankar Hariappan (sankarh)
            Reporter: Sankar Hariappan (sankarh)
            Votes: 0
            Watchers: 6
