  1. Hive
  2. HIVE-14269 Performance optimizations for data on S3
  3. HIVE-14776

Skip 'distcp' call when copying data from HDFS to S3


    Details

    • Type: Sub-task
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Hive
    • Labels:
      None

      Description

      Hive uses 'distcp' to copy files in parallel between HDFS encryption zones when a file to copy is larger than the hive.exec.copyfile.maxsize threshold. The same 'distcp' call is also made when copying to S3, where it makes the copies slower.

      We should not invoke distcp when copying to blobstore systems.
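
      The proposed behavior can be illustrated with a minimal sketch. This is not the code from the attached patches: the class name, the method names, and the hard-coded scheme list are all hypothetical, and the threshold value in main is illustrative only. It only shows the decision described above: use distcp for large files, except when the destination is a blobstore.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

public class CopyStrategySketch {
    // Hypothetical list of blobstore URI schemes; an assumption for
    // illustration, not taken from the patch.
    private static final Set<String> BLOBSTORE_SCHEMES = Set.of("s3", "s3a", "s3n");

    // Returns true when the URI's scheme is one of the assumed blobstore schemes.
    static boolean isBlobStore(URI path) {
        String scheme = path.getScheme();
        return scheme != null
                && BLOBSTORE_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    // Decides whether a copy should go through distcp.
    // copyFileMaxSizeBytes stands in for the hive.exec.copyfile.maxsize threshold.
    static boolean shouldUseDistCp(long fileSizeBytes, long copyFileMaxSizeBytes, URI destination) {
        if (isBlobStore(destination)) {
            // Skip distcp for blobstore destinations, as the issue proposes.
            return false;
        }
        // Otherwise keep the existing rule: distcp only for files above the threshold.
        return fileSizeBytes > copyFileMaxSizeBytes;
    }

    public static void main(String[] args) {
        long maxSize = 32L * 1024 * 1024;   // illustrative threshold only
        long fileSize = 64L * 1024 * 1024;  // a file larger than the threshold

        System.out.println(shouldUseDistCp(fileSize, maxSize, URI.create("hdfs://nn:8020/warehouse/t")));
        System.out.println(shouldUseDistCp(fileSize, maxSize, URI.create("s3a://bucket/warehouse/t")));
    }
}
```

      Checking the destination first keeps the size threshold logic unchanged for HDFS-to-HDFS copies, so only blobstore destinations see different behavior.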

        Attachments

        1. HIVE-14776.1.patch
          1 kB
          Sergio Peña
        2. HIVE-14776.2.patch
          1 kB
          Sergio Peña


            People

            • Assignee: Sergio Peña (spena)
            • Reporter: Sergio Peña (spena)
