  1. Hive
  2. HIVE-14269 Performance optimizations for data on S3
  3. HIVE-14776

Skip 'distcp' call when copying data from HDFS to S3

Details

    • Type: Sub-task
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • None
    • None
    • Hive
    • None

    Description

      Hive uses 'distcp' to copy files in parallel between HDFS encryption zones when a file to copy is larger than the hive.exec.copyfile.maxsize threshold. The same 'distcp' path is also taken when the destination is S3, where it makes copies slower.

      We should not invoke distcp when copying to blobstore systems.
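
      The proposed behavior can be sketched as a small decision function. This is only an illustration of the rule described above, not the contents of the attached patches: the class name, the method names, and the list of blobstore URI schemes are all assumptions made for the example.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not Hive's actual code: decides whether a copy
// should be delegated to distcp or done as a plain file copy.
final class CopyStrategySketch {

    // Assumed set of schemes treated as blobstores for this example.
    private static final Set<String> BLOBSTORE_SCHEMES =
            new HashSet<>(Arrays.asList("s3", "s3a", "s3n"));

    private CopyStrategySketch() {
    }

    // True when the destination URI uses one of the assumed blobstore schemes.
    static boolean isBlobstore(URI destination) {
        String scheme = destination.getScheme();
        return scheme != null
                && BLOBSTORE_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    // maxCopyFileSizeBytes stands in for the value of hive.exec.copyfile.maxsize.
    static boolean shouldUseDistCp(long fileSizeBytes,
                                   long maxCopyFileSizeBytes,
                                   URI destination) {
        // Proposed rule: never use distcp when the target is a blobstore.
        if (isBlobstore(destination)) {
            return false;
        }
        // Existing rule: use distcp only for files above the threshold.
        return fileSizeBytes > maxCopyFileSizeBytes;
    }
}
```

      Under these assumptions, a file above the threshold still goes through distcp when the destination is an hdfs:// path, while the same file copied to an s3a:// path falls back to a regular copy.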

      Attachments

        1. HIVE-14776.2.patch
          1 kB
          Sergio Peña
        2. HIVE-14776.1.patch
          1 kB
          Sergio Peña

          People

            Assignee: Sergio Peña (spena)
            Reporter: Sergio Peña (spena)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated: