  1. Hive
  2. HIVE-14269 Performance optimizations for data on S3
  3. HIVE-14776

Skip 'distcp' call when copying data from HDFS to S3

    Details

    • Type: Sub-task
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Hive
    • Labels:
      None

      Description

      Hive uses 'distcp' to copy files in parallel between HDFS encryption zones when a file is larger than the hive.exec.copyfile.maxsize threshold. The same 'distcp' path is also taken when copying to S3, where it makes the copy slower.

      We should not invoke distcp when copying to blobstore systems.
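
      The proposed behavior can be sketched as a small decision function. This is only an illustration, not the code in the attached patches: the class CopyDecision, its method names, and the set of URI schemes treated as blobstores are all assumptions made for the example. It also simplifies the existing rule to a pure size check and leaves out the encryption-zone condition.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of Hive.
final class CopyDecision {

    // Assumption: these URI schemes identify blobstore destinations.
    private static final Set<String> BLOBSTORE_SCHEMES =
            new HashSet<>(Arrays.asList("s3", "s3a", "s3n"));

    // Returns true when the URI scheme is one of the assumed blobstore schemes.
    static boolean isBlobStore(URI uri) {
        String scheme = uri.getScheme();
        return scheme != null
                && BLOBSTORE_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    // Decides whether a copy should go through distcp.
    // maxCopyFileSizeBytes stands in for the hive.exec.copyfile.maxsize value.
    static boolean shouldUseDistCp(URI destination,
                                   long fileSizeBytes,
                                   long maxCopyFileSizeBytes) {
        // Proposed change: never use distcp when the destination is a blobstore.
        if (isBlobStore(destination)) {
            return false;
        }
        // Simplified existing rule: use distcp only for files above the threshold.
        return fileSizeBytes > maxCopyFileSizeBytes;
    }
}
```

      With this sketch, a large file headed to an s3a:// path falls back to a regular copy, while the same file headed to an hdfs:// path still goes through distcp.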

        Attachments

        1. HIVE-14776.2.patch
          1 kB
          Sergio Peña
        2. HIVE-14776.1.patch
          1 kB
          Sergio Peña

            People

            • Assignee:
              spena Sergio Peña
            • Reporter:
              spena Sergio Peña
            • Votes:
              0
            • Watchers:
              4