HIVE-11940 (project: Hive)

"INSERT OVERWRITE" query is very slow because it creates one "distcp" per file to copy data from staging directory to target directory

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.1
    • Fix Version/s: 2.0.0
    • Component/s: None
    • Labels:
      None

      Description

      When hive.exec.stagingdir is set to ".hive-staging", the staging directory is placed under the target directory while an "INSERT OVERWRITE" query runs. Hive then lists every file in the staging directory and copies them ONE BY ONE to the target directory.
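To make the reported slow path concrete, here is a minimal local-filesystem sketch in Python. It is not Hive's code: HDFS operations are replaced by `shutil`, the helper name `copy_staged_files` is hypothetical, and it assumes a relative staging setting is simply joined onto the target directory (real Hive staging directory names carry extra suffixes). It shows why this path is expensive: each staged file is read and rewritten in full.

```python
import shutil
import tempfile
from pathlib import Path


def copy_staged_files(target_dir: Path, staging_setting: str = ".hive-staging") -> int:
    """Hypothetical model of the slow path: copy staged files one by one.

    A relative staging setting is resolved under the target directory,
    mirroring how ".hive-staging" ends up inside the table directory.
    Returns the number of files copied.
    """
    staging_dir = target_dir / staging_setting
    copied = 0
    for entry in sorted(staging_dir.iterdir()):
        if entry.is_file():
            # Each copy reads and rewrites the whole file, so the cost grows
            # with the total size of the staged data.
            shutil.copy2(entry, target_dir / entry.name)
            copied += 1
    # Remove the staging directory once its contents have been copied.
    shutil.rmtree(staging_dir)
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "table"
        staging = target / ".hive-staging"
        staging.mkdir(parents=True)
        for i in range(3):
            (staging / f"part-{i:05d}").write_text("row\n")
        print(copy_staged_files(target))  # prints 3
```

With many or large output files, the total work is proportional to the data volume, which matches the slowdown described in the report.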

      When hive.exec.stagingdir is set to "/tmp/hive", Hive simply performs a RENAME operation, which is effectively instant.
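For contrast, a minimal sketch of the fast path, again on a local filesystem rather than HDFS. The helper `rename_staging_dir` is hypothetical, and the sketch assumes the staging directory lies outside the target directory (as with an absolute setting such as "/tmp/hive") and on the same filesystem. Under those assumptions a single rename moves every staged file by updating directory metadata only, with no file data copied.

```python
import os
import shutil
import tempfile
from pathlib import Path


def rename_staging_dir(staging_dir: Path, target_dir: Path) -> None:
    """Hypothetical model of the fast path: one rename moves all staged files.

    Assumes staging_dir is outside target_dir and on the same filesystem,
    so os.rename only rewrites directory entries.
    """
    if target_dir.exists():
        # INSERT OVERWRITE semantics: drop the previous contents first.
        shutil.rmtree(target_dir)
    target_dir.parent.mkdir(parents=True, exist_ok=True)
    os.rename(staging_dir, target_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp) / "tmp-hive" / "stage-1"
        staging.mkdir(parents=True)
        for i in range(3):
            (staging / f"part-{i:05d}").write_text("row\n")
        target = Path(tmp) / "warehouse" / "table"
        rename_staging_dir(staging, target)
        # prints ['part-00000', 'part-00001', 'part-00002']
        print(sorted(p.name for p in target.iterdir()))
```

The cost of this path does not depend on how much data was staged, which is why the "/tmp/hive" setting completes almost immediately.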

      This happens with files that are not encrypted.

        Attachments

        1. HIVE-11940.1.patch
          1 kB
          Sergio Peña
        2. HIVE-11940.2.patch
          2 kB
          Sergio Peña


            People

            • Assignee:
              Sergio Peña (spena)
              Reporter:
              Sergio Peña (spena)
            • Votes:
              0
              Watchers:
              6
