HIVE-11940

"INSERT OVERWRITE" query is very slow because it creates one "distcp" per file to copy data from staging directory to target directory

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.1
    • Fix Version/s: 2.0.0
    • Component/s: None
    • Labels: None

    Description

      When hive.exec.stagingdir is set to ".hive-staging", the staging directory is created under the target directory. When an "INSERT OVERWRITE" query runs, Hive then takes every file under the staging directory and copies the files ONE BY ONE to the target directory.

      When hive.exec.stagingdir is set to "/tmp/hive", Hive simply does a RENAME operation, which is instant.

      This happens with files that are not encrypted.
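The contrast described above can be illustrated with a small local-filesystem analogy. This is only a sketch, not Hive code: Hive works through the Hadoop FileSystem API and, per the issue title, launches "distcp" for the copies. The function names `make_table`, `copy_one_by_one`, and `rename_each` and the `part-NNNNN` file names are hypothetical, chosen for illustration.

```python
import os
import shutil
import tempfile


def make_table(num_files):
    """Create a hypothetical target directory with a nested staging directory."""
    target = tempfile.mkdtemp(prefix="target_")
    staging = os.path.join(target, ".hive-staging")
    os.mkdir(staging)
    for i in range(num_files):
        with open(os.path.join(staging, "part-%05d" % i), "w") as f:
            f.write("row %d\n" % i)
    return target, staging


def copy_one_by_one(staging_dir, target_dir):
    """Slow path (analogy): copy the bytes of every staged file individually."""
    copied = 0
    for name in sorted(os.listdir(staging_dir)):
        shutil.copy2(os.path.join(staging_dir, name),
                     os.path.join(target_dir, name))
        copied += 1
    shutil.rmtree(staging_dir)  # staged copies are no longer needed
    return copied


def rename_each(staging_dir, target_dir):
    """Fast path (analogy): a rename on the same filesystem moves no data."""
    moved = 0
    for name in sorted(os.listdir(staging_dir)):
        os.rename(os.path.join(staging_dir, name),
                  os.path.join(target_dir, name))
        moved += 1
    os.rmdir(staging_dir)  # directory is empty after the renames
    return moved
```

Both functions leave the same files in the target directory. The difference is that the copy path rewrites every byte, while the rename path only updates metadata, which is why the per-file copy becomes noticeably slow as the number and size of staged files grow.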

      Attachments

        1. HIVE-11940.1.patch
          1 kB
          Sergio Peña
        2. HIVE-11940.2.patch
          2 kB
          Sergio Peña

          People

            Assignee: Sergio Peña (spena)
            Reporter: Sergio Peña (spena)
            Votes: 0
            Watchers: 6
