  1. Hive
  2. HIVE-20528 hive onprem-s3 replication is slow
  3. HIVE-20517

Creation of staging directory and Move operation is taking time in S3

Details

    Description

      Operations such as insert and add partition create a staging directory, generate the files there, and then move them to the actual location. In the replication flow, files are first copied to the staging directory and then moved (renamed) to the actual table location. On S3, a move is not an atomic operation: internally it is a copy followed by a delete, so it cannot guarantee the required consistency. It is therefore better to copy the files directly to the actual location. This avoids both the staging directory creation (which takes 1-2 seconds on S3) and the move (which takes time proportional to file size).
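
      The cost difference described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyObjectStore`, `replicate_via_staging`, and `replicate_direct` are hypothetical names invented for this example, not Hive or S3 APIs. The store is an in-memory dictionary whose `rename` is a copy followed by a delete, mirroring the behavior the description attributes to S3. Staging directory creation itself is not modeled.

```python
class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object store without native rename."""

    def __init__(self):
        self.objects = {}        # key -> bytes
        self.bytes_written = 0   # total payload bytes written to the store
        self.requests = 0        # number of put/delete calls issued

    def put(self, key, data):
        self.objects[key] = data
        self.bytes_written += len(data)
        self.requests += 1

    def delete(self, key):
        del self.objects[key]
        self.requests += 1

    def rename(self, src, dst):
        # Not atomic: a full copy of the data followed by a delete of the source.
        # A failure between the two steps would leave both keys present.
        self.put(dst, self.objects[src])
        self.delete(src)


def replicate_via_staging(store, files, table_dir, staging_dir):
    """Copy every file to a staging prefix, then rename it into the table prefix."""
    for name, data in files.items():
        store.put(f"{staging_dir}/{name}", data)
    for name in files:
        store.rename(f"{staging_dir}/{name}", f"{table_dir}/{name}")


def replicate_direct(store, files, table_dir):
    """Copy every file straight to the table prefix, skipping staging and rename."""
    for name, data in files.items():
        store.put(f"{table_dir}/{name}", data)
```

      In this model the staging path writes every byte twice (once into staging, once during the rename) and issues three requests per file, while the direct path writes each byte once with a single request per file. Both paths end with the same set of objects under the table prefix.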

      Attachments

        1. HIVE-20517.01.patch
          32 kB
          mahesh kumar behera
        2. HIVE-20517.02.patch
          34 kB
          mahesh kumar behera
        3. HIVE-20517.03.patch
          112 kB
          mahesh kumar behera
        4. HIVE-20517.04.patch
          52 kB
          mahesh kumar behera
        5. HIVE-20517.05.patch
          52 kB
          mahesh kumar behera

          People

            maheshk114 mahesh kumar behera
            Votes: 0
            Watchers: 4

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 20m