IMPALA-3578

S3: Consider allowing table-sink to stage in HDFS when writing to S3


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: Impala 2.6.0
    • Fix Version/s: None
    • Component/s: Perf Investigation

    Description

      If users do not want to skip the staging step on INSERTs to S3, we could allow the table sink to stage the temporary files in HDFS (if available) and have the coordinator move the files to S3 in FinalizeSuccessfulInsert().

      This could improve the performance of INSERTs to S3, since writes to HDFS are currently faster than writes to S3. At present, when the staging step is not skipped, the sinks write to a temporary location in S3 and the coordinator then copies these files to their final location in S3 (S3 does not support the rename() operation). Staging in HDFS would therefore reduce the number of writes to S3 from 2 to 1 per file.
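
The write-count argument above can be illustrated with a small, self-contained model. This is only a sketch and not Impala code: `FakeStore`, `insert_with_staging`, and `s3_writes_per_file` are hypothetical names, and the in-memory stores merely stand in for HDFS and S3 so that writes can be counted.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStore:
    """Hypothetical in-memory stand-in for a filesystem; counts file writes."""
    name: str
    files: dict = field(default_factory=dict)
    writes: int = 0

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data
        self.writes += 1

    def read(self, path: str) -> bytes:
        return self.files[path]

    def delete(self, path: str) -> None:
        del self.files[path]


def insert_with_staging(staging: FakeStore, final: FakeStore, outputs: dict) -> None:
    """Sinks write to a staging area; a finalize step copies files to the final location."""
    # Table sinks write their output files into the staging location.
    for fname, data in outputs.items():
        staging.write(f"_staging/{fname}", data)
    # Finalize: copy each staged file to its final path, then drop the staged copy.
    # A copy is modeled rather than a rename because S3 has no rename() operation.
    for fname in outputs:
        final.write(f"table/{fname}", staging.read(f"_staging/{fname}"))
        staging.delete(f"_staging/{fname}")


def s3_writes_per_file(stage_in_hdfs: bool, num_files: int = 3) -> float:
    """Return the number of S3 writes per output file for one simulated INSERT."""
    s3 = FakeStore("s3")
    hdfs = FakeStore("hdfs")
    outputs = {f"part-{i}": b"rows" for i in range(num_files)}
    insert_with_staging(hdfs if stage_in_hdfs else s3, s3, outputs)
    return s3.writes / num_files


if __name__ == "__main__":
    print("staging in S3:  ", s3_writes_per_file(stage_in_hdfs=False))
    print("staging in HDFS:", s3_writes_per_file(stage_in_hdfs=True))
```

In this model, staging in S3 costs two S3 writes per file (the staged file plus the final copy), while staging in HDFS leaves only the final copy as an S3 write. The model ignores the cost of the HDFS write and of reading the staged file back, which a real comparison would need to account for.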

          People

            Assignee: Sailesh Mukil (sailesh)
            Reporter: Sailesh Mukil (sailesh)