IMPALA-3578

S3: Consider allowing table-sink to stage in HDFS when writing to S3


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: Impala 2.6.0
    • Fix Version/s: None
    • Component/s: Perf Investigation

      Description

      If users do not want to skip the staging step on INSERTs to S3, we could allow the table sink to stage the temporary files in HDFS (if available) and make the coordinator move the files to S3 on FinalizeSuccessfulInsert().

      This could improve the performance of INSERTs to S3, since writes to HDFS are currently faster than writes to S3. Today, when the staging step is not skipped, the sinks write to a temporary location in S3 and the coordinator then copies these files to the final location in S3 (S3 does not support a rename() operation). Staging in HDFS would therefore reduce the number of writes to S3 from two to one per file.
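      The write-count argument above can be illustrated with a small toy model. This is a hypothetical sketch, not Impala code: `FakeFs`, `move`, `run_insert` and `s3_writes` are invented names, the second loop in `run_insert` merely stands in for the coordinator's FinalizeSuccessfulInsert() step, and only write operations are counted (no latency or throughput is modeled). It assumes that a move within a filesystem that supports rename() costs no data write, while a move on S3 or across filesystems costs one full copy.

```python
from dataclasses import dataclass, field


@dataclass
class FakeFs:
    """Hypothetical in-memory filesystem that counts data writes."""
    name: str
    supports_rename: bool
    files: dict = field(default_factory=dict)
    writes: int = 0

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data
        self.writes += 1

    def rename(self, src: str, dst: str) -> None:
        # Metadata-only move: no data is rewritten, so no write is counted.
        self.files[dst] = self.files.pop(src)


def move(src_fs: FakeFs, src: str, dst_fs: FakeFs, dst: str) -> None:
    """Move a staged file to its final location."""
    if src_fs is dst_fs and src_fs.supports_rename:
        src_fs.rename(src, dst)
    else:
        # No usable rename (S3, or crossing filesystems): copy the bytes
        # to the destination (one write) and delete the staged source.
        dst_fs.write(dst, src_fs.files.pop(src))


def run_insert(staging_fs: FakeFs, s3: FakeFs, num_files: int) -> None:
    """Sinks write to staging_fs; a finalize step then moves every file to S3."""
    staged = []
    for i in range(num_files):
        tmp = f"/_staging/part-{i}"
        staging_fs.write(tmp, b"rows")
        staged.append(tmp)
    # Stand-in for the coordinator's finalize step.
    for i, tmp in enumerate(staged):
        move(staging_fs, tmp, s3, f"/table/part-{i}")


def s3_writes(stage_in_hdfs: bool, num_files: int) -> int:
    """Return how many writes hit S3 for one INSERT of num_files files."""
    s3 = FakeFs("s3", supports_rename=False)
    staging = FakeFs("hdfs", supports_rename=True) if stage_in_hdfs else s3
    run_insert(staging, s3, num_files)
    return s3.writes


print(s3_writes(stage_in_hdfs=False, num_files=3))  # 6: stage in S3, then copy
print(s3_writes(stage_in_hdfs=True, num_files=3))   # 3: stage in HDFS, one copy
```

      Note that staging in HDFS does not reduce the total number of writes; each file is still written twice, once to HDFS and once to S3. The model only shows the S3 writes dropping from two to one per file, so any gain depends on the HDFS write being cheaper than the S3 write it replaces.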


            People

            • Assignee:
              Sailesh Mukil
              Reporter:
              Sailesh Mukil
            • Votes:
              0
              Watchers:
              2
