IMPALA-9112

Consider removing hdfsExists calls when writing files to S3

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: Not Applicable
    • Component/s: Backend
    • Labels:
      None

      Description

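      There are a few places in the backend where we call hdfsExists before writing out a file. This can cause issues when writing data to S3: the existence check sends a request for a key that does not exist yet, and S3 can cache the resulting 404 Not Found response, so a later operation on the newly written file may still see a 404. The problem manifests as errors such as: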

      ERROR: Error(s) moving partition files. First error (of 1) was: Hdfs op (RENAME s3a://[bucket-name]/[table-name]/_impala_insert_staging/3943ae7ccf00711e_59606d8800000000/.3943ae7ccf00711e-59606d880000000b_562151879_dir/year=2015/3943ae7ccf00711e-59606d880000000b_1994902389_data.0.parq TO s3a://[bucket-name]/[table-name]/3943ae7ccf00711e-59606d880000000b_1994902389_data.0.parq) failed, error was: s3a://[bucket-name]/[table-name]/_impala_insert_staging/3943ae7ccf00711e_59606d8800000000/.3943ae7ccf00711e-59606d880000000b_562151879_dir/year=2015/3943ae7ccf00711e-59606d880000000b_1994902389_data.0.parq
      Error(5): Input/output error
      Root cause: AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: []; S3 Extended Request ID: [])

      HADOOP-13884, HADOOP-13950, HADOOP-16490 - the HDFS clients allow specifying an "overwrite" option when creating a file; this can avoid doing any HEAD requests when opening a file.
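
      The failure mode and the proposed alternative can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyObjectStore`, `write_with_exists_check`, and `write_with_overwrite` are hypothetical names, the store is an in-memory stand-in rather than S3 or the S3A client, and the "a missed lookup stays cached for the next N lookups" rule is a simplification chosen for illustration, not documented S3 behavior.

```python
class ToyObjectStore:
    """Hypothetical in-memory store that remembers 404s for a few lookups."""

    def __init__(self, negative_cache_lookups=2):
        self._objects = {}
        # key -> number of upcoming lookups that will still report "not found"
        self._cached_404 = {}
        self._negative_cache_lookups = negative_cache_lookups
        self.head_requests = 0

    def head(self, key):
        """Return True if the key is visible to the caller."""
        self.head_requests += 1
        remaining = self._cached_404.get(key, 0)
        if remaining > 0:
            # A stale cached 404 hides the object even if it now exists.
            self._cached_404[key] = remaining - 1
            return False
        if key not in self._objects:
            # Assumption: a miss is cached for the next few lookups.
            self._cached_404[key] = self._negative_cache_lookups
            return False
        return True

    def put(self, key, data):
        """Store the object; in this model a PUT does not clear a cached 404."""
        self._objects[key] = data

    def rename(self, src, dst):
        """Move src to dst; fails if the source lookup reports 404."""
        if not self.head(src):
            raise FileNotFoundError(src)
        self._objects[dst] = self._objects.pop(src)


def write_with_exists_check(store, key, data):
    """Probe-then-write pattern: the probe happens before the object exists."""
    if store.head(key):
        raise FileExistsError(key)
    store.put(key, data)


def write_with_overwrite(store, key, data):
    """Create-with-overwrite pattern: no existence probe before the write."""
    store.put(key, data)
```

      In this model, writing a staging file with `write_with_exists_check` and then calling `rename` raises `FileNotFoundError`, because the probe cached a miss for the key before it was written. Writing the same file with `write_with_overwrite` issues no lookup beforehand, so the subsequent `rename` succeeds.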

            People

            • Assignee: Sahil Takiar (stakiar)
            • Reporter: Sahil Takiar (stakiar)