  Hadoop Common / HADOOP-13574

Unnecessary file existence check causes problems with S3

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels:

      Description

      We recently got the following exception in production:

      java.io.FileNotFoundException: Key 'xxx/_temporary/0/_temporary/attempt_201609010631_0000_m_001128_1128/part-01128' does not exist in S3
              at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleServiceException(Jets3tNativeFileSystemStore.java:234)
              at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.copy(Jets3tNativeFileSystemStore.java:201)
              at sun.reflect.GeneratedMethodAccessor71.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:497)
              at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
              at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
              at org.apache.hadoop.fs.s3native.$Proxy13.copy(Unknown Source)
              at org.apache.hadoop.fs.s3native.NativeS3FileSystem.rename(NativeS3FileSystem.java:659)
              at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:435)
              at org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:172)
              at org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:291)
              at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:98)
              at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:124)
              at org.apache.spark.SparkHadoopWriter.commit(SparkHadoopWriter.scala:107)
              at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1204)
              at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1183)
      

      FileOutputCommitter.commitTask() does check that the file exists before trying to rename it, but due to S3's relaxed consistency guarantees the subsequent fs.rename(taskAttemptPath, committedTaskPath) still fails.

      Here's an excerpt from the Amazon S3 documentation (https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html):

      Amazon S3 Data Consistency Model

      Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD or GET request to the key name (to find if the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write.
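
      Put together, the failing sequence boils down to the sketch below (a minimal illustration using the standard Hadoop FileSystem API; the bucket name is hypothetical, and the comments map each call onto the HTTP requests it triggers):

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FSDataOutputStream;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          public class ConsistencySketch {
            public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              Path taskAttemptPath = new Path(
                  "s3n://my-bucket/out/_temporary/0/_temporary/attempt_201609010631_0000_m_001128_1128/part-01128");
              Path committedTaskPath = new Path("s3n://my-bucket/out/part-01128");
              FileSystem fs = taskAttemptPath.getFileSystem(conf);

              // NativeS3FileSystem.create() first calls exists(): a HEAD/GET on a key
              // that does not exist yet, so S3 may cache the 404 for that key.
              try (FSDataOutputStream out = fs.create(taskAttemptPath, true)) {
                out.writeBytes("task output"); // PUT of the new object
              }

              // FileOutputCommitter.commitTask(): exists() may already return true,
              // yet the COPY behind rename() can still hit the cached 404 and throw
              // the FileNotFoundException from the stack trace above.
              if (fs.exists(taskAttemptPath)) {
                fs.rename(taskAttemptPath, committedTaskPath);
              }
            }
          }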

      The problematic S3 object existence check, which causes S3 to fall back to eventual consistency, is in NativeS3FileSystem.create():

          if (exists(f) && !overwrite) {
            throw new IOException("File already exists:"+f);
          }
      

      If the "overwrite" parameter is set to "true" (as in our case), calling exists(f) is unnecessary and only "upsets" S3.

      The proposed fix is to switch the order of the predicates:

          if (!overwrite && exists(f)) {
            throw new IOException("File already exists:"+f);
          }
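
      With the predicates swapped, Java's short-circuiting && never evaluates exists(f) when overwrite is true, so no HEAD request is sent to S3. A minimal, self-contained illustration (exists() below is a stand-in that just counts calls, not the real NativeS3FileSystem method):

          import java.io.IOException;

          public class ShortCircuitSketch {
            private static int headRequests = 0;

            // Stand-in for the existence check; pretend it issues an S3 HEAD request.
            private static boolean exists(String key) {
              headRequests++;
              return false;
            }

            private static void create(String key, boolean overwrite) throws IOException {
              if (!overwrite && exists(key)) {
                throw new IOException("File already exists:" + key);
              }
            }

            public static void main(String[] args) throws IOException {
              create("xxx/part-01128", true);   // overwrite=true: exists() is skipped
              create("xxx/part-01128", false);  // overwrite=false: exists() is called
              System.out.println("HEAD requests issued: " + headRequests); // prints 1
            }
          }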
      


          Activity

          stevel@apache.org Steve Loughran added a comment -

          This is an s3 list inconsistency surfacing on s3n job commits. It will be fixed in s3a with a specific committer for s3.

          stevel@apache.org Steve Loughran added a comment -

          I see the point of this, and I can also see that this would be an optimisation, as you'd skip the overhead of 1-3 HTTP GET requests.

          But...we've effectively frozen all dev of s3n as it is stable enough that things work, and it gives us a fallback while s3a stabilises & takes on performance improvements.

          Looking at the S3A code, the same-ish problem exists. S3A adds the fix for HADOOP-13188, not only doing the GET but failing if the destination path is a directory. That is, even if overwrite==true, you want to make sure that you aren't unintentionally creating a file over a dir, and so unintentionally losing all access to the data underneath (it'd break the listing code). If I were to go near s3n, I'd probably focus on that bug, rather than dealing with the intermittent caching of the 404 that S3 can do.

          Your particular problem is really due to the fact that the output committer is doing a rename(), which from a performance perspective is the wrong thing to ask an object store to do. In HADOOP-13345, S3Guard, we're planning to deal with this; anything you can do to help there would be welcome.
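
          For reference, rename() on S3 is not a metadata-only move: each object has to be copied server-side and the source key deleted afterwards, which is why committing output by rename() is so expensive on an object store. A rough sketch of that copy-then-delete pattern with the AWS SDK for Java v1 (bucket and key names are hypothetical):

              import com.amazonaws.services.s3.AmazonS3;
              import com.amazonaws.services.s3.AmazonS3ClientBuilder;

              public class RenameAsCopyDelete {
                public static void main(String[] args) {
                  AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
                  String src = "out/_temporary/0/_temporary/attempt_201609010631_0000_m_001128_1128/part-01128";
                  String dst = "out/part-01128";

                  // "Renaming" one object: a full server-side copy of its data, then a
                  // delete of the source key. There is no rename primitive in S3.
                  s3.copyObject("my-bucket", src, "my-bucket", dst);
                  s3.deleteObject("my-bucket", src);
                }
              }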


            People

            • Assignee: Unassigned
            • Reporter: Alex Zolotko (azolotko)
            • Votes: 0
            • Watchers: 3

