Hadoop Common / HADOOP-12319

S3AFastOutputStream has no ability to apply backpressure

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 2.7.0
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None

      Description

      Currently, users of S3AFastOutputStream can control memory usage with a few settings: fs.s3a.threads.core and fs.s3a.threads.max, which bound the number of active uploads (they are passed as the core and maximum pool sizes of a ThreadPoolExecutor), and fs.s3a.max.total.tasks, which sets the size of the queue feeding that ThreadPoolExecutor.
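
      These settings are, in effect, the parameters of a standard java.util.concurrent.ThreadPoolExecutor. A minimal sketch of the wiring (the class name and constants below are illustrative defaults, not the actual S3AFastOutputStream code):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class S3AExecutorSketch {
    // Illustrative values; in S3A these come from the fs.s3a.* configuration.
    static final int CORE_THREADS = 10;      // fs.s3a.threads.core
    static final int MAX_THREADS = 10;       // fs.s3a.threads.max
    static final int MAX_TOTAL_TASKS = 1000; // fs.s3a.max.total.tasks (default)

    /** Pending part uploads wait in a bounded queue in front of the pool. */
    public static ThreadPoolExecutor newUploadExecutor() {
        return new ThreadPoolExecutor(
                CORE_THREADS, MAX_THREADS,
                60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(MAX_TOTAL_TASKS));
    }
}
```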

      However, if the throughput of the writing job exceeds the total S3 upload throughput, a crash is almost guaranteed, because calls to write never block and no backpressure is applied.

      If fs.s3a.max.total.tasks is set high (the default is 1000), write calls keep adding data to the queue, which can eventually cause an OutOfMemoryError. But if the user sets it lower, writes fail outright once the queue is full: the ThreadPoolExecutor rejects the part upload with a java.util.concurrent.RejectedExecutionException.
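
      This failure mode is simply the executor's default AbortPolicy. A self-contained sketch (the class and method names are hypothetical) showing a saturated one-thread, one-slot pool rejecting the third submission:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectionDemo {
    /** One worker thread and a one-slot queue: the third submission is rejected. */
    static boolean thirdSubmitRejected() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(1)); // default handler is AbortPolicy
        CountDownLatch release = new CountDownLatch(1);
        pool.execute(() -> {                   // occupies the only worker thread
            try { release.await(); } catch (InterruptedException ignored) { }
        });
        pool.execute(() -> { });               // fills the one-slot queue
        boolean rejected = false;
        try {
            pool.execute(() -> { });           // no free thread, no queue space
        } catch (RejectedExecutionException e) {
            rejected = true;
        }
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("third submit rejected: " + thirdSubmitRejected());
        // prints: third submit rejected: true
    }
}
```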

      Ideally, calls to write should block rather than fail when the queue is full, so that backpressure is applied to whatever the writing process is.
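
      One way to get that behavior from a plain ThreadPoolExecutor is a RejectedExecutionHandler that re-inserts the task with a blocking put, so the submitting thread waits for queue space instead of receiving an exception. A sketch of that pattern (an assumption for illustration, not the actual fix this issue was resolved as a duplicate of):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BlockingSubmitDemo {
    /** A pool whose handler blocks the caller instead of throwing,
     *  turning a full queue into backpressure on the writer. */
    static ThreadPoolExecutor newBlockingPool(int threads, int queueSize) {
        return new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(queueSize),
                (task, executor) -> {
                    try {
                        executor.getQueue().put(task); // wait for queue space
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new RejectedExecutionException(e);
                    }
                });
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newBlockingPool(1, 1);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < 10; i++) {
            pool.execute(done::incrementAndGet); // blocks when full, never throws
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("completed " + done.get() + " tasks");
        // prints: completed 10 tasks
    }
}
```

      Note that the blocking-put handler has a known caveat: a task put into the queue after the pool has begun shutting down may never run, so a production implementation would need to guard submission against shutdown.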


      People

      • Assignee: Unassigned
      • Reporter: Colin Marc (colinmarc)
      • Votes: 0
      • Watchers: 2
