Hadoop Common / HADOOP-12319

S3AFastOutputStream has no ability to apply backpressure


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 2.7.0
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None

    Description

      Currently, users of S3AFastOutputStream can control memory usage with a few settings: fs.s3a.threads.core and fs.s3a.threads.max, which control the number of active uploads (specifically, they are passed as arguments to a ThreadPoolExecutor), and fs.s3a.max.total.tasks, which controls the size of the queue feeding that ThreadPoolExecutor.
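
      As an illustration only, here is a minimal sketch of setting those properties programmatically through the standard Hadoop Configuration API; the class name and the values are arbitrary examples, not recommendations.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class S3AUploadTuning {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Core and maximum number of threads performing the part uploads.
    conf.setInt("fs.s3a.threads.core", 10);
    conf.setInt("fs.s3a.threads.max", 20);
    // Maximum number of queued uploads waiting for a free upload thread.
    conf.setInt("fs.s3a.max.total.tasks", 50);
  }
}
{code}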

      However, a user can get an almost guaranteed crash if the throughput of the writing job is higher than the total S3 throughput, because there is never any backpressure or blocking on calls to write.

      If fs.s3a.max.total.tasks is set high (the default is 1000), then write calls will keep adding data to the queue, which can eventually cause the process to run out of memory. But if the user sets it lower, writes will fail once the queue is full: the ThreadPoolExecutor will reject the part upload with a java.util.concurrent.RejectedExecutionException.
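
      That failure mode can be reproduced with a plain ThreadPoolExecutor, independent of S3A. The standalone demo below uses a one-thread pool and a one-slot queue (standing in for a very small fs.s3a.max.total.tasks), so the third submission is rejected by the default AbortPolicy.

{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectionDemo {
  public static void main(String[] args) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>(1));
    Runnable slowUpload = () -> {
      try {
        Thread.sleep(1000);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };
    pool.execute(slowUpload); // picked up by the single worker thread
    pool.execute(slowUpload); // sits in the one-slot queue
    try {
      pool.execute(slowUpload); // queue full: rejected, not blocked
    } catch (RejectedExecutionException e) {
      System.out.println("rejected: " + e);
    }
    pool.shutdown();
  }
}
{code}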

      Ideally, calls to write should block rather than fail when the queue is full, so as to apply backpressure on whatever process is writing.
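
      One way to get that blocking behaviour, shown here purely as a sketch and not as the fix that was adopted, is to install a RejectedExecutionHandler that waits for queue space instead of throwing; the class name BlockingSubmitPolicy is invented for this example.

{code:java}
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;

// Hypothetical policy: the submitting thread blocks until the executor's
// queue has room, so a fast writer is throttled to the upload rate instead
// of queueing without bound or being rejected outright.
public class BlockingSubmitPolicy implements RejectedExecutionHandler {
  @Override
  public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
    if (executor.isShutdown()) {
      throw new RejectedExecutionException("Executor has been shut down");
    }
    try {
      executor.getQueue().put(r); // blocks until a queue slot frees up
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RejectedExecutionException("Interrupted while waiting to submit", e);
    }
  }
}
{code}

      Wiring it in would just mean passing an instance as the RejectedExecutionHandler argument of the ThreadPoolExecutor constructor, together with a LinkedBlockingQueue bounded by fs.s3a.max.total.tasks; gating write on a Semaphore sized to the maximum number of in-flight parts would have the same effect.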

People

    • Assignee: Unassigned
    • Reporter: Colin Marc (colinmarc)
    • Votes: 0
    • Watchers: 2
