Hadoop Common / HADOOP-10610

Upgrade S3n fs.s3.buffer.dir to support multi directories

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.6.0
    • Component/s: fs/s3
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      fs.s3.buffer.dir defines the temporary folder where files are written before being sent to S3. Right now this is limited to a single folder, which causes two major issues.

      1. You need a drive with enough space to store all the tmp files at once
      2. You are limited to the IO speeds of a single drive

      This change resolves both issues and has been tested to increase the S3 write speed by 2.5x with 10 mappers on hs1.
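      The idea of spreading buffer files over several directories can be sketched as follows. This is only an illustration, not the code from the attached patches: it assumes the property value is a comma-separated list of local directories, and the names parse_buffer_dirs and RoundRobinBufferDirs are hypothetical. Each new temporary file goes to the next directory in rotation, so neither the space nor the IO of a single drive becomes the limit.

```python
import itertools
import os
import tempfile


def parse_buffer_dirs(value):
    """Split a comma-separated directory list, dropping blank entries.

    Assumption: the multi-directory value is comma-separated.
    """
    return [d.strip() for d in value.split(",") if d.strip()]


class RoundRobinBufferDirs:
    """Hypothetical allocator that rotates temp files across directories."""

    def __init__(self, dirs):
        if not dirs:
            raise ValueError("at least one buffer directory is required")
        self._dirs = list(dirs)
        self._cycle = itertools.cycle(self._dirs)

    def create_tmp_file(self, prefix="output-"):
        # Try each directory at most once, starting with the next in rotation,
        # so a full or missing drive is skipped instead of failing the write.
        for _ in range(len(self._dirs)):
            directory = next(self._cycle)
            try:
                os.makedirs(directory, exist_ok=True)
                fd, path = tempfile.mkstemp(
                    prefix=prefix, suffix=".tmp", dir=directory
                )
                os.close(fd)
                return path
            except OSError:
                continue
        raise OSError("no usable buffer directory")
```

      With two directories configured, four consecutive calls to create_tmp_file place files in the first, second, first, and second directory in turn. A production allocator would likely also weigh free space per directory; this sketch only rotates.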

      1. HADOOP_10610-2.patch
        2 kB
        Ted Malaska
      2. HADOOP-10610.patch
        2 kB
        Ted Malaska
      3. HDFS-6383.patch
        1 kB
        Ted Malaska

        Activity

        Ted Malaska created issue -
        Ted Malaska made changes -
        Field Original Value New Value
        Attachment HDFS-6383.patch [ 12644622 ]
        Ted Malaska made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Aaron T. Myers made changes -
        Assignee Ted Malaska [ ted.m ]
        Affects Version/s 2.4.0 [ 12326143 ]
        Target Version/s 2.5.0 [ 12326264 ]
        Steve Loughran made changes -
        Project Hadoop HDFS [ 12310942 ] Hadoop Common [ 12310240 ]
        Key HDFS-6383 HADOOP-10610
        Affects Version/s 2.4.0 [ 12326144 ]
        Affects Version/s 2.4.0 [ 12326143 ]
        Target Version/s 2.5.0 [ 12326264 ] 2.5.0 [ 12326263 ]
        Steve Loughran made changes -
        Component/s fs/s3 [ 12311814 ]
        Aaron T. Myers made changes -
        Summary "Upgrade S3n s3.fs.buffer.dir to suppoer multi directories" -> "Upgrade S3n s3.fs.buffer.dir to support multi directories"
        Ted Malaska made changes -
        Attachment HADOOP-10610.patch [ 12645414 ]
        Ted Malaska made changes -
        Attachment HADOOP_10610-2.patch [ 12650460 ]
        Karthik Kambatla (Inactive) made changes -
        Target Version/s 2.5.0 [ 12326263 ] 2.6.0 [ 12327179 ]
        Aaron T. Myers made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Fix Version/s 2.6.0 [ 12327179 ]
        Resolution Fixed [ 1 ]
        Arun C Murthy made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Rohit Agarwal made changes -
        Description Property name at the start of the description changed from s3.fs.buffer.dir to fs.s3.buffer.dir; the rest of the text is unchanged.

        Rohit Agarwal made changes -
        Summary "Upgrade S3n s3.fs.buffer.dir to support multi directories" -> "Upgrade S3n fs.s3.buffer.dir to support multi directories"

          People

          • Assignee:
            Ted Malaska
            Reporter:
            Ted Malaska
          • Votes:
            1
            Watchers:
            11
