  1. Hadoop Common
  2. HADOOP-13377 AliyunOSS: improvements for stabilization and optimization
  3. HADOOP-14999

AliyunOSS: provide one asynchronous multi-part based uploading mechanism

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-beta1
    • Fix Version/s: 2.10.0, 2.9.1, 3.2.0, 3.1.1, 3.0.3
    • Component/s: fs/oss
    • Labels: None

    Description

      This mechanism is designed to upload files in parallel and asynchronously:

      • Improve the performance of uploading files to the OSS server. First, the mechanism splits the output into multiple small blocks and uploads them in parallel. Second, producing the output and uploading the blocks happen asynchronously.
      • Avoid buffering a very large output on local disk. As an extreme example, a task that outputs 100 GB or more would otherwise have to write all of it to local disk and then upload it, which is inefficient and limited by the available disk space.
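
      The block-splitting idea can be sketched as follows. This is only an illustration, not the code in the patch: the class name BlockSplittingStream and its callback are hypothetical, and blocks are held in memory here for brevity, whereas the description says the real stream writes small local files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: cuts the bytes written to it into fixed-size blocks. */
public class BlockSplittingStream extends OutputStream {
    private final int blockSize;
    private final Consumer<byte[]> onBlock; // receives each completed block
    private ByteArrayOutputStream current = new ByteArrayOutputStream();
    private boolean closed;

    public BlockSplittingStream(int blockSize, Consumer<byte[]> onBlock) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockSize = blockSize;
        this.onBlock = onBlock;
    }

    @Override
    public void write(int b) throws IOException {
        if (closed) {
            throw new IOException("stream closed");
        }
        current.write(b);
        if (current.size() == blockSize) {
            emit(); // block is full: hand it over and start a new one
        }
    }

    private void emit() {
        onBlock.accept(current.toByteArray());
        current = new ByteArrayOutputStream();
    }

    @Override
    public void close() {
        if (closed) {
            return;
        }
        closed = true;
        if (current.size() > 0) {
            emit(); // the last block may be shorter than blockSize
        }
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> blocks = new ArrayList<>();
        try (BlockSplittingStream out = new BlockSplittingStream(4, blocks::add)) {
            out.write(new byte[10]); // 10 bytes with a block size of 4
        }
        System.out.println(blocks.size() + " blocks");
    }
}
```

      In this sketch only one block is buffered by the writer at any time, so the buffer it holds is bounded by the block size rather than by the total output size.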

      This patch reuses SemaphoredDelegatingExecutor as the executor service and depends on HADOOP-15039.

      The attached asynchronous_file_uploading.pdf illustrates the difference between the previous AliyunOSSOutputStream and AliyunOSSBlockOutputStream, i.e. this asynchronous multi-part based uploading mechanism.

      1. AliyunOSSOutputStream: the whole output must be written to local disk before it can be uploaded to OSS. This poses two problems:

      • If the output file is too large, it can exhaust the local disk.
      • If the output file is too large, the task waits a long time for the upload to OSS before it can finish, wasting compute resources.

      2. AliyunOSSBlockOutputStream: the task output is cut into small blocks, i.e. small local files, and each block is packaged into an upload task. These tasks are submitted to SemaphoredDelegatingExecutor, which uploads the blocks in parallel and so improves upload performance.
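
      A minimal sketch of this submission pattern, using only java.util.concurrent, is shown below. It does not use Hadoop's SemaphoredDelegatingExecutor; the class BoundedUploader, its parameters, and the placeholder "upload" that merely returns the block length are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch: a thread pool whose outstanding tasks are capped by a semaphore. */
public class BoundedUploader {
    private final ExecutorService pool;
    private final Semaphore permits;

    public BoundedUploader(int threads, int maxPending) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.permits = new Semaphore(maxPending);
    }

    /** Blocks the caller while maxPending tasks are already queued or running. */
    public <T> Future<T> submit(Callable<T> task) throws InterruptedException {
        permits.acquire();
        try {
            return pool.submit(() -> {
                try {
                    return task.call();
                } finally {
                    permits.release(); // free the slot once the task ends
                }
            });
        } catch (RejectedExecutionException e) {
            permits.release(); // the task never started, so return its permit
            throw e;
        }
    }

    public void shutdown() {
        pool.shutdown();
    }

    /** Submits one task per block and returns the uploaded sizes in part order. */
    public static List<Integer> uploadAll(List<byte[]> blocks, int threads, int maxPending)
            throws Exception {
        BoundedUploader uploader = new BoundedUploader(threads, maxPending);
        try {
            List<Future<Integer>> parts = new ArrayList<>();
            for (byte[] block : blocks) {
                // Placeholder for a real part upload: just report the block size.
                Callable<Integer> task = () -> block.length;
                parts.add(uploader.submit(task));
            }
            List<Integer> sizes = new ArrayList<>();
            for (Future<Integer> part : parts) {
                sizes.add(part.get()); // wait for every part before completing
            }
            return sizes;
        } finally {
            uploader.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> blocks = Arrays.asList(new byte[4], new byte[4], new byte[2]);
        System.out.println(uploadAll(blocks, 2, 2));
    }
}
```

      Capping the number of outstanding tasks means the writer is throttled when uploads fall behind, so blocks waiting to be uploaded cannot pile up without limit.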

      3. Each task retries the block upload to Aliyun OSS 3 times. If any task still fails, the whole file upload fails and the current upload is aborted.
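
      The retry-then-abort behaviour might look like the sketch below. The names RetryingUpload, callWithRetry and uploadOrAbort are hypothetical, and the sketch assumes "3 times" means at most three attempts in total; the patch may count attempts differently.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: retry one block upload, abort the whole upload on failure. */
public class RetryingUpload {
    // Assumption: three attempts in total per block.
    static final int MAX_ATTEMPTS = 3;

    /** Runs the upload until it succeeds or maxAttempts attempts have failed. */
    public static <T> T callWithRetry(Callable<T> upload, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return upload.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw new IOException("upload failed after " + maxAttempts + " attempts", last);
    }

    /** If the block cannot be uploaded, run the abort action and rethrow. */
    public static <T> T uploadOrAbort(Callable<T> upload, Runnable abortUpload)
            throws IOException {
        try {
            return callWithRetry(upload, MAX_ATTEMPTS);
        } catch (IOException e) {
            abortUpload.run(); // e.g. cancel the multipart upload on the server
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        AtomicInteger calls = new AtomicInteger();
        // Simulated upload that fails twice and then succeeds.
        Callable<String> flaky = () -> {
            if (calls.incrementAndGet() < 3) {
                throw new IOException("transient failure");
            }
            return "ok";
        };
        String result = uploadOrAbort(flaky, () -> System.out.println("aborted"));
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```

      Aborting once any block exhausts its attempts prevents a file with a missing part from being completed.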

      Attachments

        1. HADOOP-14999-branch-2.002.patch
          39 kB
          Genmao Yu
        2. HADOOP-14999-branch-2.001.patch
          39 kB
          Genmao Yu
        3. HADOOP-14999.011.patch
          41 kB
          Genmao Yu
        4. HADOOP-14999.010.patch
          35 kB
          Genmao Yu
        5. HADOOP-14999.009.patch
          35 kB
          Genmao Yu
        6. diff-between-patch7-and-patch8.txt
          16 kB
          Genmao Yu
        7. HADOOP-14999.008.patch
          35 kB
          Genmao Yu
        8. HADOOP-14999.007.patch
          29 kB
          Genmao Yu
        9. HADOOP-14999.006.patch
          28 kB
          Genmao Yu
        10. HADOOP-14999.005.patch
          30 kB
          Genmao Yu
        11. HADOOP-14999.004.patch
          29 kB
          Genmao Yu
        12. HADOOP-14999.003.patch
          29 kB
          Genmao Yu
        13. asynchronous_file_uploading.pdf
          1.46 MB
          Genmao Yu
        14. HADOOP-14999.002.patch
          27 kB
          Genmao Yu
        15. HADOOP-14999.001.patch
          27 kB
          Genmao Yu

            People

              Assignee: Genmao Yu (uncleGen)
              Reporter: Genmao Yu (uncleGen)