  1. Hadoop HDFS
  2. HDFS-12090 Handling writes from HDFS to Provided storages
  3. HDFS-13934

Multipart uploaders to be created through API call to FileSystem/FileContext, not service loader



    • Type: Sub-task
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.3.1
    • Component/s: fs, fs/s3, hdfs


      The multipart uploaders are created via service loaders. This is troublesome:

      1. HADOOP-12636, HADOOP-13323, HADOOP-13625 highlight how the load process forces the transitive loading of dependencies. If a dependent class cannot be loaded (e.g. aws-sdk is not on the classpath), that service won't load. Without error handling around the load process, this stops any uploader from loading. Even with that error handling, the cost of the load is significant, especially with reshaded dependencies (HADOOP-13138).
      2. It makes wrapping the load with any filter impossible, and stops transitive binding through viewFS, mocking, etc.
      3. It complicates security in a kerberized world. If you have an FS instance of user A, then you should be able to create an MPU instance with that user's permissions. Currently, if a service were to try to create one, you'd be looking at doAs() games around the service loading, and a more complex bind process.
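
To illustrate the first point, here is a minimal sketch of the kind of defensive loop a java.util.ServiceLoader client needs so that one unloadable provider does not stop the others from loading. The names TolerantLoader and UploaderFactory are hypothetical and are not Hadoop classes; the failure cap of 16 is an arbitrary assumption to keep the loop bounded.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

public class TolerantLoader {

  /** Hypothetical service interface, standing in for an uploader factory. */
  public interface UploaderFactory {
    String scheme();
  }

  /** Loads every provider that can be instantiated, skipping broken ones. */
  public static List<UploaderFactory> loadAll() {
    List<UploaderFactory> found = new ArrayList<>();
    Iterator<UploaderFactory> it =
        ServiceLoader.load(UploaderFactory.class).iterator();
    int failures = 0;
    // Both hasNext() and next() may throw ServiceConfigurationError,
    // for example when a provider's dependencies are not on the classpath.
    // The cap (an assumption) stops the loop if the iterator cannot recover.
    while (failures < 16) {
      try {
        if (!it.hasNext()) {
          break;
        }
        found.add(it.next());
      } catch (ServiceConfigurationError e) {
        failures++; // skip the broken provider and try the next one
      }
    }
    return found;
  }

  public static void main(String[] args) {
    System.out.println("usable factories: " + loadAll().size());
  }
}
```

Even with this handling in place, every provider listed on the classpath is still probed, which is the load cost the description refers to.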


      Proposal:

      1. Remove the service loader mechanism entirely.
      2. Add a createMultipartUploader(path) call to FS & FC, which will create an uploader bound to the current FS, with its permissions, DTs, etc.
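
The shape of the proposed call can be sketched as follows. This is only an illustration under stated assumptions: SketchFileSystem and Uploader are hypothetical stand-ins, not the Hadoop FileSystem or MultipartUploader types, and the real signature and return type may differ. The point is that the uploader is obtained from an existing FS instance and so inherits that instance's user, with no service loading or doAs() involved.

```java
import java.util.ArrayList;
import java.util.List;

public class BoundUploaderSketch {

  /** Hypothetical uploader; not the Hadoop MultipartUploader interface. */
  public interface Uploader {
    String owner();
    void putPart(String part);
    String complete();
  }

  /** Hypothetical stand-in for a FileSystem instance bound to one user. */
  public static class SketchFileSystem {
    private final String user;

    public SketchFileSystem(String user) {
      this.user = user;
    }

    /** Creates an uploader bound to this instance and its user. */
    public Uploader createMultipartUploader(String path) {
      final String owner = user;
      final List<String> parts = new ArrayList<>();
      return new Uploader() {
        @Override
        public String owner() {
          return owner;
        }

        @Override
        public void putPart(String part) {
          parts.add(part);
        }

        @Override
        public String complete() {
          // Joins the parts in upload order and records who wrote them.
          return path + ":" + String.join("", parts) + " (as " + owner + ")";
        }
      };
    }
  }

  public static void main(String[] args) {
    SketchFileSystem fs = new SketchFileSystem("alice");
    Uploader uploader = fs.createMultipartUploader("/data/out");
    uploader.putPart("a");
    uploader.putPart("b");
    System.out.println(uploader.complete());
  }
}
```

Because the factory is an ordinary instance method, a filtering or view-style wrapper can override it and delegate, and a test can mock it, which the service-loader route does not allow.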





              • Assignee:
                stevel@apache.org Steve Loughran
              • Votes: 0
              • Watchers: 10


                • Created: