  1. Hadoop Common
  2. HADOOP-18067 Über-jira: S3A Hadoop 3.3.5 features
  3. HADOOP-14132

Filesystem discovery to stop loading implementation classes


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.3
    • Fix Version/s: None
    • Component/s: fs, fs/adl, fs/azure, fs/oss, fs/s3, fs/swift
    • Labels: None

    Description

      Integration testing of Hadoop with the HADOOP-14040 change has shown that the move to a shaded AWS JAR slows down all Hadoop client code.

      I believe this is due to how we use service discovery to identify FS implementations: the implementation classes themselves are instantiated.
      This has known problems today with classloading, but clearly impacts performance too, especially with complex transitive dependencies unique to the loaded class.
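
      To illustrate why instantiation is the expensive step, here is a minimal sketch. HeavyFileSystem is a hypothetical stand-in, not a real Hadoop class; its static initializer plays the role of loading a large dependency tree. The sketch only assumes that discovery needs an instance in order to ask for the scheme, as described above.

```java
import java.util.ArrayList;
import java.util.List;

public class EagerDiscoveryDemo {
    /** Records class-initialization events so the cost becomes observable. */
    public static final List<String> EVENTS = new ArrayList<>();

    /** Hypothetical stand-in for a FileSystem implementation with heavy dependencies. */
    public static class HeavyFileSystem {
        static {
            // In a real implementation, this is where transitive dependencies
            // (for example a large shaded SDK) would be loaded and initialized.
            EVENTS.add("HeavyFileSystem initialized");
        }

        public String getScheme() {
            return "heavy";
        }
    }

    /** Discovery by instantiation: the scheme is only known once an instance exists. */
    public static String schemeByInstantiation() {
        return new HeavyFileSystem().getScheme();
    }

    /** Referring to the class by name alone does not run its static initializer. */
    public static String classNameOnly() {
        return HeavyFileSystem.class.getName();
    }

    public static void main(String[] args) {
        System.out.println(classNameOnly());
        System.out.println("initializations after naming: " + EVENTS.size());
        System.out.println(schemeByInstantiation());
        System.out.println("initializations after instantiation: " + EVENTS.size());
    }
}
```

      With every implementation on the classpath instantiated during the scan, a client pays this cost for each one, including filesystems it never uses.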

      Proposed: have lightweight service declaration classes which implement an interface declaring

      1. scheme
      2. classname of FileSystem impl
      3. classname of AbstractFS impl
      4. homepage (for third party code, support, etc)

      These are what we register and scan in the FS to look for services.
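
      A rough sketch of what such a declaration and its registry might look like follows. Every name here (FileSystemDeclaration, SimpleDeclaration, Registry, the org.example classes and URL) is hypothetical and only illustrates the four items listed above; it is not an existing Hadoop API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class FsDeclarationSketch {

    /** Hypothetical declaration: strings only, so loading it pulls in no FS dependencies. */
    public interface FileSystemDeclaration {
        String scheme();
        String fileSystemClassName();
        String abstractFsClassName();
        String homepage();
    }

    /** Simple immutable declaration used for illustration. */
    public static final class SimpleDeclaration implements FileSystemDeclaration {
        private final String scheme;
        private final String fsClass;
        private final String abstractFsClass;
        private final String homepage;

        public SimpleDeclaration(String scheme, String fsClass,
                                 String abstractFsClass, String homepage) {
            this.scheme = scheme;
            this.fsClass = fsClass;
            this.abstractFsClass = abstractFsClass;
            this.homepage = homepage;
        }

        @Override public String scheme() { return scheme; }
        @Override public String fileSystemClassName() { return fsClass; }
        @Override public String abstractFsClassName() { return abstractFsClass; }
        @Override public String homepage() { return homepage; }
    }

    /** Hypothetical registry mapping a scheme to its declaration. */
    public static final class Registry {
        private final Map<String, FileSystemDeclaration> byScheme = new LinkedHashMap<>();

        /** First registration for a scheme wins (an arbitrary choice for this sketch). */
        public void register(FileSystemDeclaration declaration) {
            byScheme.putIfAbsent(declaration.scheme(), declaration);
        }

        public Optional<FileSystemDeclaration> lookup(String scheme) {
            return Optional.ofNullable(byScheme.get(scheme));
        }

        /** Loads the implementation class only when a caller asks for that scheme. */
        public Class<?> loadFileSystemClass(String scheme) throws ClassNotFoundException {
            FileSystemDeclaration declaration = lookup(scheme).orElseThrow(
                () -> new IllegalArgumentException("No FileSystem declared for scheme: " + scheme));
            return Class.forName(declaration.fileSystemClassName());
        }
    }

    /** Builds a registry holding one made-up declaration. */
    public static Registry exampleRegistry() {
        Registry registry = new Registry();
        // A real scan could iterate ServiceLoader.load(FileSystemDeclaration.class)
        // and register each declaration found; here one is registered by hand.
        registry.register(new SimpleDeclaration(
            "example",
            "org.example.fs.ExampleFileSystem",
            "org.example.fs.ExampleFs",
            "https://example.org/fs"));
        return registry;
    }

    public static void main(String[] args) {
        exampleRegistry().lookup("example").ifPresent(
            d -> System.out.println(d.scheme() + " -> " + d.fileSystemClassName()));
    }
}
```

      The point of the design is that the scan touches only small declaration objects; the implementation class, and everything it depends on, is loaded by loadFileSystemClass only for the scheme a client actually requests.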

      This leaves the question of what to do about existing filesystems. I think we'll need to retain the old code for external ones, while moving the Hadoop modules to the new mechanism.


          People

            Assignee: Unassigned
            Reporter: Steve Loughran (stevel@apache.org)
