Description
Integration testing of Hadoop with the HADOOP-14040 change has shown that the move to a shaded AWS JAR slows down all Hadoop client code.
I believe this is due to how we use service discovery to identify FS implementations: the implementation classes themselves are instantiated.
This has known problems today with classloading, but clearly impacts performance too, especially with complex transitive dependencies unique to the loaded class.
Proposed: have lightweight service declaration classes which implement an interface declaring:
- URI scheme
- classname of FileSystem impl
- classname of AbstractFS impl
- homepage (for third party code, support, etc)
These are what we register and scan in the FS to look for services.
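A minimal sketch of what such a declaration could look like, written in plain Java with no Hadoop dependencies. All names here (`FileSystemDeclaration`, `ExampleFsDeclaration`, `FileSystemRegistry`, the `example` scheme and the `org.example.fs.*` class names) are hypothetical and not part of any Hadoop API. The point it illustrates is that discovery only touches the small declaration classes, while the real implementation is named by string and loaded only when a scheme is actually requested.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.ServiceLoader;

/** Hypothetical lightweight descriptor: holds names only, no FS dependencies. */
interface FileSystemDeclaration {
    String getScheme();
    String getFileSystemClassName();
    /** Empty if there is no AbstractFS implementation. */
    Optional<String> getAbstractFileSystemClassName();
    String getHomepage();
}

/** Hypothetical declaration for an imaginary "example://" filesystem. */
final class ExampleFsDeclaration implements FileSystemDeclaration {
    @Override public String getScheme() { return "example"; }
    @Override public String getFileSystemClassName() {
        return "org.example.fs.ExampleFileSystem";
    }
    @Override public Optional<String> getAbstractFileSystemClassName() {
        return Optional.of("org.example.fs.ExampleFs");
    }
    @Override public String getHomepage() { return "https://example.org/fs"; }
}

/** Hypothetical registry: scans declarations, defers loading implementations. */
final class FileSystemRegistry {
    private final Map<String, FileSystemDeclaration> byScheme = new HashMap<>();

    void register(FileSystemDeclaration declaration) {
        byScheme.put(declaration.getScheme(), declaration);
    }

    /**
     * Discovers declarations listed under META-INF/services. Only the
     * declaration classes are instantiated here. Real providers would
     * normally be public classes with a public no-arg constructor.
     */
    static FileSystemRegistry fromServiceLoader() {
        FileSystemRegistry registry = new FileSystemRegistry();
        for (FileSystemDeclaration d : ServiceLoader.load(FileSystemDeclaration.class)) {
            registry.register(d);
        }
        return registry;
    }

    Optional<FileSystemDeclaration> lookup(String scheme) {
        return Optional.ofNullable(byScheme.get(scheme));
    }

    /** Loads the implementation class only when the scheme is first needed. */
    Class<?> loadImplementation(String scheme) throws ClassNotFoundException {
        FileSystemDeclaration d = lookup(scheme).orElseThrow(
            () -> new IllegalArgumentException("No filesystem for scheme: " + scheme));
        return Class.forName(d.getFileSystemClassName());
    }
}

class DeclarationDemo {
    public static void main(String[] args) {
        FileSystemRegistry registry = new FileSystemRegistry();
        registry.register(new ExampleFsDeclaration());
        // Looking up the declaration does not load the implementation class.
        registry.lookup("example")
                .ifPresent(d -> System.out.println(d.getFileSystemClassName()));
    }
}
```

In this sketch the cost of scanning is limited to the declaration classes; the heavyweight class and its transitive dependencies are only resolved inside `loadImplementation`.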
This leaves the question of what to do about existing filesystems. I think we'll need to retain the old code for external ones, while moving the Hadoop modules to the new mechanism.
Issue Links
- depends upon
  - HADOOP-14123 Remove misplaced ADL service provider config file for FileSystem (Resolved)
  - HADOOP-14183 Remove service loader config file for wasb fs (Resolved)
  - HADOOP-13606 swift FS to add a service load metadata file (Resolved)
  - HADOOP-14184 Remove service loader config entry for ftp fs (Resolved)
  - HADOOP-14185 Remove service loader config entry for Har fs (Resolved)
- is related to
  - HADOOP-16102 FilterFileSystem does not implement getScheme (Resolved)
  - HADOOP-17402 Add GCS FS impl reference to core-default.xml (Resolved)
- relates to
  - HADOOP-14138 Remove S3A ref from META-INF service discovery, rely on existing core-default entry (Resolved)