Description
The current implementations of the FTP and SFTP filesystems have severe limitations and performance issues when dealing with a large number of files. My patch solves those issues and integrates both filesystems in such a way that most of the core functionality is common to both, which simplifies maintenance.
The core features:
- Support for HTTP/SOCKS proxies
- Support for passive FTP
- Support for explicit FTPS (SSL/TLS)
- Support for connection pooling: a new connection is not created for every single command but is reused from the pool. For a huge number of files this shows an order-of-magnitude performance improvement over non-pooled connections.
- Caching of directory trees. With FTP you always need to list the whole directory whenever you ask for information about a particular file. Again, for a huge number of files this shows an order-of-magnitude performance improvement over non-cached connections.
- Support for keep-alive (NOOP) messages to avoid connection drops
- Support for Unix-style or regexp wildcard globs, useful for listing particular files across a whole directory tree
- Support for re-establishing broken FTP data transfers, which can happen surprisingly often
- Support for SFTP private keys (including passphrases)
- Support for keeping passwords, private keys and passphrases in JCEKS key stores
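
The connection pooling idea from the list above can be illustrated with a minimal sketch. This is not the code from the patch: the `Connection` interface, the `Pool` class, the `FakeConnection` stub and the `user@example.org:21` key are all hypothetical names chosen for illustration, and a real pool would also need idle timeouts and keep-alive handling.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ConnectionPoolSketch {

    /** Hypothetical stand-in for an FTP or SFTP session; not a Hadoop class. */
    interface Connection {
        boolean isAlive();
        void close();
    }

    /** Keeps idle connections per server key, e.g. "user@host:port". */
    static final class Pool<C extends Connection> {
        private final Map<String, Deque<C>> idle = new HashMap<>();
        private final Function<String, C> factory;
        private final int maxIdlePerKey;

        Pool(Function<String, C> factory, int maxIdlePerKey) {
            this.factory = factory;
            this.maxIdlePerKey = maxIdlePerKey;
        }

        /** Reuses a live idle connection if one exists, otherwise opens a new one. */
        synchronized C borrow(String key) {
            Deque<C> queue = idle.get(key);
            while (queue != null && !queue.isEmpty()) {
                C candidate = queue.pop();
                if (candidate.isAlive()) {
                    return candidate;
                }
                candidate.close(); // discard connections that died while idle
            }
            return factory.apply(key);
        }

        /** Returns a connection to the pool, or closes it if it is dead or the pool is full. */
        synchronized void giveBack(String key, C connection) {
            Deque<C> queue = idle.computeIfAbsent(key, k -> new ArrayDeque<>());
            if (connection.isAlive() && queue.size() < maxIdlePerKey) {
                queue.push(connection);
            } else {
                connection.close();
            }
        }
    }

    /** Fake connection that only counts how many instances were opened. */
    static final class FakeConnection implements Connection {
        static int opened = 0;
        private boolean alive = true;

        FakeConnection() {
            opened++;
        }

        @Override
        public boolean isAlive() {
            return alive;
        }

        @Override
        public void close() {
            alive = false;
        }
    }

    public static void main(String[] args) {
        Pool<FakeConnection> pool = new Pool<>(key -> new FakeConnection(), 4);
        String key = "user@example.org:21"; // hypothetical server key
        for (int i = 0; i < 1000; i++) {
            FakeConnection c = pool.borrow(key); // one "command" per iteration
            pool.giveBack(key, c);
        }
        System.out.println("connections opened: " + FakeConnection.opened);
    }
}
```

In this sketch 1000 sequential commands share a single connection, whereas opening a connection per command would pay the connect and login cost 1000 times.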
Issue Links
- is related to HADOOP-5732 Add SFTP FileSystem (Resolved)
- relates to HADOOP-13759 Split SFTP FileSystem into its own artifact (Open)