  1. Hadoop Common
  2. HADOOP-14444

New implementation of ftp and sftp filesystems

Details

    • Type: New Feature
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.8.0
    • Fix Version/s: None
    • Component/s: fs
    • Labels: None
    • Release Note: See the supplied README.md file. The default file systems for the ftp/sftp schemes can probably stay; users can opt in to the new implementation by specifying the -Dfs.<scheme>.impl flag.

    Description

      The current implementations of the FTP and SFTP filesystems have severe limitations and performance issues when dealing with a large number of files. My patch solves those issues and integrates both filesystems in such a way that most of the core functionality is common to both, which simplifies maintenance.

      The core features:

      • Support for HTTP/SOCKS proxies
      • Support for passive FTP
      • Support for explicit FTPS (SSL/TLS)
      • Support for connection pooling: a new connection is not created for every single command but is reused from the pool.
        For a huge number of files this gives an order-of-magnitude performance improvement over non-pooled connections.
      • Caching of directory trees. With FTP you always need to list the whole directory whenever you ask for information about a particular file.
        Again, for a huge number of files this gives an order-of-magnitude performance improvement over working without the cache.
      • Support for keep-alive (NOOP) messages to avoid connection drops
      • Support for Unix-style or regexp wildcard globs, useful for listing particular files across a whole directory tree
      • Support for re-establishing broken FTP data transfers, which can happen surprisingly often
      • Support for SFTP private keys (including passphrase)
      • Support for keeping passwords, private keys and passphrases in JCEKS key stores
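
      The connection-pooling item above can be illustrated with a minimal sketch. It is not taken from the patch: FakeConnection, ConnectionPool and PoolDemo are hypothetical names, and the "connection" is a plain object standing in for a real FTP/SFTP control connection. It only shows the idea that a command borrows an idle connection and returns it afterwards instead of opening a new one each time.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for an FTP/SFTP control connection (not from the patch). */
final class FakeConnection {
    final int id;

    FakeConnection(int id) {
        this.id = id;
    }

    /** Pretends to run a command on this connection. */
    String send(String command) {
        return "conn-" + id + " handled " + command;
    }
}

/** Minimal pool: hands out an idle connection if one exists, otherwise opens a new one. */
final class ConnectionPool {
    private final Deque<FakeConnection> idle = new ArrayDeque<>();
    private final int maxIdle;
    private int opened = 0;

    ConnectionPool(int maxIdle) {
        this.maxIdle = maxIdle;
    }

    synchronized FakeConnection borrow() {
        FakeConnection c = idle.pollFirst();
        if (c == null) {
            // No idle connection available: "open" a new one.
            c = new FakeConnection(++opened);
        }
        return c;
    }

    synchronized void release(FakeConnection c) {
        if (idle.size() < maxIdle) {
            idle.addFirst(c);
        }
        // A real pool would close the connection here when the idle list is full.
    }

    /** Number of connections opened so far. */
    synchronized int openedCount() {
        return opened;
    }
}

final class PoolDemo {
    public static void main(String[] args) {
        ConnectionPool pool = new ConnectionPool(2);
        for (int i = 0; i < 1000; i++) {
            FakeConnection c = pool.borrow();
            try {
                c.send("STAT file-" + i);
            } finally {
                pool.release(c); // always return the connection to the pool
            }
        }
        // Sequential use needs only one connection for all 1000 commands.
        System.out.println("connections opened: " + pool.openedCount());
    }
}
```

      In this sketch the 1000 sequential commands share a single connection. A real pool would also have to validate idle connections before reuse (for example with the NOOP keep-alive mentioned above) and close the surplus ones.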

      Attachments

        1. HADOOP-14444.patch
          204 kB
          Lukas Waldmann
        2. HADOOP-14444.2.patch
          263 kB
          Lukas Waldmann
        3. HADOOP-14444.3.patch
          265 kB
          Lukas Waldmann
        4. HADOOP-14444.4.patch
          265 kB
          Lukas Waldmann
        5. HADOOP-14444.5.patch
          268 kB
          Lukas Waldmann
        6. HADOOP-14444.6.patch
          254 kB
          Lukas Waldmann
        7. HADOOP-14444.7.patch
          271 kB
          Lukas Waldmann
        8. HADOOP-14444.8.patch
          271 kB
          Lukas Waldmann
        9. HADOOP-14444.9.patch
          290 kB
          Lukas Waldmann
        10. HADOOP-14444.10.patch
          290 kB
          Lukas Waldmann
        11. HADOOP-14444.11.patch
          293 kB
          Lukas Waldmann
        12. HADOOP-14444.12.patch
          292 kB
          Lukas Waldmann
        13. HADOOP-14444.13.patch
          293 kB
          Lukas Waldmann
        14. HADOOP-14444.14.patch
          319 kB
          Lukas Waldmann
        15. HADOOP-14444.15.patch
          321 kB
          Lukas Waldmann
        16. HADOOP-14444.16.patch
          318 kB
          Lukas Waldmann
        17. HADOOP-14444.17.patch
          322 kB
          Lukas Waldmann
        18. HADOOP-14444.18.patch
          324 kB
          Lukas Waldmann
        19. HADOOP-14444.18.patch
          324 kB
          Lukas Waldmann
        20. HADOOP-14444.19.patch
          324 kB
          Lukas Waldmann

            People

              Assignee: Lukas Waldmann (luky)
              Reporter: Lukas Waldmann (luky)
              Votes: 1
              Watchers: 20
