  1. Hadoop HDFS
  2. HDFS-1213

Implement an Apache Commons VFS Driver for HDFS


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: hdfs-client
    • Labels: None

    Description

      We have an open source ETL tool (Kettle) which uses VFS for many input/output steps/jobs. We would like to be able to read/write HDFS from Kettle using VFS.

      I haven't been able to find an existing HDFS driver for VFS, only remarks that "it would be nice" to have one.

      A few weeks ago I had some time to begin writing a VFS driver for HDFS, and we (Pentaho) would like to contribute it. I believe it supports all the major file and folder operations, and I have written unit tests for each of them. The code is currently checked into an open Pentaho SVN repository under the Apache 2.0 license. There are some limitations at present, chiefly the lack of authentication: Kerberos support appears to be coming in 0.22.0. The driver already accepts a username and password, but they cannot be used yet.
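
For illustration only (this is not taken from the attached patch): a VFS-style driver typically receives its target as a URL, so accepting a username and password mostly means carrying them in the URL's user-info part. The sketch below uses only `java.net.URI` from the JDK to split such a URL. The class name `HdfsUrlParts`, the example host, and the `user:password@host` layout are assumptions, not the driver's actual design.

```java
import java.net.URI;

/** Hypothetical helper; not part of the attached patch or of Commons VFS. */
final class HdfsUrlParts {
    final String host;
    final int port;        // -1 when the URL omits the port
    final String user;     // null when the URL carries no credentials
    final String password; // null when no password is given
    final String path;

    private HdfsUrlParts(String host, int port, String user, String password, String path) {
        this.host = host;
        this.port = port;
        this.user = user;
        this.password = password;
        this.path = path;
    }

    /** Splits an hdfs:// URL into host, port, optional credentials, and path. */
    static HdfsUrlParts parse(String url) {
        URI uri = URI.create(url);
        if (!"hdfs".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("Not an hdfs:// URL: " + url);
        }
        String user = null;
        String password = null;
        String userInfo = uri.getUserInfo(); // "user" or "user:password", or null
        if (userInfo != null) {
            int colon = userInfo.indexOf(':');
            user = colon < 0 ? userInfo : userInfo.substring(0, colon);
            password = colon < 0 ? null : userInfo.substring(colon + 1);
        }
        return new HdfsUrlParts(uri.getHost(), uri.getPort(), user, password, uri.getPath());
    }
}
```

Calling `HdfsUrlParts.parse("hdfs://alice:secret@namenode.example.com:8020/data/in.csv")` yields the host, port, credentials, and path as separate fields. A driver could hold on to the credentials even while the cluster ignores them, which matches the limitation described above.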

      I will attach the driver code once this case is created. The project does not modify any existing Hadoop/HDFS source.

      Our JIRA case can be found at http://jira.pentaho.com/browse/PDI-4146

      Attachments

        1. HADOOP-HDFS-Apache-VFS.patch
          21 kB
          Michael D'Amour
        2. pentaho-hdfs-vfs-TRUNK-SNAPSHOT.jar
          6 kB
          Michael D'Amour
        3. pentaho-hdfs-vfs-TRUNK-SNAPSHOT-sources.tar.gz
          3 kB
          Michael D'Amour

        Activity

          People

            Assignee: Unassigned
            Reporter: Michael D'Amour (mdamour1976)
            Votes: 2
            Watchers: 16

            Dates

              Created:
              Updated:
              Resolved: