Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Not A Problem
Description
We have an open source ETL tool (Kettle) which uses VFS for many input/output steps and jobs. We would like to be able to read from and write to HDFS from Kettle using VFS.
I haven't been able to find anything out there other than "it would be nice."
I had some time a few weeks ago to begin writing a VFS driver for HDFS, and we (Pentaho) would like to contribute it. I believe it supports all the major file and folder operations, and I have written unit tests for each of them. The code is currently checked into an open Pentaho SVN repository under the Apache 2.0 license. There are some current limitations, such as the lack of authentication (Kerberos), which appears to be coming in 0.22.0. The driver already accepts a username and password, but I can't use them yet.
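For illustration only, here is a minimal sketch of how a VFS-style HDFS URL carrying a username and password might be split into its parts. The `HdfsUrlParts` class, the `hdfs://user:password@host:port/path` layout, and the host, port, and path values are assumptions made for this example and are not taken from the driver code; the sketch relies only on `java.net.URI` from the standard library.

```java
import java.net.URI;

// Hypothetical helper, not part of the contributed driver: it only shows how
// credentials and the file path could be read from a VFS-style URL.
public final class HdfsUrlParts {
    final String user;      // null when the URL carries no user info
    final String password;  // null when the URL carries no password
    final String host;
    final int port;         // -1 when the URL does not specify a port
    final String path;

    private HdfsUrlParts(String user, String password, String host, int port, String path) {
        this.user = user;
        this.password = password;
        this.host = host;
        this.port = port;
        this.path = path;
    }

    static HdfsUrlParts parse(String url) {
        // URI.create throws IllegalArgumentException on malformed input.
        URI uri = URI.create(url);
        if (!"hdfs".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("Not an hdfs URL: " + url);
        }
        String user = null;
        String password = null;
        String userInfo = uri.getUserInfo();
        if (userInfo != null) {
            // Split on the first ':' only, so the password may itself contain ':'.
            String[] parts = userInfo.split(":", 2);
            user = parts[0];
            password = parts.length > 1 ? parts[1] : null;
        }
        return new HdfsUrlParts(user, password, uri.getHost(), uri.getPort(), uri.getPath());
    }

    public static void main(String[] args) {
        // Example values are made up for the demonstration.
        HdfsUrlParts p = parse("hdfs://etl:secret@namenode:9000/data/in.csv");
        System.out.println(p.user + " @ " + p.host + ":" + p.port + " -> " + p.path);
    }
}
```

Until authentication is available on the HDFS side, a driver could parse these fields and simply ignore the credentials.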
I will be attaching the code for the driver once the case is created. The project does not modify existing Hadoop/HDFS source.
Our JIRA case can be found at http://jira.pentaho.com/browse/PDI-4146