Hadoop Common / HADOOP-1963

Code contribution of Kosmos Filesystem implementation of Hadoop Filesystem interface

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.15.0
    • Component/s: fs
    • Labels: None

      Description

      Kosmos Filesystem (KFS) is an open source implementation targeted towards applications that are required to process large amounts of data. KFS has been integrated with Hadoop using Hadoop's filesystem interfaces. This issue is filed with the intent of getting our code, namely, fs/kfs classes, to be included in the next Hadoop release.
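
      For orientation, here is a minimal client-side sketch (not part of the contribution) of how an application reaches KFS through Hadoop's generic FileSystem interface once the fs.kfs.impl property discussed in the comments below is configured. The kfs://metaserver:20000/ URI and the file path are hypothetical examples, and the calls follow the current FileSystem API; exact signatures in the 0.15-era codebase may differ slightly.

      // Illustrative sketch only: the application codes against the generic
      // FileSystem/Path API, and the kfs:// scheme selects the KFS implementation.
      import java.net.URI;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataInputStream;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class KfsClientSketch {
        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // Assumes fs.kfs.impl maps the "kfs" scheme to org.apache.hadoop.fs.kfs.KosmosFileSystem.
          FileSystem fs = FileSystem.get(URI.create("kfs://metaserver:20000/"), conf);

          Path file = new Path("/tmp/hello.txt");
          FSDataOutputStream out = fs.create(file);   // write through the generic interface
          out.writeUTF("hello, kfs");
          out.close();

          FSDataInputStream in = fs.open(file);       // read it back the same way
          System.out.println(in.readUTF());
          in.close();
        }
      }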

      Attachments

      1. kfs-hadoop-jar.tar.gz
        8 kB
        Sriram Rao
      2. kfs-hadoop-code.tar.gz
        11 kB
        Sriram Rao
      3. kfs-0.1.jar
        9 kB
        Doug Cutting
      4. HADOOP-1963-2.patch
        50 kB
        Doug Cutting
      5. HADOOP-1963.patch
        50 kB
        Doug Cutting

        Activity

        Sriram Rao added a comment -

        The patch is a .tar.gz that contains:

        • the necessary Java classes, which need to be added under src/java/org/apache/hadoop/fs/kfs
        • a kfs-0.1.jar file that needs to go in the lib directory
        • a docs/README.html that describes how to use KFS with Hadoop
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12366711/kfs-hadoop.tar.gz
        against trunk revision r580166.

        @author +1. The patch does not contain any @author tags.

        patch -1. The patch command could not apply the patch.

        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/841/console

        This message is automatically generated.

        Doug Cutting added a comment -

        Since this requires a new jar, which can't be included in a patch file, our automated patch check system will always fail on this. So don't worry about that.

        The attached tar file does not appear to contain the source for the FileSystem implementation. Perhaps you forgot to include it?

        Also, the documentation would probably be better included as a package.html, so that it is picked up and included in Hadoop's javadocs.

        Sriram Rao added a comment -

        The previous update didn't contain the source file. Fixed that and re-uploading.

        Sriram Rao added a comment - edited

        Doug, thanks for pointing out the issue with the .tar file. I have uploaded a new .tar.gz with the source code included; I have also renamed README.html to package.html, as you suggested.

        The package.html describes the necessary changes to the config files. In particular, there is a change to conf/hadoop-default.xml that defines the filesystem implementation to be invoked for "kfs://" URIs.

        Let me know if you have questions.

        Sriram

        Nigel Daley added a comment -

        I think the package.html should go in the src/java/org/apache/hadoop/fs/kfs directory. Also, if you aren't planning to submit unit tests, perhaps this should go in the src/contrib directory.

        Doug Cutting added a comment -

        I agree with Nigel. Core code should have unit tests. Without these, src/contrib is probably a better home. And the package.html should go alongside the .java files, so that it is found by javadoc and included.

        Sriram Rao added a comment -

        As suggested by Nigel and Doug, I have:
        – added unit tests for the KFS code
        – made the unit tests runnable without a KFS deployment
        – moved package.html to sit alongside the code.

        As part of this commit, can the following change be made to conf/hadoop-default.xml?

        <property>
          <name>fs.kfs.impl</name>
          <value>org.apache.hadoop.fs.kfs.KosmosFileSystem</value>
          <description>The FileSystem for kfs: uris.</description>
        </property>

        Thanks.

        Sriram
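
        As a rough illustration (not from the patch) of what this property does: FileSystem.get consults fs.<scheme>.impl to pick the implementation class for a given URI scheme, so the entry above is what routes kfs:// URIs to the KFS binding. The snippet below uses the generic Configuration API; the kfs://metaserver:20000/ URI is a hypothetical example.

        // Sketch only: with the property above in place, the "kfs" scheme resolves
        // to the KosmosFileSystem class, which FileSystem.get can then instantiate
        // for a URI such as kfs://metaserver:20000/.
        Configuration conf = new Configuration();
        Class<?> kfsImpl = conf.getClass("fs.kfs.impl", null);
        // kfsImpl is now org.apache.hadoop.fs.kfs.KosmosFileSystem.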

        Nigel Daley added a comment -

        Thanks for the unit test, but this looks suspicious in TestKosmosFileSystem:

        // kosmosFileSystem.initialize(URI.create(conf.get("test.fs.kfs.name")), conf);
        kosmosFileSystem.initialize(URI.create("kfs://dev104:20000/"), conf);

        Also, I'd suggest you attach two files: the jar file and a patch file with everything but the jar file (including any changes you want made to the hadoop-default.xml file). Then a committer can apply the patch and put the jar file in the right place.
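
        One possible shape for the fix, sketched against the two lines quoted above ("test.fs.kfs.name" comes from the commented-out line; the fallback URI is a hypothetical placeholder):

        // Sketch: take the URI from the configuration when provided, otherwise
        // fall back to a placeholder instead of a hard-coded developer host.
        String fsName = conf.get("test.fs.kfs.name", "kfs://localhost:20000/");
        kosmosFileSystem.initialize(URI.create(fsName), conf);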

        Doug Cutting added a comment -

        My sense is that 'URI.create("kfs://dev104:20000/")' could be replaced with something like 'URI.create("kfs://bogus/")', since no network access should be made during this test. Is that right?

        The version of the FileSystem API that's implemented by this patch is not the current trunk version. In particular, getFileStatus() must be implemented, listFileStatus() should be implemented instead of listPaths(), and many other deprecated methods should not be implemented (getReplication, isDirectory, getLength, etc.), but instead rely on the base class implementation in terms of getFileStatus().

        Finally, the bufferSize parameter is no longer supported by FSDataInputStream and FSDataOutputStream. Instead, one should interpolate a BufferedInputStream and an FSBufferedInputStream respectively to add buffering to KFS's streams.
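
        To make the getFileStatus() point concrete, here is a small self-contained sketch of the delegation pattern. It deliberately does not extend the real org.apache.hadoop.fs.FileSystem (whose full set of abstract methods would be needed to compile), the class and field names are illustrative only, and it covers just the metadata side, not the stream buffering: the deprecated-style helpers are expressed purely in terms of getFileStatus(), which is what lets a binding such as KFS drop its own copies of isDirectory(), getLength(), getReplication(), and so on.

        // Illustrative only: a stand-in for the FileSystem base class showing how
        // derived helpers can be built on a single getFileStatus() primitive.
        import java.io.FileNotFoundException;
        import java.io.IOException;

        abstract class StatusBackedFs {

          // The one metadata call a concrete binding (e.g. the KFS wrapper) implements.
          abstract Meta getFileStatus(String path) throws IOException;

          // Deprecated-style helpers derived from getFileStatus(), mirroring how
          // the base class can supply isDirectory/getLength/getReplication.
          boolean isDirectory(String path) throws IOException {
            try {
              return getFileStatus(path).isDir;
            } catch (FileNotFoundException e) {
              return false;
            }
          }

          long getLength(String path) throws IOException {
            return getFileStatus(path).length;
          }

          short getReplication(String path) throws IOException {
            return getFileStatus(path).replication;
          }

          // Minimal stand-in for org.apache.hadoop.fs.FileStatus.
          static final class Meta {
            final boolean isDir;
            final long length;
            final short replication;

            Meta(boolean isDir, long length, short replication) {
              this.isDir = isDir;
              this.length = length;
              this.replication = replication;
            }
          }
        }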

        Sriram Rao added a comment -

        Doug/Nigel,

        As you suggested, I have split the attachment into two parts:

        • kfs-hadoop-jar.tar.gz: This is the jar that should go in hadoop/lib.
        • kfs-hadoop-code.tar.gz: This contains the new files needed to use KFS with Hadoop. As Doug pointed out, I have updated the API support to be in sync with what is in trunk (I had previously built against 0.13.1). I have also provided a patch for hadoop/default-conf.xml.

        Let me know if you have questions.

        Thanks.

        Sriram

        Doug Cutting added a comment -

        Converted contribution to patch file + jar. Also added a descriptive first line to package.html, as that is used in javadoc's list of packages, and changed visibility of KFSImpl from public to package-private.

        Unless there are objections, I will commit this version soon.

        Doug Cutting added a comment -

        A new version that also makes the IFSImpl interface package-private.

        Doug Cutting added a comment -

        I just committed this. Thanks, Sriram!

        Hudson added a comment -

        Integrated in Hadoop-Nightly #260 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/260/ )

        Sriram Rao added a comment -

        Doug/Nigel,

        I'll submit unit tests and move the .html file around.

        As an aside, I also need a change to be made to conf/hadoop-default.xml.
        I have described the change in package.html. Specifically:

        <property>
          <name>fs.kfs.impl</name>
          <value>org.apache.hadoop.fs.kfs.KosmosFileSystem</value>
          <description>The FileSystem for kfs: uris.</description>
        </property>

        This is similar to what is done for s3. Can this change also be made
        as part of this issue?

        Sriram

        Sriram Rao added a comment -

        Doug,

        Yes, that is correct.

        Ok...I'll re-do the update once more.

        Sriram


          People

          • Assignee: Sriram Rao
          • Reporter: Sriram Rao
          • Votes: 0
          • Watchers: 0
