Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      WebDAV (Web Distributed Authoring and Versioning) is a set of extensions to the HTTP protocol that lets users collaboratively edit and manage files on a remote web server. It is often considered a replacement for NFS or Samba.

      HDFS (Hadoop Distributed File System) needs a friendlier file system interface. DFSShell commands are unfamiliar to most users; it is more convenient for Hadoop users to use a mountable network drive. A friendly interface to HDFS would be used both for casual browsing of data and for bulk import/export.

      The FUSE provider for HDFS is already available (http://issues.apache.org/jira/browse/HADOOP-17), but it has had scalability problems. WebDAV is a popular alternative.

      The typical licensing terms for WebDAV tools are also attractive:
      GPL for the Linux client tools, which Hadoop would not redistribute anyway;
      more importantly, Apache Project/Apache License for the Java tools and server components.
      This allows for tighter integration with the HDFS code base.

      There are some interesting Apache projects that support WebDAV.
      But these are probably too heavyweight for the needs of Hadoop:
      Tomcat servlet: http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/servlets/WebdavServlet.html
      Slide: http://jakarta.apache.org/slide/

      Being HTTP-based and "backwards-compatible" with web browser clients, the WebDAV server protocol could even be piggy-backed on the existing Web UI ports of the Hadoop name node / data nodes. WebDAV can be hosted as (Jetty) servlets. This minimizes server code bloat and avoids additional network traffic between HDFS and the WebDAV server.
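
      To make the piggy-backing idea concrete, here is a minimal sketch of the kind of servlet that could be registered on the existing Jetty-based web UI. The class name, URL mapping and method handling are illustrative assumptions, not part of any attached patch:

          import java.io.IOException;
          import javax.servlet.ServletException;
          import javax.servlet.http.HttpServlet;
          import javax.servlet.http.HttpServletRequest;
          import javax.servlet.http.HttpServletResponse;

          // Hypothetical skeleton: a WebDAV servlet sharing the name node's existing
          // HTTP port, so no extra daemon (and no extra HDFS<->server hop) is needed.
          public class HdfsWebdavServlet extends HttpServlet {
            @Override
            protected void service(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
              String method = req.getMethod();
              if ("PROPFIND".equals(method)) {
                // would translate into a listing of the HDFS namespace
              } else if ("MKCOL".equals(method)) {
                // would translate into a FileSystem.mkdirs() call
              } else {
                // GET, PUT, DELETE, etc. fall through to the usual doGet()/doPut()/...
                super.service(req, resp);
              }
            }
          }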

      General Clients (read-only):
      Any web browser

      Linux Clients:
      Mountable GPL davfs2 http://dav.sourceforge.net/
      FTP-like GPL Cadaver http://www.webdav.org/cadaver/

      Server Protocol compliance tests:
      http://www.webdav.org/neon/litmus/
      A goal is for Hadoop HDFS to pass this test (minus support for Properties).

      Pure Java clients:
      DAV Explorer (Apache license) http://www.ics.uci.edu/~webdav/

      WebDAV also makes it convenient to add advanced features in an incremental fashion:
      file locking, access control lists, hard links, symbolic links.
      New WebDAV standards continue to be accepted, and WebDAV clients exist with varying levels of feature support:
      core http://www.webdav.org/specs/rfc2518.html
      ACLs http://www.webdav.org/specs/rfc3744.html
      redirects "soft links" http://greenbytes.de/tech/webdav/rfc4437.html
      BIND "hard links" http://www.webdav.org/bind/
      quota http://tools.ietf.org/html/rfc4331

      1. jetty-slide.xml
        2 kB
        Albert Strasheim
      2. slideusers.properties
        0.0 kB
        Albert Strasheim
      3. hadoop-webdav.zip
        15 kB
        Albert Strasheim
      4. lib.webdav.tar.gz
        2.14 MB
        Enis Soztutar
      5. webdav_wip1.patch
        41 kB
        Enis Soztutar
      6. webdav_wip2.patch
        43 kB
        Enis Soztutar
      7. screenshot-1.jpg
        54 kB
        Pete Wyckoff
      8. hadoop-496-3.patch
        43 kB
        Michael Bieniosek
      9. hadoop-496-spool-cleanup.patch
        43 kB
        Michael Bieniosek
      10. hadoop-496-4.patch
        38 kB
        Michael Bieniosek
      11. hadoop-496-5.tgz
        7 kB
        Michael Bieniosek

          Activity

          Pier Fumagalli added a comment -

          I have received (privately) a patch from Michel Tourn for my code to allow Hadoop to be exposed as a WebDAV repository.

          To limit the problems of IP and such, please attach the patch to this bug and I'll commit it to my code.

          Michel Tourn added a comment -

          The webdav patch is now available on this separate issue:
          http://issues.apache.org/jira/browse/HADOOP-505

          Andrzej Bialecki added a comment -

          I couldn't find the patches that provide the bridge between HDFS and WebDAV - the referenced patch and JIRA issue contain only patches to the external WebDAV library.

          Is there any progress on this issue? Could you perhaps upload that preliminary HDFS support?

          Yoram Arnon added a comment -

          That patch hasn't reached submission-worthy status yet.
          We've slowed down our development on this, so I'm not sure when it will actually get submitted.

          Andrzej Bialecki added a comment -

          Hmm.. do you think it's easier at this point to implement this functionality from scratch, or to use your patches as a starting point?

          Doug Cutting added a comment -

          FYI, Michel's moved on and is no longer contributing to Hadoop.

          Is there any chance that we can get his WebDAV code posted as a patch, even if it's incomplete?

          Albert Strasheim added a comment -

          Slide's WebDAV Construction Kit (WCK) looks like it might be the way to go here. I'm trying to get it going in Jetty – if I do, I'll report back here. Once that is sorted out, modifying their sample application to do HDFS should hopefully not be too hard.

          Albert Strasheim added a comment -

          Okay, here's some info on getting the reference code included with WCK going:

          After checking out Slide from SVN, you need to tweak the build.xml files:

          C:\home\albert\work6\slide>svn diff build.xml wck\build.xml
          Index: build.xml
          ===================================================================
          --- build.xml   (revision 554041)
          +++ build.xml   (working copy)
          @@ -551,9 +551,6 @@
                   </copy>
               </target>
               <target name="dist-xml" unless="jvm14.present">
          -        <copy todir="${slide.dist}/slide/lib" file="${jaxp.jar}"/>
          -        <copy todir="${slide.dist}/slide/lib" file="${xmlapi.jar}"/>
          -        <copy todir="${slide.dist}/slide/lib" file="${xmlparser.jar}"/>
               </target>
               <!-- =================================================================== -->
               <!-- Build a Slide distribution packaged as a web application            -->
          Index: wck/build.xml
          ===================================================================
          --- wck/build.xml       (revision 554041)
          +++ wck/build.xml       (working copy)
          @@ -20,7 +20,7 @@
                  <!-- =================================================================== -->
               <!-- Dependencies Properties                                             -->
               <!-- =================================================================== -->
          -    <property name="commons.transaction.version" value="1.1.1pre1"/>
          +    <property name="commons.transaction.version" value="1.2"/>
               <property name="slide.base.dir" value=".."/>
               <property name="lib.dir" value="${slide.base.dir}/lib"/>
               <property name="slide.lib.dir" value="${slide.base.dir}/dist/slide/lib"/>
          

          Then build the all target of wck/build.xml, which should give you a slide.war in wck/dist. This WAR contains the org.apache.slide.simple.* stuff that implements WebDAV using a file store.

          Get the latest Jetty 5. Extract it. Inside the Jetty directory make a webapps2 directory. Extract the slide.war under slide/ in this directory.

          Copy the jetty-slide.xml and slideusers.properties files (to be attached) to the etc/ directory of the Jetty distribution.

          Run java -jar start.jar etc/jetty-slide.xml in the Jetty directory. Now you should have a WebDAV server that writes to store/files under your Jetty directory when you put a file via WebDAV.

          You can now also add a network place in Windows. Log in with username root, password root.

          Albert Strasheim added a comment -

          Here's a basic Eclipse WTP project to experiment with. There's a launch configuration for Jetty included. To get going, just add all the libraries from the slide.war's WEB-INF/lib to this project's WebContent/WEB-INF/lib.

          Next step: plugging in Hadoop.

          Enis Soztutar added a comment -

          I have been working on WebDAV support in Hadoop, and have implemented the core features, including all Level 1 features except properties.
          I considered 3 different WebDAV libraries and decided to stick with the webdav module in Apache Jackrabbit. Slide seemed good enough at first, but it depends on lots of other libraries, it is poorly documented (at least for a newcomer like myself), and I think its development slowed down after 2004. The second option, the WebDAV implementation from could.it, is good and satisfactory for our purpose, but I think we may later need some functionality that it does not support.
          Anyhow, the attached patch contains the implementation built upon Jackrabbit's APIs. The required libraries (lib.webdav.tar.gz) should be added to the lib/webdav directory.

          The patch is a work in progress; however, adding, deleting, copying and moving files and folders are supported. Any feedback will be appreciated.

          Finally, we may want to add this functionality to contrib, since it involves functionality that is not intended to be the main purpose of DFS.

          P.S. Due to bug HADOOP-1647, all files in DFS are treated as collections (folders).
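
          For readers unfamiliar with this approach, a DFS-backed DAV resource ultimately delegates to the Hadoop FileSystem API. The helper below is only an illustrative sketch under that assumption (the class and its methods are hypothetical, not taken from the patch):

              import java.io.IOException;
              import org.apache.hadoop.conf.Configuration;
              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;

              // Illustrative only: the kind of calls a DAV resource maps WebDAV
              // operations onto when HDFS is the backing store.
              public class DfsDavHelper {
                private final FileSystem fs;

                public DfsDavHelper(Configuration conf) throws IOException {
                  this.fs = FileSystem.get(conf); // connects to fs.default.name
                }

                public boolean exists(String davPath) throws IOException {
                  return fs.exists(new Path(davPath));
                }

                public boolean isCollection(String davPath) throws IOException {
                  // directories map to WebDAV collections
                  return fs.getFileStatus(new Path(davPath)).isDir();
                }

                public boolean delete(String davPath) throws IOException {
                  // recursive delete, as DELETE on a collection requires
                  return fs.delete(new Path(davPath), true);
                }
              }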

          Albert Strasheim added a comment -

          Extremely impressive. I was able to create a Windows web folder and copy files over without any problems.

          Any idea when this patch might make it into SVN? It seems pretty much self-contained, so it shouldn't be hard to merge. That way, more people might start to experiment with it.

          Albert Strasheim added a comment - edited

          I'm playing with the litmus tests mentioned in HADOOP-512. To fix the basic/delete_null test, it seems DFSDavResource.removeMember should throw DavException(DavServletResponse.SC_NOT_FOUND) in the !success case.
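
          As a rough illustration of that suggestion (a sketch of a fragment of a DavResource implementation, not the actual patch code), the failing branch would look something like this against the Jackrabbit WebDAV API:

              import org.apache.jackrabbit.webdav.DavException;
              import org.apache.jackrabbit.webdav.DavResource;
              import org.apache.jackrabbit.webdav.DavServletResponse;

              // Sketch only: report a missing member as 404 so that litmus'
              // basic/delete_null test passes.
              public void removeMember(DavResource member) throws DavException {
                  boolean success = deleteFromDfs(member.getResourcePath()); // hypothetical helper
                  if (!success) {
                      throw new DavException(DavServletResponse.SC_NOT_FOUND);
                  }
              }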

          Enis Soztutar added a comment -

          > Any idea when this patch might make it into SVN? It seems pretty much self-contained, so it shouldn't be hard to merge. That way, more people might start to experiment with it.

          This patch already applies to trunk, but it should mature further before going in. I intend to complete the support for properties, fix HADOOP-1647 and do some more testing.

          > I'm playing with the litmus tests mentioned in HADOOP-512. To fix the basic/delete_null test, it seems DFSDavResource.removeMember should throw DavException(DavServletResponse.SC_NOT_FOUND) in the !success case.

          Thanks for the pointer, I'll check it out.

          Enis Soztutar added a comment - edited

          This patch is an improvement on the previous one: it adds file creation and content length support.
          The litmus test results are as follows:

          -> running `basic':
           0. init.................. pass
           1. begin................. pass
           2. options............... WARNING: server does not claim Class 2 compliance
              ...................... pass (with 1 warning)
           3. put_get............... WARNING: length mismatch: 0 vs 41
              ...................... pass (with 1 warning)
           4. put_get_utf8_segment.. WARNING: length mismatch: 0 vs 41
              ...................... pass (with 1 warning)
           5. mkcol_over_plain...... pass
           6. delete................ pass
           7. delete_null........... pass
           8. delete_fragment....... pass
           9. mkcol................. pass
          10. mkcol_again........... pass
          11. delete_coll........... pass
          12. mkcol_no_parent....... pass
          13. mkcol_with_body....... FAIL (MKCOL with weird body must fail (RFC2518:8.3.1))
          14. finish................ pass
          

          The content length warnings are due to avoiding a costly du operation on folders.
          The patch depends on HADOOP-1654.
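
          As a hypothetical illustration of that trade-off (a sketch assuming the FileSystem/FileStatus API, not the patch's actual code): files report their real length, while folders report 0 instead of triggering a recursive du on every PROPFIND.

              import java.io.IOException;
              import org.apache.hadoop.fs.FileStatus;
              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;

              // Sketch: cheap content length; folders report 0, at the cost of the
              // length warnings above while HADOOP-1647 makes every file look like
              // a collection.
              public static long contentLength(FileSystem fs, Path path) throws IOException {
                  FileStatus status = fs.getFileStatus(path);
                  return status.isDir() ? 0L : status.getLen();
              }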

          Albert Strasheim added a comment -

          webdav_wip2.patch depends on the IOUtils class introduced by the patch attached to HADOOP-1654.

          Albert Strasheim added a comment -

          You might want to run litmus with the -k switch so that it keeps going even if some tests fail.

          Michael Bieniosek added a comment -

          Hey, I tried this patch out, and I noticed a few things:

          1. The webdav server is hardcoded to bind to "localhost", so I changed it to bind to "0.0.0.0" instead. I'd prefer that clients not all have to run their own server: if the DNS doesn't match, or the client doesn't want to set up and configure Hadoop, this is much easier.
          2. When I actually tried to copy files out, I got a funny error in the client (on Mac OS X, it says "There is a problem with the file and it cannot be copied"). I wish I could be more helpful, but I don't know how to issue raw HTTP to the webdav server and there's nothing indicative in the webdav server log.
          3. If I point an ordinary browser (or wget) at the webdav server, I get a 200 with an empty body for files that exist, and a 404 for files that don't exist. Again, I don't know much about webdav, but it would be nice if you could browse and download with an ordinary browser, as in Subversion.

          It was nice to see this almost work, though it's not really usable for me because of problem 2.

          Thanks!

          Enis Soztutar added a comment -

          Thanks for the feedback. The patch is in quite an early stage of development, and I intend to change the server architecture, but not until my specifications are settled. I have manually tested copying files and it worked; it also passes the litmus tests, so maybe there is some other problem with the file? You can use telnet (or the small program sketched below) to connect to the server and send handmade HTTP requests. I have not implemented the filesystem browsing part, mainly because we already have a web interface.
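
          As an illustration of sending such a handmade request, here is a minimal, self-contained sketch (host, port and path are placeholders taken from the comments below, not values defined by the patch):

              import java.io.BufferedReader;
              import java.io.InputStreamReader;
              import java.io.OutputStream;
              import java.net.Socket;

              // Sketch: issue a raw GET against the WebDAV server and dump the full
              // response, useful when a GUI client only reports a vague error.
              public class RawGet {
                public static void main(String[] args) throws Exception {
                  Socket socket = new Socket("localhost", 20015);
                  OutputStream out = socket.getOutputStream();
                  out.write(("GET /path/to/file.tgz HTTP/1.1\r\n"
                      + "Host: localhost:20015\r\n"
                      + "Connection: close\r\n\r\n").getBytes("US-ASCII"));
                  out.flush();
                  BufferedReader in = new BufferedReader(
                      new InputStreamReader(socket.getInputStream(), "US-ASCII"));
                  String line;
                  while ((line = in.readLine()) != null) {
                    System.out.println(line);
                  }
                  socket.close();
                }
              }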

          Michael Bieniosek added a comment -

          > You can use telnet to connect to the server and send handmade http requests.

          I tried telnetting to the server and doing a GET /path/to/file.tgz. This gave me a 200 with an empty body. If I try to GET a file that doesn't exist, I get a 404 with an HTML error page.

          Michael Bieniosek added a comment -

          I added some extra debugging to DFSDavResource.java, and it looks like the getHref() function is returning malformed URLs:

          07/11/01 13:34:56 INFO webdav.DFSDavResource: getHref() for path:/dirs/to/my/file.tgz -> http://localhost:20015hdfs%3a//dfs.cluster.powerset.com%3a10000/dirs/to/my/file.tgz
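
          The href above looks like the servlet prefix and the full hdfs:// URI are being concatenated. As a guess at the kind of fix (a sketch only, assuming hrefs are built from Jackrabbit's DavResourceLocator; this is not taken from the patch), the href should come from the locator's resource path rather than the absolute HDFS URI:

              import org.apache.jackrabbit.webdav.DavResourceLocator;

              // Sketch: let the locator build the href from the path relative to the
              // servlet, e.g. http://localhost:20015/dirs/to/my/file.tgz, instead of
              // appending the raw hdfs:// URI to the prefix.
              public String getHref(DavResourceLocator locator, boolean isCollection) {
                  return locator.getHref(isCollection);
              }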

          Pete Wyckoff added a comment -

          I have written a FUSE (but not FUSE-J) module for DFS and the performance is reasonable. I've made it read-only thus far, so I don't know what the write performance will be like.

          It seems pretty stable to date, although I've only been running it continuously for a few days.

          – pete

          Pete Wyckoff added a comment -

          Mounted DFS on Linux via FUSE.

          Michael Bieniosek added a comment -

          Pete, how are you bridging between FUSE and DFS? There is a tgz for fusej-hadoop floating around somewhere, though it is out of date.

          Michael Bieniosek added a comment -

          I implemented the DFSDavResource.spool method. This allows data to be copied out (previously a GET on any file returned an empty body). I also ported the patch to Hadoop trunk.

          On an unrelated note, I think the webdav sources should be in the org.apache.hadoop.fs package, not org.apache.hadoop.dfs, since there is nothing DFS-specific about this patch.
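
          For readers following along, the core of a spool implementation is a straight copy from an HDFS input stream into the WebDAV output context. The fragment below is only a hedged sketch of that idea (not the attached patch); it assumes Jackrabbit's OutputContext and the IOUtils class from HADOOP-1654, and getFileSystem() is a hypothetical accessor on the resource:

              import java.io.IOException;
              import java.io.InputStream;
              import java.io.OutputStream;
              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;
              import org.apache.hadoop.io.IOUtils;
              import org.apache.jackrabbit.webdav.io.OutputContext;

              // Sketch of spool(): stream the HDFS file's bytes into the DAV response.
              public void spool(OutputContext outputContext) throws IOException {
                  FileSystem fs = getFileSystem();          // hypothetical accessor
                  Path path = new Path(getResourcePath());  // DavResource.getResourcePath()
                  outputContext.setContentLength(fs.getFileStatus(path).getLen());
                  if (outputContext.hasStream()) {
                      InputStream in = fs.open(path);
                      OutputStream out = outputContext.getOutputStream();
                      IOUtils.copyBytes(in, out, 4096, true); // closes both streams
                  }
              }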

          Michael Bieniosek added a comment -

          Here's a cleaner version of the patch.

          Michael Bieniosek added a comment -

          I deleted some unused code, removed some excess logging, removed the usage of StatusHttpServer so the patch no longer modifies core Hadoop code, and added a separate start script that can pass the namenode to the server as a command-line argument. I also moved the webdav package to org.apache.hadoop.fs.webdav.

          Currently writing to the DFS does not work, but I can browse and copy files out of the DFS (with Mac OSX webdav mount).

          I think this could become a separate (small) contrib project, since hadoop proper does not rely on it.

          Doug Cutting added a comment -

          I don't have a strong feeling about whether this belongs in core or contrib.

          The bin/webdav.sh script replicates much of bin/hadoop. Why not instead just add a sub-command, 'bin/hadoop webdav'? Or, if we put this in contrib, it might call up to ../../bin/hadoop?

          Also:

          • some unit tests would be nice;
          • javadoc is missing;
          • the main() doesn't use Hadoop's standard command line parser (GenericOptions); see the sketch below.
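
          A minimal sketch of what that last point could look like, using Hadoop's Tool/ToolRunner plumbing (which wires in GenericOptionsParser); the body of WebdavServer here is hypothetical:

              import org.apache.hadoop.conf.Configuration;
              import org.apache.hadoop.conf.Configured;
              import org.apache.hadoop.util.Tool;
              import org.apache.hadoop.util.ToolRunner;

              // Sketch: running the server through ToolRunner means -conf, -D, -fs, etc.
              // are handled by Hadoop's standard GenericOptionsParser before run().
              public class WebdavServer extends Configured implements Tool {
                public int run(String[] args) throws Exception {
                  Configuration conf = getConf();
                  // start the Jetty/Jackrabbit WebDAV servlet against conf here ...
                  return 0;
                }

                public static void main(String[] args) throws Exception {
                  System.exit(ToolRunner.run(new Configuration(), new WebdavServer(), args));
                }
              }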
          Ilya M. Slepnev added a comment -

          Pete, is it still stable for reading on your Linux setup?
          What about performance? How much slower is it than local filesystems (approximately, of course)?

          Anurag Sharma added a comment -

          hi,

          We revived the old fuse-hadoop project (a FUSE-J based plugin that lets you mount Hadoop-FS). We have tried this on a small cluster (10 nodes) and basic functionality works (mount, ls, cat, cp, mkdir, rm, mv, ...).

          The main changes include some bug fixes to FUSE-J and changing the previous fuse-hadoop implementation to enforce write-once. We found the FUSE framework to be straightforward and simple.

          We have seen several mentions of using FUSE with Hadoop, so if there is a better place to post these files, please let me know.

          Attachments to follow...

          -thanks

          Anurag Sharma added a comment -

          hi,

          Attachments include the following:

          • fuse-j-hadoop package
          • fuse-j patch.

          -thanks

          Owen O'Malley added a comment -

          Actually, HADOOP-4 would be a better JIRA for this. :) I love closing really old bugs.

          Anurag Sharma added a comment -

          Hi Owen, OK, I will move fuse-j-hadoop to the HADOOP-4 JIRA. Thanks for the info.

          Michael Bieniosek added a comment -

          Bug fixes against the previous patch ("4"): it now URL-decodes filenames and doesn't crash on locking commands. I can now create/delete/move files with the Mac OS X built-in WebDAV client.

          I'm excluding the webdav.sh script; this can be invoked with bin/hadoop org.apache.hadoop.webdav.WebdavServer.

          I have not yet made the changes that Doug suggested.

          Pete Wyckoff added a comment -

          I added this to http://wiki.apache.org/hadoop/MountableHDFS, which also contains info about fuse-dfs, fuse-j-dfs and hdfs-fuse (which is very similar to fuse-dfs).

          badqiu added a comment -

          This is my WebDAV Level 2 implementation, hdfs-webdav,
          adapted from the Tomcat WebDAV implementation.

          Download from:
          http://code.google.com/p/hdfs-webdav/downloads/list

          Deploy step by step:
          1. Download it, and set fs.default.name in WEB-INF/classes/hadoop-site.xml (a sample property block follows below).
          2. Deploy it on a Tomcat server.
          3. Visit http://localhost:8080/hdfs-webdav to verify that the deployment succeeded.
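
          For step 1, the fs.default.name entry follows the standard Hadoop configuration file format; the namenode host and port below are placeholders for your own cluster:

              <configuration>
                <property>
                  <name>fs.default.name</name>
                  <!-- placeholder: point this at your namenode -->
                  <value>hdfs://namenode.example.com:9000/</value>
                </property>
              </configuration>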

          Doug Cutting added a comment -

          For more recent information on this, see:

          http://www.hadoop.iponweb.net/Home/hdfs-over-webdav/webdav-server

          It would be good to get that updated code attached to this issue. Can its authors please do that, under the Apache license? Thanks!

          Vladimir Klimontovich added a comment -

          What about converting these patches into an HDFS contrib module and making them conform to Hadoop style? I mean using Ivy, integrating the startup script with the bin/hdfs script, using hdfs-site.xml for configuration, etc. If that's reasonable, I could do it.

          Also, I think it would be good to add LDAP authentication as an optional authentication mechanism.

          eric baldeschwieler added a comment -

          I'd love to see the project support WebDAV well. Before taking someone else's code and contributing it to the project, you should talk to Owen / Doug about the licensing / legal issues. Do you need the permission of the authors? If legal issues do not block adding it, I'm sure we'd welcome the work.

          Vladimir Klimontovich added a comment -

          Eric,

          According to the JIRA information, almost all of the WebDAV patches are marked as "Licences for inclusion in ASF works". Doesn't that mean they are licensed under the Apache License and could be included in Hadoop?

          Doug Cutting added a comment -

          Eric, I think modifying someone else's patch that was contributed to the project with the license box checked is fine.

          The license box isn't strictly even required. It's mostly just a reminder of what section 5 of the license states: intentional contributions to Apache licensed works are themselves Apache licensed.

          eric baldeschwieler added a comment -

          Maybe I misunderstood. Is all the code already contributed? I thought it was on Google Code?

          Enis Soztutar added a comment -

          The WebDAV module, like the fuse module, would be better as a contrib under HDFS. I guess there is no problem with the licenses, since they are all Apache 2; however, the problem is choosing which patch to modify. There are at least 3 different streams of patches: the ones attached to this issue, the one at http://www.hadoop.iponweb.net/Home/hdfs-over-webdav/webdav-server, and the one at http://code.google.com/p/hdfs-webdav/downloads/list.

          Artem Trunov added a comment -

          Hi all,
          I wonder if anyone is working on making WebDAV a contrib package? I don't see it in the source tree... Also, it seems that at least two of the patches (iponweb and hadoop-496-5.tgz) are outdated. The iponweb package won't build against 0.20 because of Jetty issues (presumably it was written against an older Jetty?). The hadoop-496-5.tgz was also made for older Hadoop distros; its source tree doesn't match the current one. I didn't try the hdfs-webdav from code.google.com - it seems it requires Tomcat, which I would need to set up in addition to Hadoop?
          Anyway, if we can contribute efforts toward making WebDAV an HDFS contrib, based on existing patches, please let me know.

          Enis Soztutar added a comment -

          > Anyway, if we can contribute efforts toward making WebDAV an HDFS contrib, based on existing patches, please let me know.

          This issue is a long-standing one with lots of different efforts, votes and watchers. If you develop a patch that is stable, please attach it here, so it can be reviewed and committed.

          Artem Trunov added a comment -

          Hi, Enis! OK, thanks! But is there an expert opinion on which existing patch should be brought to a stable state and committed?

          Enis Soztutar added a comment -

          Well, as the previous developer, I am biased. I would recommend checking iponweb's patch first, since they say it is based on the patches in this issue. The hdfs-webdav project also seems promising, but I don't know anything about the code other than that it is based on Tomcat's WebDAV servlet. Personally, I would not recommend going Tomcat-only. I'm afraid you should check both to make an informed decision.

          Konstantin Boudnik added a comment -

          It seems like this has been done by HDFS-2178.

          Moritz Moeller added a comment -

          HDFS-2178 exposes a different interface, not WebDAV.


            People

            • Assignee:
              Enis Soztutar
              Reporter:
              Michel Tourn
            • Votes:
              13
              Watchers:
              32
