Key: HDFS-225
Type: New Feature
Status: Open
Priority: Major
Assignee: Enis Soztutar
Reporter: Michel Tourn
Votes: 10
Watchers: 25
Hadoop HDFS

Expose HDFS as a WebDAV store

Created: 30/Aug/06 07:28 PM   Updated: 03/Aug/09 01:30 PM
Component/s: None
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments:
Name | Type | License | Date | Author | Size
hadoop-496-3.patch | Text File | Licensed for inclusion in ASF works | 2007-11-15 12:33 AM | Michael Bieniosek | 43 kB
hadoop-496-4.patch | Text File | Licensed for inclusion in ASF works | 2007-11-16 10:25 PM | Michael Bieniosek | 38 kB
hadoop-496-5.tgz | File | Licensed for inclusion in ASF works | 2008-01-26 07:18 AM | Michael Bieniosek | 7 kB
hadoop-496-spool-cleanup.patch | Text File | Licensed for inclusion in ASF works | 2007-11-15 01:43 AM | Michael Bieniosek | 43 kB
hadoop-webdav.zip | Zip Archive | - | 2007-07-07 01:23 PM | Albert Strasheim | 15 kB
jetty-slide.xml | XML File | Licensed for inclusion in ASF works | 2007-07-07 01:11 PM | Albert Strasheim | 2 kB
lib.webdav.tar.gz | GZip Archive | Licensed for inclusion in ASF works | 2007-07-25 01:38 PM | Enis Soztutar | 2.14 MB
slideusers.properties | File | Licensed for inclusion in ASF works | 2007-07-07 01:12 PM | Albert Strasheim | 0.0 kB
webdav_wip1.patch | Text File | Licensed for inclusion in ASF works | 2007-07-25 01:38 PM | Enis Soztutar | 41 kB
webdav_wip2.patch | Text File | Licensed for inclusion in ASF works | 2007-07-26 03:34 PM | Enis Soztutar | 43 kB
Image Attachments:

1. screenshot-1.jpg (54 kB)

Description:
WebDAV stands for Web Distributed Authoring and Versioning. It is a set of extensions to the HTTP protocol that lets users collaboratively edit and manage files on a remote web server. It is often considered a replacement for NFS or Samba.

HDFS (Hadoop Distributed File System) needs a friendly file system interface. DFSShell commands are unfamiliar. Instead it is more convenient for Hadoop users to use a mountable network drive. A friendly interface to HDFS will be used both for casual browsing of data and for bulk import/export.

A FUSE provider for HDFS is already available (http://issues.apache.org/jira/browse/HADOOP-17), but it has had scalability problems. WebDAV is a popular alternative.

The typical licensing terms for WebDAV tools are also attractive:
GPL for Linux client tools that Hadoop would not redistribute anyway.
More importantly, Apache Project/Apache license for Java tools and for server components.
This allows for a tighter integration with the HDFS code base.

There are some interesting Apache projects that support WebDAV.
But these are probably too heavyweight for the needs of Hadoop:
Tomcat servlet: http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/servlets/WebdavServlet.html
Slide: http://jakarta.apache.org/slide/

Being HTTP-based and "backwards-compatible" with Web Browser clients, the WebDAV server protocol could even be piggy-backed on the existing Web UI ports of the Hadoop name node / data nodes. WebDAV can be hosted as (Jetty) servlets. This minimizes server code bloat and this avoids additional network traffic between HDFS and the WebDAV server.
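To make "HTTP-based" concrete, here is a minimal sketch of the kind of request a WebDAV client sends to list a collection: an ordinary HTTP request that uses the PROPFIND method and a Depth header, both defined in RFC 2518. The class name, host, port, and path are hypothetical and not taken from any patch on this issue; the sketch only builds the request with the JDK 11+ java.net.http API and does not send it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class PropfindSketch {
    // Hypothetical endpoint: the issue does not fix a host, port, or URL layout.
    static final URI EXAMPLE = URI.create("http://namenode.example.com:50070/webdav/user/data/");

    /** Builds (but does not send) a PROPFIND that asks for a collection and its direct children. */
    static HttpRequest listCollection(URI collection) {
        String body = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<D:propfind xmlns:D=\"DAV:\"><D:allprop/></D:propfind>";
        return HttpRequest.newBuilder(collection)
                // PROPFIND is a WebDAV method; plain HTTP has no equivalent.
                .method("PROPFIND", HttpRequest.BodyPublishers.ofString(body))
                // Depth: 1 means "this collection plus its immediate members".
                .header("Depth", "1")
                .header("Content-Type", "application/xml; charset=utf-8")
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = listCollection(EXAMPLE);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```

Because the request is plain HTTP with an extra method and header, a servlet on an existing web port could in principle dispatch it alongside ordinary GET traffic, which is the piggy-backing idea described above.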

General Clients (read-only):
Any web browser

Linux Clients:
Mountable GPL davfs2 http://dav.sourceforge.net/
FTP-like GPL Cadaver http://www.webdav.org/cadaver/

Server Protocol compliance tests:
http://www.webdav.org/neon/litmus/
A goal is for Hadoop HDFS to pass this test (minus support for Properties)

Pure Java clients:
DAV Explorer Apache lic. http://www.ics.uci.edu/~webdav/

WebDAV also makes it convenient to add advanced features in an incremental fashion:
file locking, access control lists, hard links, symbolic links.
New WebDAV standards continue to be accepted, and WebDAV clients exist with varying levels of feature support.
core http://www.webdav.org/specs/rfc2518.html
ACLs http://www.webdav.org/specs/rfc3744.html
redirects "soft links" http://greenbytes.de/tech/webdav/rfc4437.html
BIND "hard links" http://www.webdav.org/bind/
quota http://tools.ietf.org/html/rfc4331



Pier Fumagalli added a comment - 03/Sep/06 11:24 AM
I have received (privately) a patch from Michel Tourn for my code to allow Hadoop to be exposed as a WebDAV repository.

To limit the problems of IP and stuff, please attach the patch to this bug and I'll commit it to my code.


Michel Tourn added a comment - 03/Sep/06 05:37 PM
The webdav patch is now available on this separate issue:
http://issues.apache.org/jira/browse/HADOOP-505

Andrzej Bialecki added a comment - 06/Nov/06 08:00 AM
I couldn't find the patches that provide the bridge between HDFS and WebDAV; the referenced patch and JIRA issue contain only patches to the external WebDAV library.

Is there any progress on this issue? Could you perhaps upload the preliminary HDFS support?


Yoram Arnon added a comment - 06/Nov/06 10:02 PM
That patch hasn't reached submission-worthy status yet.
We've slowed down our development on this, so not sure when this will actually get submitted.

Andrzej Bialecki added a comment - 08/Nov/06 05:52 PM
Hmm.. do you think it's easier at this point to implement this functionality from scratch, or to use your patches as a starting point?

Doug Cutting added a comment - 08/Nov/06 09:00 PM
FYI, Michel's moved on and is no longer contributing to Hadoop.

Is there any chance that we can get his WebDAV code posted as a patch, even if it's incomplete?


Albert Strasheim added a comment - 07/Jul/07 09:45 AM
Slide's WebDAV Construction Kit (WCK) looks like it might be the way to go here. I'm trying to get it going in Jetty – if I do, I'll report back here. Once that is sorted out, modifying their sample application to do HDFS should hopefully not be too hard.

Albert Strasheim added a comment - 07/Jul/07 01:11 PM
Okay, here's some info on getting the reference code included with WCK going:

After checking out Slide from SVN, you need to tweak the build.xmls:

C:\home\albert\work6\slide>svn diff build.xml wck\build.xml
Index: build.xml
===================================================================
--- build.xml   (revision 554041)
+++ build.xml   (working copy)
@@ -551,9 +551,6 @@
         </copy>
     </target>
     <target name="dist-xml" unless="jvm14.present">
-        <copy todir="${slide.dist}/slide/lib" file="${jaxp.jar}"/>
-        <copy todir="${slide.dist}/slide/lib" file="${xmlapi.jar}"/>
-        <copy todir="${slide.dist}/slide/lib" file="${xmlparser.jar}"/>
     </target>
     <!-- =================================================================== -->
     <!-- Build a Slide distribution packaged as a web application            -->
Index: wck/build.xml
===================================================================
--- wck/build.xml       (revision 554041)
+++ wck/build.xml       (working copy)
@@ -20,7 +20,7 @@
        <!-- =================================================================== -->
     <!-- Dependencies Properties                                             -->
     <!-- =================================================================== -->
-    <property name="commons.transaction.version" value="1.1.1pre1"/>
+    <property name="commons.transaction.version" value="1.2"/>
     <property name="slide.base.dir" value=".."/>
     <property name="lib.dir" value="${slide.base.dir}/lib"/>
     <property name="slide.lib.dir" value="${slide.base.dir}/dist/slide/lib"/>

Then build the all target of wck/build.xml, which should give you a slide.war in wck/dist. This WAR contains the org.apache.slide.simple.* stuff that implements WebDAV using a file store.

Get the latest Jetty 5. Extract it. Inside the Jetty directory make a webapps2 directory. Extract the slide.war under slide/ in this directory.

Copy the jetty-slide.xml and slideusers.properties (to be attached) to etc/ directory of the Jetty distribution.

Run java -jar start.jar etc/jetty-slide.xml in the Jetty directory. Now you should have a WebDAV server that writes to store/files under your Jetty directory when you put a file via WebDAV.

You can now also add a network place in Windows. Login with root, password root.


Albert Strasheim added a comment - 07/Jul/07 01:23 PM
Here's a basic Eclipse WTP project to experiment with. There's a launch configuration for Jetty included. To get going, just add all the libraries from the slide.war's WEB-INF/lib to this project's WebContent/WEB-INF/lib.

Next step: plugging in Hadoop.


Enis Soztutar added a comment - 25/Jul/07 01:38 PM
I have been working on WebDAV support in Hadoop and have implemented the core features, including all level 1 features except properties.
I considered 3 different WebDAV libraries and decided to stick with the WebDAV module in Apache Jackrabbit. Slide seems good enough at first, but it depends on lots of other libraries, it is poorly documented (at least for a newcomer like myself), and I think its development slowed down after 2004. The second option, the WebDAV implementation from could.it, is good and satisfactory for our purpose, but I think we may later need some functionality that it does not support.
Anyhow, the attached patch contains the implementation built upon Jackrabbit's APIs. The required libraries (lib.webdav.tar.gz) should be added to the lib/webdav directory.

The Patch is a work in progress, however adding, deleting, copying and moving files and folders are supported. Any feedback will be appreciated.

Finally we may want to add this functionality to contrib, since it involves some functionality that is not intended to be the main purpose of DFS.

PS. Due to bug HADOOP-1647, all files in DFS are treated as collections (folders).


Albert Strasheim added a comment - 25/Jul/07 07:34 PM
Extremely impressive. I was able to create a Windows web folder and copy files over without any problems.

Any idea when this patch might make it into SVN? It seems pretty much self-contained, so it shouldn't be hard to merge. That way, more people might start to experiment with it.


Albert Strasheim added a comment - 25/Jul/07 07:57 PM - edited
I'm playing with the litmus tests mentioned in HADOOP-512. To fix the basic/delete_null test, it seems DFSDavResource.removeMember should throw DavException(DavServletResponse.SC_NOT_FOUND) in the !success case.
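The suggested fix can be sketched as follows. This is a self-contained illustration and not the patch's code: StatusException is a stand-in for Jackrabbit's DavException, the Set is a stand-in for the file system, and the class and method shapes are assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Set;

final class RemoveMemberSketch {
    static final int SC_NOT_FOUND = 404;

    /** Stand-in for a WebDAV exception that carries an HTTP status code. */
    static final class StatusException extends Exception {
        final int status;

        StatusException(int status) {
            super("HTTP " + status);
            this.status = status;
        }
    }

    /** Stand-in for the file system: just the set of paths that exist. */
    private final Set<String> paths = new HashSet<>();

    void create(String path) {
        paths.add(path);
    }

    /** Deletes a member; a failed delete is reported as 404 rather than ignored. */
    void removeMember(String path) throws StatusException {
        boolean success = paths.remove(path);
        if (!success) {
            // The "!success case": tell the client the resource was not there.
            throw new StatusException(SC_NOT_FOUND);
        }
    }

    public static void main(String[] args) {
        RemoveMemberSketch store = new RemoveMemberSketch();
        store.create("/dir/file");
        try {
            store.removeMember("/dir/file"); // exists, so this succeeds
            store.removeMember("/dir/file"); // already gone, so this throws
        } catch (StatusException e) {
            System.out.println("second delete failed with status " + e.status);
        }
    }
}
```

The point of the change is that a DELETE on a resource that does not exist must surface as an error status to the client, which is what litmus's delete_null test checks.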

Enis Soztutar added a comment - 25/Jul/07 09:12 PM

> Any idea when this patch might make it into SVN? It seems pretty much self-contained, so it shouldn't be hard to merge. That way, more people might start to experiment with it.

This patch already applies to trunk, but it needs to mature before it goes in. I intend to complete the support for properties, fix HADOOP-1647 and do some more testing.

> I'm playing with the litmus tests mentioned in HADOOP-512. To fix the basic/delete_null test, it seems DFSDavResource.removeMember should throw DavException(DavServletResponse.SC_NOT_FOUND) in the !success case.

Thanks for the pointer, I'll check it out.


Enis Soztutar added a comment - 26/Jul/07 03:34 PM - edited
This patch is an improvement on the previous one. It adds file creation and content length.
Litmus test results are as follows:
-> running `basic':
 0. init.................. pass
 1. begin................. pass
 2. options............... WARNING: server does not claim Class 2 compliance
    ...................... pass (with 1 warning)
 3. put_get............... WARNING: length mismatch: 0 vs 41
    ...................... pass (with 1 warning)
 4. put_get_utf8_segment.. WARNING: length mismatch: 0 vs 41
    ...................... pass (with 1 warning)
 5. mkcol_over_plain...... pass
 6. delete................ pass
 7. delete_null........... pass
 8. delete_fragment....... pass
 9. mkcol................. pass
10. mkcol_again........... pass
11. delete_coll........... pass
12. mkcol_no_parent....... pass
13. mkcol_with_body....... FAIL (MKCOL with weird body must fail (RFC2518:8.3.1))
14. finish................ pass

The content length warnings are due to avoiding a costly du operation on folders.
The patch depends on HADOOP-1654.
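For the one failing test (mkcol_with_body), RFC 2518 section 8.3.1 expects a server that receives a MKCOL body it does not support to answer 415 (Unsupported Media Type). The sketch below shows one way to choose the MKCOL status code. The class and method names are hypothetical, the order of the checks is an assumption, and the sketch assumes a server that supports no MKCOL request bodies at all.

```java
final class MkcolStatusSketch {
    static final int CREATED = 201;
    static final int METHOD_NOT_ALLOWED = 405;
    static final int CONFLICT = 409;
    static final int UNSUPPORTED_MEDIA_TYPE = 415;

    /**
     * Picks the response status for a MKCOL request.
     *
     * @param targetExists whether something already exists at the request path
     * @param parentExists whether the parent collection exists
     * @param bodyLength   number of request body bytes the caller observed
     */
    static int mkcolStatus(boolean targetExists, boolean parentExists, long bodyLength) {
        if (bodyLength > 0) {
            // No body format is understood here, so any body is rejected.
            return UNSUPPORTED_MEDIA_TYPE;
        }
        if (targetExists) {
            // MKCOL may only be used on a path that does not exist yet.
            return METHOD_NOT_ALLOWED;
        }
        if (!parentExists) {
            // Intermediate collections are not created implicitly.
            return CONFLICT;
        }
        return CREATED;
    }

    public static void main(String[] args) {
        System.out.println("plain MKCOL:       " + mkcolStatus(false, true, 0));
        System.out.println("MKCOL with a body: " + mkcolStatus(false, true, 12));
    }
}
```

The 405 and 409 branches correspond to the mkcol_again and mkcol_no_parent cases that already pass in the run above; only the body check would be new.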


Albert Strasheim added a comment - 27/Jul/07 08:46 PM
webdav_wip2.patch depends on the IOUtils class introduced by the patch attached to HADOOP-1654.

Albert Strasheim added a comment - 27/Jul/07 09:20 PM
You might want to run litmus with the -k switch so that it keeps going even if some tests fail.

Michael Bieniosek added a comment - 21/Oct/07 07:33 PM
Hey, I tried this patch out, and I noticed a few things:

1. The webdav server is hardcoded to bind to "localhost", so I changed it to bind to "0.0.0.0" instead. I'd prefer if clients didn't all have to run their own server: if the DNS doesn't match, or the client doesn't want to set up hadoop and configure it, it's much easier.
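On point 1, a minimal sketch of making the bind address configurable instead of hardcoded. The class name, the method, and the choice of "0.0.0.0" as the default are assumptions for illustration; the patch itself may wire this differently.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

final class BindAddressSketch {
    /**
     * Returns the address to listen on. With no host configured, fall back to
     * the wildcard address so that remote clients can reach the server.
     */
    static InetSocketAddress listenAddress(String host, int port) {
        String effective = (host == null || host.isEmpty()) ? "0.0.0.0" : host;
        return new InetSocketAddress(effective, port);
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : null;
        try (ServerSocket socket = new ServerSocket()) {
            // Port 0 asks the OS for any free port; a real server would use a fixed one.
            socket.bind(listenAddress(host, 0));
            System.out.println("listening on " + socket.getLocalSocketAddress());
        }
    }
}
```

Binding to "localhost" accepts only connections from the same machine, which is why every client would otherwise need to run its own server.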
2. When I actually tried to copy files out, I get a funny error in the client (on Mac OSX, it says "There is a problem with the file and it cannot be copied"). I wish I could be more helpful, but I don't know how to issue raw HTTP to the webdav server and there's nothing indicative in the webdav server log.
3. If I point an ordinary browser (or wget) at the webdav server, I get a 200 with an empty body for files that exist, and a 404 for files that don't exist. Again, I don't know much about webdav, but it would be nice if you could browse and download with an ordinary browser, as in subversion.

It was nice to see this almost work, though it's not really usable for me because of problem 2.

Thanks!


Enis Soztutar added a comment - 22/Oct/07 06:28 AM
Thanks for the feedback. The patch is at quite an early stage of development, and I intend to change the server architecture, but not until my specifications are settled. I have manually tested copying files and it worked, and it also passes the litmus tests, so maybe there is some other problem with the file? You can use telnet to connect to the server and send handmade HTTP requests. I have not implemented file system browsing, mainly because we already have a web interface.

Michael Bieniosek added a comment - 22/Oct/07 04:31 PM
> You can use telnet to connect to the server and send handmade http requests.

I tried telnetting to the server and doing a GET /path/to/file.tgz. This gave me a 200 with an empty body. If I try to GET a file that doesn't exist, I get a 404 with an html error page.


Michael Bieniosek added a comment - 01/Nov/07 08:38 PM
I added some extra debugging to DFSDavResource.java, and it looks like the getHref() function is returning malformed urls:

07/11/01 13:34:56 INFO webdav.DFSDavResource: getHref() for path:/dirs/to/my/file.tgz -> http://localhost:20015hdfs%3a//dfs.cluster.powerset.com%3a10000/dirs/to/my/file.tgz
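The logged href looks as if the full hdfs:// URI was percent-encoded and appended to the server address. That reading of the log is a guess, not a confirmed diagnosis. Under that assumption, a well-formed href can be built by taking only the path component of the file system URI and letting java.net.URI assemble and quote the result. The class and method names below are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class HrefSketch {
    /**
     * Builds the href for a resource from the WebDAV server's base URI and the
     * resource's file system URI, keeping only the path of the latter.
     */
    static String href(URI serverBase, URI resource) {
        try {
            // The multi-argument constructor quotes characters that are illegal in a path.
            return new URI(serverBase.getScheme(), null, serverBase.getHost(),
                    serverBase.getPort(), resource.getPath(), null, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot build href for " + resource, e);
        }
    }

    public static void main(String[] args) {
        URI server = URI.create("http://localhost:20015");
        URI file = URI.create("hdfs://dfs.cluster.powerset.com:10000/dirs/to/my/file.tgz");
        System.out.println(href(server, file));
    }
}
```

With the values from the log line, this yields an href on the WebDAV server's own host and port followed by /dirs/to/my/file.tgz, with no trace of the hdfs scheme or the namenode address.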


Pete Wyckoff added a comment - 07/Nov/07 12:36 AM
I have written a fuse (but not j-fuse) module for dfs and the performance is reasonable. I've made it RO thus far, so I don't know what the performance of writes will be like.

It seems pretty stable to date although I've only been continually running it for a few days.

– pete


Pete Wyckoff added a comment - 07/Nov/07 12:38 AM
mounted dfs on linux via fuse.

Michael Bieniosek added a comment - 07/Nov/07 09:54 PM
Pete, how are you bridging between fuse and dfs? There is a tgz for fusej-hadoop floating around somewhere, though it is out of date.

Michael Bieniosek added a comment - 15/Nov/07 12:33 AM
I implemented the DFSDavResource.spool method. This allows data to be copied out (previously a GET on any file returned an empty body). I also ported to hadoop trunk.
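Conceptually, spooling a resource means copying the file's bytes into the HTTP response stream. The following is a minimal, self-contained sketch of that copy loop, not the code in the attached patch: the class name is hypothetical, and in-memory streams stand in for the DFS input stream and the servlet output stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class SpoolSketch {
    /** Copies everything from in to out and returns the number of bytes written. */
    static long spool(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "file contents".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        long written = spool(new ByteArrayInputStream(content), response);
        System.out.println("spooled " + written + " bytes");
    }
}
```

A method that returns without writing anything produces exactly the symptom described earlier in the thread: a 200 response with an empty body.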

On an unrelated note, I think the webdav sources should be in the org.apache.hadoop.fs directory, not org.apache.hadoop.dfs, since there is nothing specific to the dfs about this patch.


Michael Bieniosek added a comment - 15/Nov/07 01:43 AM
Here's a cleaner version of the patch.

Michael Bieniosek added a comment - 16/Nov/07 10:25 PM
I deleted some unused code, I removed some excess logging, removed the usage of StatusHttpServer so the patch no longer modifies core hadoop code, and I added a separate start script that can take the namenode as a command-line argument to the server. I also moved the webdav package to org.apache.hadoop.fs.webdav.

Currently writing to the DFS does not work, but I can browse and copy files out of the DFS (with Mac OSX webdav mount).

I think this could become a separate (small) contrib project, since hadoop proper does not rely on it.


Doug Cutting added a comment - 19/Nov/07 08:22 PM
I don't have a strong feeling about whether this belongs in core or contrib.

The bin/webdav.sh script replicates much of bin/hadoop. Why not instead just add a sub-command, 'bin/hadoop webdav'? Or, if we put this in contrib, it might call up to ../../bin/hadoop?

Also:

  • some unit tests would be nice;
  • javadoc is missing;
  • the main() doesn't use Hadoop's standard command line parser (GenericOptions).

Ilya M. Slepnev added a comment - 20/Nov/07 05:12 PM
Pete, is it still stable for reading on your linux?
What about performance? Approximately how much slower is it than local filesystems?

Anurag Sharma added a comment - 01/Dec/07 12:12 AM
hi,

We revived the old fuse-hadoop project (a FUSE-J based plugin that lets you mount Hadoop-FS). We have tried this on a small cluster (10 nodes) and basic functionality works (mount, ls, cat,cp, mkdir, rm, mv, ...).

The main changes include some bug fixes to FUSE-J and changing the previous fuse-hadoop implementation to enforce write-once. We found the FUSE framework to be straightforward and simple.

We have seen several mentions of using FUSE with Hadoop, so if there is a better place to post these files, please let me know.

Attachments to follow...

-thanks


Anurag Sharma added a comment - 01/Dec/07 12:18 AM
hi,

Attachments include the following:

  • fuse-j-hadoop package
  • fuse-j patch.

-thanks


Owen O'Malley added a comment - 02/Dec/07 02:25 PM
Actually, HADOOP-4 would be a better jira for this. :) I love closing really old bugs.

Anurag Sharma added a comment - 03/Dec/07 06:54 PM
hi Owen, ok, will move fuse-j-hadoop to the HADOOP-4 jira. Thanks for the info.

Michael Bieniosek added a comment - 26/Jan/08 07:18 AM
bugfixes against previous patch ("4"); it now url-decodes filenames and doesn't crash on locking commands. I can now create/delete/move files with the Mac OSX builtin webdav client.
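As an illustration of the URL-decoding part of this fix, here is a minimal sketch that uses java.net.URI to decode a request path. How the patch actually decodes filenames is not shown in this thread, so the class, the method, and the choice of URI over other decoders are assumptions.

```java
import java.net.URI;

final class PathDecodeSketch {
    /**
     * Decodes percent-escapes in a request path, e.g. "%20" becomes a space.
     * Throws IllegalArgumentException if the path is not a valid URI reference.
     */
    static String decodePath(String rawRequestPath) {
        // getPath() returns the decoded form; getRawPath() would keep the escapes.
        return URI.create(rawRequestPath).getPath();
    }

    public static void main(String[] args) {
        System.out.println(decodePath("/dirs/my%20file.tgz"));
        System.out.println(decodePath("/dirs/a+b.tgz"));
    }
}
```

URI is used here rather than java.net.URLDecoder because URLDecoder implements form decoding and turns "+" into a space, which would corrupt filenames that contain a literal plus sign.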

I'm excluding the webdav.sh script; this can be invoked with bin/hadoop org.apache.hadoop.webdav.WebdavServer.

I have not yet made the changes that Doug suggested.


Pete Wyckoff added a comment - 02/Sep/08 04:31 PM
I added this to http://wiki.apache.org/hadoop/MountableHDFS which also contains info about fuse-dfs and fuse-j-dfs and hdfs-fuse (which is very similar to fuse-dfs)

badqiu added a comment - 19/Nov/08 09:15 AM
This is my WebDAV Level 2 implementation, hdfs-webdav.
It is modified from Tomcat's WebDAV implementation.

Download from:
http://code.google.com/p/hdfs-webdav/downloads/list

Deployment, step by step:
1. Download it and set fs.default.name in WEB-INF/classes/hadoop-site.xml.
2. Deploy it on a Tomcat server.
3. Visit http://localhost:8080/hdfs-webdav to check that the deployment succeeded.


Doug Cutting added a comment - 23/Jan/09 08:36 PM
For more recent information on this, see:

http://www.hadoop.iponweb.net/Home/hdfs-over-webdav/webdav-server

It would be good to get that updated code attached to this issue. Can its authors please do that, under the Apache license? Thanks!


Vladimir Klimontovich added a comment - 13/Jul/09 11:38 PM
What about converting these patches into an HDFS contrib module and making it conform to Hadoop style? I mean using ivy, integrating the startup script with the bin/hdfs script, using hdfs-site.xml for configuration, and so on. If that's reasonable, I could do it.

Also, I think it would be good to add LDAP as an optional authentication mechanism.


eric baldeschwieler added a comment - 14/Jul/09 04:49 PM
I'd love to see the project support webdav well. Before taking someone else's code and contributing it to the project you should talk to owen / doug about the licensing / legal issues. Do you need permission of the authors? If legal issues do not block adding it, I'm sure we'd welcome the work.

Vladimir Klimontovich added a comment - 14/Jul/09 05:01 PM
Eric,

According to the JIRA information, almost all of the WebDAV patches are marked as "Licensed for inclusion in ASF works". Doesn't that mean they are licensed under the Apache License and could be included in Hadoop?


Doug Cutting added a comment - 15/Jul/09 04:13 PM
Eric, I think modifying someone else's patch that was contributed to the project with the license box checked is fine.

The license box isn't strictly even required. It's mostly just a reminder of what section 5 of the license states: intentional contributions to Apache licensed works are themselves Apache licensed.


eric baldeschwieler added a comment - 16/Jul/09 07:19 AM
Maybe I misunderstood. Is all the code already contributed? I thought it was in Google code?

Enis Soztutar added a comment - 19/Jul/09 09:21 PM
The WebDAV module, like the fuse module, is better off as a contrib under hdfs. I guess there is no problem with the licenses, since they are all Apache 2; the problem is choosing which patch to modify. There are at least 3 different streams of patches: the ones attached to this issue, the one at http://www.hadoop.iponweb.net/Home/hdfs-over-webdav/webdav-server, and the one at http://code.google.com/p/hdfs-webdav/downloads/list.

Artem Trunov added a comment - 03/Aug/09 11:59 AM
Hi all
I wonder if anyone is working on making WebDAV a contrib package? I don't see it in the source tree. Also, it seems that at least two of the patches (iponweb and hadoop-496-5.tgz) are outdated. The iponweb package won't build against 0.20 because of Jetty issues (presumably it was written against an older Jetty?). The hadoop-496-5.tgz was also made for older Hadoop distros; its source tree doesn't match the current one. I didn't try hdfs-webdav from code.google; it seems to require Tomcat, which I would need to set up in addition to Hadoop?
Anyway, we can contribute effort toward making WebDAV an hdfs contrib based on the existing patches; please let me know.

Enis Soztutar added a comment - 03/Aug/09 12:42 PM
> Anyway we can contribute efforts in making webdav into the hdfs as contrib, based on existing patches, please let me know.

This issue is a long standing one with lots of different efforts, votes and watchers. If you develop a patch that is stable, please attach it here, so it will be reviewed and committed back.

Artem Trunov added a comment - 03/Aug/09 01:20 PM
Hi, Enis! Ok, thanks! But is there an expert's opinion on which existing patch should be brought to a stable state and committed?

Enis Soztutar added a comment - 03/Aug/09 01:30 PM
Well, as the previous developer, I am biased. I would recommend checking iponweb's patch first, since they say it is based on the patches in this issue. The hdfs-webdav project also seems promising, but I don't know anything about the code other than that it is based on Tomcat's WebDAV servlet. Personally, I would not recommend going Tomcat-only. I'm afraid you should check both to make an informed decision.