The WebDAV patch is now available on this separate issue:
http://issues.apache.org/jira/browse/HADOOP-505

I couldn't find the patches that provide the bridge between HDFS and WebDAV; the referenced JIRA issue and its patch contain only patches to the external WebDAV library.
Is there any progress on this issue? Could you perhaps upload that preliminary HDFS support?

That patch hasn't reached submission-worthy status yet.
We've slowed down our development on this, so I'm not sure when it will actually get submitted.

Hmm... do you think it's easier at this point to implement this functionality from scratch, or to use your patches as a starting point?
FYI, Michel's moved on and is no longer contributing to Hadoop.
Is there any chance that we can get his WebDAV code posted as a patch, even if it's incomplete?

Slide's WebDAV Construction Kit (WCK) looks like it might be the way to go here. I'm trying to get it running in Jetty; if I do, I'll report back here. Once that is sorted out, modifying their sample application to use HDFS should hopefully not be too hard.
Okay, here's some info on getting the reference code included with WCK going:
After checking out Slide from SVN, you need to tweak the build.xml files:
C:\home\albert\work6\slide>svn diff build.xml wck\build.xml
Index: build.xml
===================================================================
--- build.xml (revision 554041)
+++ build.xml (working copy)
@@ -551,9 +551,6 @@
</copy>
</target>
<target name="dist-xml" unless="jvm14.present">
- <copy todir="${slide.dist}/slide/lib" file="${jaxp.jar}"/>
- <copy todir="${slide.dist}/slide/lib" file="${xmlapi.jar}"/>
- <copy todir="${slide.dist}/slide/lib" file="${xmlparser.jar}"/>
</target>
<!-- =================================================================== -->
<!-- Build a Slide distribution packaged as a web application -->
Index: wck/build.xml
===================================================================
--- wck/build.xml (revision 554041)
+++ wck/build.xml (working copy)
@@ -20,7 +20,7 @@
<!-- =================================================================== -->
<!-- Dependencies Properties -->
<!-- =================================================================== -->
- <property name="commons.transaction.version" value="1.1.1pre1"/>
+ <property name="commons.transaction.version" value="1.2"/>
<property name="slide.base.dir" value=".."/>
<property name="lib.dir" value="${slide.base.dir}/lib"/>
<property name="slide.lib.dir" value="${slide.base.dir}/dist/slide/lib"/>
Then build the all target of wck/build.xml, which should give you a slide.war in wck/dist. This WAR contains the org.apache.slide.simple.* classes that implement WebDAV using a file store. To run it:
1. Get the latest Jetty 5 and extract it.
2. Inside the Jetty directory, create a webapps2 directory and extract slide.war into webapps2/slide/.
3. Copy jetty-slide.xml and slideusers.properties (to be attached) to the etc/ directory of the Jetty distribution.
4. In the Jetty directory, run: java -jar start.jar etc/jetty-slide.xml

Now you should have a WebDAV server that writes to store/files under your Jetty directory when you put a file via WebDAV. You can also add it as a network place in Windows; log in with user root, password root.

Here's a basic Eclipse WTP project to experiment with. It includes a launch configuration for Jetty. To get going, just add all the libraries from slide.war's WEB-INF/lib to this project's WebContent/WEB-INF/lib.
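Once the server from the steps above is running, a quick way to check it from Java is a single authenticated PUT. This is a minimal, hypothetical smoke test using only java.net.HttpURLConnection. The URL (port 8080, path /slide/files/hello.txt) is an assumption, since the real port and context path depend on jetty-slide.xml, which is not reproduced here. Sending the root/root credentials with HTTP Basic authentication is also an assumption about how slideusers.properties is wired into Jetty.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical smoke test for the Slide WCK setup; not part of Slide or the patches.
final class SlidePutCheck {

    // Builds an HTTP Basic "Authorization" header value for the given user and password.
    static String basicAuth(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    // PUTs a small text body to the given URL and returns the HTTP status code.
    static int put(String url, String body, String user, String password) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Authorization", basicAuth(user, password));
        conn.setRequestProperty("Content-Type", "text/plain");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        // Assumed URL: adjust host, port and path to match your jetty-slide.xml.
        String url = args.length > 0 ? args[0] : "http://localhost:8080/slide/files/hello.txt";
        System.out.println("PUT status: " + put(url, "hello webdav\n", "root", "root"));
    }
}
```

A 201 or 204 status, plus a new file under store/files, would suggest the setup works.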
Next step: plugging in Hadoop.

I have been working on WebDAV support in Hadoop, and have implemented the core features, including all Level 1 features except properties.
I considered three different WebDAV libraries and decided to stick with the WebDAV module in Apache Jackrabbit. Slide seemed good enough at first, but it depends on lots of other libraries, it is poorly documented (at least for a newcomer like myself), and I think its development slowed down after 2004. The second option, the WebDAV implementation from could.it, is good and satisfactory for our purpose, but I think we may later need some functionality that it does not support. Anyhow, the attached patch contains the implementation built upon Jackrabbit's APIs. The required libraries (lib.webdav.tar.gz) should be added to the lib/webdav directory. The patch is a work in progress; however, adding, deleting, copying and moving files and folders are supported. Any feedback will be appreciated. Finally, we may want to add this functionality to contrib, since it involves functionality that is not the main purpose of DFS. PS. Due to bug

Extremely impressive. I was able to create a Windows web folder and copy files over without any problems.
Any idea when this patch might make it into SVN? It seems pretty much self-contained, so it shouldn't be hard to merge. That way, more people might start to experiment with it. I'm playing with the litmus tests mentioned in
This patch already applies to trunk, but it needs to mature before it goes in. I intend to complete the support for properties, fix
Thanks for the pointer, I'll check it out.

This patch is an improvement on the previous one. It adds file creation and content length.
Litmus test results are as follows:
-> running `basic':
0. init.................. pass
1. begin................. pass
2. options............... WARNING: server does not claim Class 2 compliance
...................... pass (with 1 warning)
3. put_get............... WARNING: length mismatch: 0 vs 41
...................... pass (with 1 warning)
4. put_get_utf8_segment.. WARNING: length mismatch: 0 vs 41
...................... pass (with 1 warning)
5. mkcol_over_plain...... pass
6. delete................ pass
7. delete_null........... pass
8. delete_fragment....... pass
9. mkcol................. pass
10. mkcol_again........... pass
11. delete_coll........... pass
12. mkcol_no_parent....... pass
13. mkcol_with_body....... FAIL (MKCOL with weird body must fail (RFC2518:8.3.1))
14. finish................ pass
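The one failure, mkcol_with_body, checks RFC 2518 section 8.3.1: a server that receives a MKCOL request body it does not understand must answer 415 (Unsupported Media Type) instead of creating the collection. Below is a minimal sketch of that check, assuming the handler can inspect the Content-Length and Transfer-Encoding headers before it creates the directory. MkcolCheck and statusFor are hypothetical names, not part of the patch or of Jackrabbit, and other MKCOL outcomes (405 for an existing resource, 409 for a missing parent) are left out.

```java
// Hypothetical helper: decides the MKCOL status before any directory is created.
final class MkcolCheck {
    static final int SC_CREATED = 201;
    static final int SC_UNSUPPORTED_MEDIA_TYPE = 415;

    /**
     * @param contentLength value of the Content-Length header, or -1 if absent
     * @param chunked       true if the request used Transfer-Encoding: chunked
     * @return 415 if the request carries a body, otherwise 201
     */
    static int statusFor(long contentLength, boolean chunked) {
        boolean hasBody = chunked || contentLength > 0;
        // RFC 2518 section 8.3.1: an unsupported MKCOL body must yield 415.
        // This sketch understands no MKCOL bodies at all, so any body is rejected.
        return hasBody ? SC_UNSUPPORTED_MEDIA_TYPE : SC_CREATED;
    }
}
```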
The content-length warnings occur because we avoid the costly du operation on folders. webdav_wip2.patch depends on the IOUtils class introduced by the patch attached to
You might want to run litmus with the -k switch so that it keeps going even if some tests fail.
Hey, I tried this patch out, and I noticed a few things:
1. The WebDAV server is hardcoded to bind to "localhost", so I changed it to bind to "0.0.0.0" instead. I'd prefer that clients not all have to run their own server: if the DNS doesn't match, or the client doesn't want to set up and configure Hadoop, a shared server is much easier. It was nice to see this almost work, though it's not really usable for me because of problem 2. Thanks!

Thanks for the feedback. The patch is at quite an early stage of development, and I intend to change the server architecture, but not until my specifications are settled. I have manually tested copying files and it worked, and it also passes the litmus tests, so maybe there is some other problem with the file? You can use telnet to connect to the server and send handmade HTTP requests. I have not implemented browsing of the file system, mainly because we already have a web interface.
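Item 1 above is about the listen address. With plain java.net the difference is easy to see: "0.0.0.0" is the wildcard address, so a server bound to it accepts connections on every interface, while "localhost" only accepts loopback connections from the same machine. This is a hypothetical sketch (BindAddressDemo is not from the patch, which runs inside Jetty) that takes the host from the command line instead of hardcoding it:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical demo of wildcard vs loopback binding; not code from the patch.
final class BindAddressDemo {

    // "0.0.0.0" resolves to the wildcard address; "localhost" resolves to loopback only.
    static InetSocketAddress listenAddress(String host, int port) {
        return new InetSocketAddress(host, port);
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "0.0.0.0";
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(listenAddress(host, 0)); // port 0 lets the OS pick a free port
            System.out.println("listening on " + socket.getLocalSocketAddress());
        }
    }
}
```

Making the host configurable (defaulting to the wildcard) would let remote clients share one server while still allowing a loopback-only setup when that is preferred.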
> You can use telnet to connect to the server and send handmade http requests.
I tried telnetting to the server and doing a GET /path/to/file.tgz. This gave me a 200 with an empty body. If I try to GET a file that doesn't exist, I get a 404 with an HTML error page. I added some extra debugging to DFSDavResource.java, and it looks like the getHref() function is returning malformed URLs:
07/11/01 13:34:56 INFO webdav.DFSDavResource: getHref() for path:/dirs/to/my/file.tgz -> http://localhost:20015hdfs%3a//dfs.cluster.powerset.com%3a10000/dirs/to/my/file.tgz

I have written a fuse (but not j-fuse) module for DFS and the performance is reasonable. I've made it read-only thus far, so I don't know what write performance will be like.
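Returning to the malformed getHref() output in the log line above: the href appears to be built by escaping the whole hdfs:// URI and appending it to the server prefix, so the namenode authority leaks into the URL and the separating slash is lost. Here is a hedged sketch of a fix, assuming the WebDAV servlet is mounted at "/" so that the HDFS path maps directly onto the URL path. HrefBuilder is a hypothetical name, not the patch's code.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper illustrating a getHref() fix; not code from the patch.
final class HrefBuilder {

    /**
     * Builds a WebDAV href from the server's base URL (e.g. "http://localhost:20015")
     * and a fully qualified HDFS URI (e.g. "hdfs://namenode:10000/dirs/file.tgz").
     * Assumes the servlet is mounted at "/", so the HDFS path becomes the URL path.
     */
    static String href(String serverBase, String hdfsUri) {
        URI base = URI.create(serverBase);
        // getPath() drops the hdfs:// scheme and namenode authority and decodes %XX escapes.
        String path = URI.create(hdfsUri).getPath();
        try {
            // This multi-argument constructor re-quotes characters that are illegal in a path.
            URI full = new URI(base.getScheme(), null, base.getHost(), base.getPort(), path, null, null);
            return full.toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot build href for " + hdfsUri, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(href("http://localhost:20015",
                "hdfs://dfs.cluster.powerset.com:10000/dirs/to/my/file.tgz"));
    }
}
```

For the path from the log, main prints http://localhost:20015/dirs/to/my/file.tgz. A file named "a b.txt" would come out as /a%20b.txt.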
It seems pretty stable so far, although I've only been running it continuously for a few days. – pete
mounted dfs on linux via fuse.
Pete, how are you bridging between fuse and dfs? There is a tgz for fusej-hadoop floating around somewhere, though it is out of date.
I implemented the DFSDavResource.spool method. This allows data to be copied out (previously a GET on any file returned an empty body). I also ported to hadoop trunk.
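A GET returning an empty body is what you would expect if the resource never copies the file's bytes into the response. The core of such a spool step is a buffered stream copy. The sketch below is hypothetical: ResponseSink is a minimal stand-in for the WebDAV library's output context, and the caller is assumed to pass an input stream opened on the HDFS file (for example via FileSystem.open) together with the file length.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch of the copy inside a spool method; not the patch's code.
final class SpoolSketch {

    // Minimal stand-in for the library's output context (hypothetical interface).
    interface ResponseSink {
        void setContentLength(long length);
        OutputStream getOutputStream(); // null when no body is wanted, e.g. for HEAD
    }

    // Copies the file's bytes into the response and returns the number of bytes written.
    static long spool(InputStream in, long fileLength, ResponseSink sink) throws IOException {
        sink.setContentLength(fileLength);
        OutputStream out = sink.getOutputStream();
        if (out == null) {
            return 0; // headers only
        }
        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

The caller owns the input stream and should close it, e.g. with try-with-resources. Returning early when there is no output stream keeps HEAD requests from reading the file.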
On an unrelated note, I think the WebDAV sources should be in the org.apache.hadoop.fs package, not org.apache.hadoop.dfs, since there is nothing DFS-specific about this patch.

Here's a cleaner version of the patch.
I deleted some unused code, removed some excess logging, removed the usage of StatusHttpServer so the patch no longer modifies core Hadoop code, and added a separate start script that can take the namenode as a command-line argument to the server. I also moved the webdav package to org.apache.hadoop.fs.webdav.
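For the start script, the namenode argument has to be turned into a file system URI before the server can connect. The thread does not show which argument format the patch accepts, so the following is a hypothetical parser (NamenodeArg is an illustrative name) that accepts either host:port or hdfs://host:port.

```java
import java.net.URI;

// Hypothetical helper; the patch's actual argument handling is not shown in this thread.
final class NamenodeArg {

    /** Accepts "host:port" or "hdfs://host:port" and returns the namenode as an hdfs:// URI. */
    static URI parse(String arg) {
        // Without a scheme, "host:port" would be parsed as scheme "host", so prepend hdfs://.
        String withScheme = arg.contains("://") ? arg : "hdfs://" + arg;
        URI uri = URI.create(withScheme);
        if (!"hdfs".equals(uri.getScheme()) || uri.getHost() == null || uri.getPort() == -1) {
            throw new IllegalArgumentException("expected host:port or hdfs://host:port, got " + arg);
        }
        return uri;
    }

    public static void main(String[] args) {
        // "localhost:9000" is an illustrative default, not a value taken from the patch.
        URI namenode = parse(args.length > 0 ? args[0] : "localhost:9000");
        System.out.println("namenode host=" + namenode.getHost() + " port=" + namenode.getPort());
    }
}
```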
Currently writing to the DFS does not work, but I can browse and copy files out of the DFS (with a Mac OS X WebDAV mount). I think this could become a separate (small) contrib project, since Hadoop proper does not rely on it.

I don't have a strong feeling about whether this belongs in core or contrib.
The bin/webdav.sh script replicates much of bin/hadoop. Why not instead just add a sub-command, 'bin/hadoop webdav'? Or, if we put this in contrib, it might call up to ../../bin/hadoop? Also:
Pete, is it still stable for reading on your linux?
What about performance? How much slower is it than local filesystems (approximately, of course)?

hi,
We revived the old fuse-hadoop project (a FUSE-J based plugin that lets you mount Hadoop-FS). We have tried this on a small cluster (10 nodes) and basic functionality works (mount, ls, cat, cp, mkdir, rm, mv, ...). The main changes include some bug fixes to FUSE-J and changing the previous fuse-hadoop implementation to enforce write-once. We found the FUSE framework to be straightforward and simple. We have seen several mentions of using FUSE with Hadoop, so if there is a better place to post these files, please let me know. Attachments to follow...
-thanks

hi,
Attachments include the following:
-thanks

Actually,
hi Owen, ok, will move fuse-j-hadoop to the
Bug fixes against the previous patch ("4"): it now URL-decodes filenames and doesn't crash on locking commands. I can now create/delete/move files with the Mac OS X built-in WebDAV client.
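URL-decoding filenames matters because WebDAV clients percent-encode spaces and non-ASCII characters in request paths. One subtlety is that java.net.URLDecoder implements HTML form decoding and turns '+' into a space, which corrupts file names that contain a plus sign. Below is a hedged sketch using java.net.URI instead, assuming the raw path comes from the request line without its query string. PathDecoding is a hypothetical name, not the patch's code.

```java
import java.net.URI;

// Hypothetical helper; the patch's actual decoding code is not shown in this thread.
final class PathDecoding {

    /** Decodes a percent-encoded request path such as "/My%20Docs/a+b.txt". */
    static String decodeRequestPath(String rawPath) {
        // URI.getPath() decodes %XX escapes as UTF-8 and keeps '+' as a literal plus sign,
        // unlike URLDecoder, which would turn "a+b.txt" into "a b.txt".
        return URI.create(rawPath).getPath();
    }
}
```

URI.create throws IllegalArgumentException if the raw path contains characters that are illegal in a URI (for example an unescaped space), which a server can map to 400 Bad Request.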
I'm excluding the webdav.sh script; this can be invoked with bin/hadoop org.apache.hadoop.webdav.WebdavServer. I have not yet made the changes that Doug suggested.

I added this to http://wiki.apache.org/hadoop/MountableHDFS
This is my WebDAV Level 2 implementation, hdfs-webdav, modified from the Tomcat WebDAV implementation. Download from: deploy step by step:

For more recent information on this, see:
http://www.hadoop.iponweb.net/Home/hdfs-over-webdav/webdav-server

It would be good to get that updated code attached to this issue. Can its authors please do that, under the Apache license? Thanks!

What about converting these patches into an HDFS contrib module that follows Hadoop conventions? I mean using Ivy, integrating the startup script with the bin/hdfs script, using hdfs-site.xml for configuration, etc. If that's reasonable, I could do it.
Also, I think it would be good to add LDAP authentication as an optional authentication mechanism.

I'd love to see the project support WebDAV well. Before taking someone else's code and contributing it to the project, you should talk to Owen / Doug about the licensing / legal issues. Do you need permission from the authors? If legal issues do not block adding it, I'm sure we'd welcome the work.
Eric,
according to the JIRA information, almost all WebDAV patches are marked as "Licences for inclusion in ASF works". Doesn't that mean they are licensed under the Apache License and could be included in Hadoop?

Eric, I think modifying someone else's patch that was contributed to the project with the license box checked is fine.
The license box isn't strictly even required. It's mostly just a reminder of what section 5 of the license states: intentional contributions to Apache-licensed works are themselves Apache licensed.

Maybe I misunderstood. Is all the code already contributed? I thought it was on Google Code?
The WebDAV module, like the FUSE module, would be better as a contrib under HDFS. I guess there is no problem with the licenses, since they are all Apache 2; the problem is choosing which patch to modify. There are at least 3 different streams of patches: the ones attached to this issue, the one at http://www.hadoop.iponweb.net/Home/hdfs-over-webdav/webdav-server
Hi all
I wonder if anyone is working on making WebDAV a contrib package? I don't see it in the source tree... Also, it seems that at least two of the patches (iponweb's and hadoop-496-5.tgz) are outdated. The iponweb package won't build against 0.20 because of Jetty issues (presumably it was written for an older Jetty?). hadoop-496-5.tgz was also made for older Hadoop releases; its source tree doesn't match the current one. I didn't try hdfs-webdav from Google Code; it seems to require Tomcat, which I would need to set up in addition to Hadoop? If there is any way we can contribute efforts toward making WebDAV an HDFS contrib, based on existing patches, please let me know.

> Anyway we can contribute efforts in making webdav into the hdfs as contrib, based on existing patches, please let me know.
This issue is a long-standing one with lots of different efforts, votes and watchers. If you develop a patch that is stable, please attach it here, so it can be reviewed and committed.

Hi, Enis! OK, thanks! But is there an expert opinion on which existing patch should be brought to a stable state and committed?
Well, as the previous developer, I am biased. I would recommend checking iponweb's patch first, since they say it is based on the patches in this issue. The hdfs-webdav project also seems promising, but I don't know its code, other than that it is based on Tomcat's WebDAV servlet. Personally, I would not recommend going Tomcat-only. I'm afraid you should check both to make an informed decision.
To limit the problems of IP and stuff, please attach the patch to this bug and I'll commit it to my code