John Xing made changes - 05/Feb/06 04:06 AM
Doug Cutting made changes - 07/Feb/06 03:54 AM
Doug Cutting made changes - 07/Feb/06 04:18 AM
Doug Cutting made changes - 06/Jun/06 06:16 AM
Doug Cutting made changes - 07/Jun/06 04:37 AM
Doug Cutting made changes - 03/Aug/06 05:45 PM
I changed the code for fuse-hadoop. This is a working version for mounting DFS on a Linux file system. It works fine with FUSE-J 2.2.3 and Hadoop 0.5.0.
Enjoy
Nguyen Quoc Mai made changes - 17/Aug/06 05:35 PM
I changed the code for fuse-hadoop. This is a working version for mounting DFS on a Linux file system. It works fine with FUSE-J 2.4 and Hadoop 0.5.0.
Enjoy
Nguyen Quoc Mai made changes - 17/Aug/06 05:36 PM
I changed the code for fuse-hadoop. This is a working version for mounting DFS on a Linux file system. It works fine with Hadoop 0.5.0.
Enjoy
Nguyen Quoc Mai made changes - 17/Aug/06 05:39 PM
I'd like to be able to commit this to the contrib tree, but it could use a bit more polish. Unfortunately we cannot commit the fuse-j jar file, since it is released under the LGPL. What would be great is a build.xml that downloaded and built fuse-j.
The README could also include more info on installing fuse. On Ubuntu I was able to do this with:
sudo apt-get install fuse-utils libfuse2 libfuse-devel
If you've never built things on your Ubuntu box then you might also need:
sudo apt-get install make gcc libc6-dev
There are already tools to mount WebDAV on Windows, MacOS and Linux. So HADOOP-496 will provide a more universal solution for this issue. I'd like to resolve this as a duplicate of that issue. Does anyone object?
Has anyone made any comparison (performance or other) of these two different approaches? It looks like the WebDAV version (where the DFS client is in a servlet) introduces one more network hop compared to the fuse version (where the DFS client is on the machine using the DFS).
One way or the other, I could easily satisfy my mounting needs with either of these solutions, so hopefully one will be integrated (so it gets maintenance attention).
I'm cancelling this patch to remove it from the queue of patches that we intend to imminently apply to the current trunk. This is still useful stuff, and, depending on what happens with WebDAV, we may still decide to integrate it, so the issue should remain open for now.
Doug Cutting made changes - 03/Nov/06 07:29 PM
Doug Cutting made changes - 15/Dec/06 09:52 PM
Hello,
We posted this on HADOOP-496 and were pointed to this JIRA entry as a better place to post this patch. Pasting our original submission message below...
--------------------------------------
We revived the old fuse-hadoop project (a FUSE-J based plugin that lets you mount Hadoop-FS). We have tried this on a small cluster (10 nodes) and basic functionality works (mount, ls, cat, cp, mkdir, rm, mv, ...). The main changes include some bug fixes to FUSE-J and changing the previous fuse-hadoop implementation to enforce write-once. We found the FUSE framework to be straightforward and simple. We have seen several mentions of using FUSE with Hadoop, so if there is a better place to post these files, please let me know. Attachments to follow...
-thanks
Attachments include the following:
Anurag Sharma made changes - 03/Dec/07 07:49 PM
It is unfortunate that this requires patches to the fuse-j sources. The fuse-j project does not appear to be active, so I don't see much point in trying to submit these as a patch there.
We cannot host GPL'd code or even patches to GPL'd code at Apache. We can, however, have optional commands in our build scripts that download GPL'd code. So this could be structured as a contrib module whose build.xml, when a non-default build option is specified, downloads, patches and compiles fuse-j. But since the patch to fuse-j itself cannot be hosted at Apache, it might be simpler to just create a jar file for the patched version, host that somewhere, and bypass the patch and compile steps. It would be great to get this into a form that we can commit, so that it stays synchronized with the rest of Hadoop.
I think I commented on another JIRA that I have implemented a dfs fuse module for the straight C fuse module.
Thus far I have only implemented read-only, and everything works fine. It's been up for about a month with low use, but no problems.
pete
> I think I commented on another JIRA that I have implemented a dfs fuse module for the straight C fuse module.
Yes, you noted that in HADOOP-496. Can you post a patch that implements this?
Yes, but I need a couple of days to get our build guy to make the Makefile.am less Facebook-centric.
hi Doug,
Thanks for pointing out this issue. I will remove the FUSE-J patch and try one of the other routes you suggested (to have a patched FUSE-J available), and will come back with a resolution on this very soon.
-anurag
Hi Doug,
I went through the license for Fuse-J and it is distributed under LGPL; do you think that would allow the Fuse-J patches to be hosted on Apache? (In the latter case we would still modify the submission above to be a contrib module that downloads Fuse-J, applies our patch, and builds it, except we won't have to find a place to host the patch.)
-thanks
> I went through the license for Fuse-J and it is distributed under LGPL,
Unfortunately, the ASF cannot host things published under LGPL either. Sorry!
hi Doug. OK :-), we will follow one of the alternate options you suggested, hosting either the patch or the jar file ourselves, and fixing the fuse-j-hadoop package build to work with this. Will re-submit our changes soon.
-thanks, -anurag
Anurag Sharma made changes - 05/Dec/07 10:44 PM
Here's the source. I will attach the Makefiles and full tar tomorrow or Wed morning.
Pete Wyckoff made changes - 11/Dec/07 12:00 AM
Anurag Sharma made changes - 12/Dec/07 07:30 PM
Anurag Sharma made changes - 12/Dec/07 07:31 PM
hi,
We re-submitted the fuse-j-hadoopfs package with the following changes (as suggested above):
We restructured the fuse-j-hadoopfs build to be a contrib, and have tested it with the Hadoop source-tree build. The fuse-j-hadoopfs build is a no-op when a standard "compile" target is specified. To actually build fuse-j-hadoopfs, the user has to specify the following command line: "ant compile -Dbuild-fuse-j-hadoopfs=1". We still have the following to-dos remaining:
The above tarball (fuse-j-hadoopfs-03.tar.gz) consists of a directory that can be placed inside "hadoop/src/contrib". Please let us know if we should submit this as a patch file instead, or if we need to make more changes...
-thanks
Here's the same code (without the closelog after the return in main)
0. install hadoop 0.14.x
Obviously, hadoop needs to be in your class path, and your LD_LIBRARY_PATH needs the fuse and hdfs .so files. When ready for production, remove the -debug (which is an option to fuse-dfs for debugging) and the -d, which is the fuse debugging option. On my return from vacation, I will make better docs and autoconf. Note again, this is read-only, but it is really easy to implement writes.
– pete
Pete Wyckoff made changes - 13/Dec/07 11:00 PM
I implemented mkdir, mv, rm and rmdir. But since the programmatic API doesn't use the Trash feature, this is a pretty big problem.
2 solutions:
1. have fuse dfs rename things into /Trash
I'm just wondering why the programmatic API never used trash in the first place??
– pete
> I'm just wondering why the programmatic API never used trash in the first place??
Because most other programmatic APIs don't. Unix 'rm' does not use the trash, nor does the unlink system call, nor does the DOS command line, etc. Trash is usually only a feature of a user interface, like a GUI or command shells designed for interactive use.
The Hadoop API is not POSIX-compliant. Wouldn't it be better to protect people from what is catastrophic data loss?
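Pete's first option, renaming deleted files into /Trash instead of removing them, comes down to computing a destination path and handing it to the DFS rename call. The sketch below assumes a flat "/Trash" prefix exactly as written in the comment (it makes no claim about where Hadoop's own shell keeps its trash), and the function name trash_path_for is hypothetical. It only builds the path; creating parent directories under the prefix is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical trash prefix, taken literally from the comment above. */
#define TRASH_PREFIX "/Trash"

/*
 * Build the path a deleted entry would be renamed to, e.g.
 * "/user/craigm/data.df" -> "/Trash/user/craigm/data.df".
 * Returns 0 on success, or -1 if the path is not absolute, is already
 * inside the trash, or the result does not fit in `out`.
 */
int trash_path_for(const char *path, char *out, size_t out_len) {
    size_t plen = strlen(TRASH_PREFIX);

    if (path == NULL || path[0] != '/')
        return -1;
    /* Something already in the trash should not be nested again. */
    if (strncmp(path, TRASH_PREFIX, plen) == 0 &&
        (path[plen] == '/' || path[plen] == '\0'))
        return -1;

    int n = snprintf(out, out_len, "%s%s", TRASH_PREFIX, path);
    if (n < 0 || (size_t)n >= out_len)
        return -1; /* truncated: refuse rather than rename to a wrong path */
    return 0;
}
```

In an unlink or rmdir handler, a 0 return would lead to a rename from the original path to the computed one instead of a delete; what to do on -1 (for example, deleting something already in the trash) is a policy question the sketch leaves open.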
Newer version which supports mkdir, rmdir, mv, and rm. NOTE: this is still a work in progress. Any help appreciated.
Pete Wyckoff made changes - 24/Jan/08 07:24 PM
Since a) the latest patches from Pete do not depend on fuse-j and b) fuse is now part of the Linux kernel (2.6.14 and later), it would be really nice to get this into Hadoop proper.
Before that, it would be nice to try this out, so I am begging for some more information on how to compile this.
I did some testing with the C version of this tool (and managed to write a working makefile).
Sami - can you attach the Makefile?
I should have mentioned: the problem with mine is that it is generated with some custom automake macros that we use here.
Here's the makefile I used. One should perhaps adapt to the model that the other c* modules are using.
Sami Siren made changes - 28/Jan/08 06:31 PM
Here's a version with configure to make things easier.
Just run bootstrap.sh after setting the right hdfs, fuse and jdk paths. I still haven't added a version number (next on my list) or better docs (next next on my list).
– pete
Pete Wyckoff made changes - 07/Feb/08 10:12 PM
Includes the version number from configure.ac in fuse_dfs.c, so fuse_dfs --version prints it out. It also includes better README documentation.
The package is pretty much complete now, other than implementing writes! If someone has Hadoop 0.15+ installed and wants to work on it, that would be great. It needs 0.15 because there you can see file creates before they are closed, which the implementation requires. I only have 0.14.2 right now. This newest version should be pretty self-explanatory.
Pete Wyckoff made changes - 08/Feb/08 10:42 PM
Pete,
I have been experimenting with fuse_dfs.c and have a few questions:
(1) I am using a previous version of fuse_dfs.c, mainly because I don't have bootstrap.sh. However, with respect to the new fuse_dfs.c option parsing: is this compatible with calling via mount.fuse and autofs? This is how I currently mount, using an autofs map containing:
hdfs -fstype=fuse,rw,nodev,nonempty,noatime,allow_other :/path/to/fuse_dfs_moutn/fuse_dfs.sh\#dfs\://namenode\:9000
fuse_dfs.sh is just a shell script setting the CLASSPATH and LD_LIBRARY_PATH, and essentially just execs fuse_dfs. If I changed to the more recent version, I think I would probably have to put the dfs://namenode:9000 configuration into the script.
(2) Have you done any sort of performance testing? I'm experimenting with HDFS for use in a mixed environment (Hadoop and non-Hadoop jobs), and the throughput I see is miserable. For example, I use a test network of 8 P3-1GHz nodes, and a similar client on a 100Mbit network. Below, I compare cat-ing a 512MB file from (a) an NFS mount on the same network as the cluster nodes, (b) using the Hadoop frontend and (c) using the FUSE HDFS filesystem.
# (a)
$ time cat /mnt/tmp/data.df > /dev/null
real 0m47.280s
user 0m0.059s
sys 0m2.476s
# (b)
$ time bin/hadoop fs -cat hdfs:///user/craigm/data.df > /dev/null
real 0m48.839s
user 0m16.256s
sys 0m7.001s
# (c)
$ time cat /misc/hdfs/user/craigm/data.df >/dev/null
real 1m41.686s
user 0m0.135s
sys 0m2.302s
Note that NFS and hadoop fs -cat obtain about 10.5MB/sec, while the HDFS fuse mount (in /misc/hdfs) achieves only 5MB/sec. Is this an expected overhead for FUSE?
I did try tuning rd_buf_size to match the size of reads that the kernel was requesting, i.e. 128KB instead of 32KB; however, this made matters worse:
# with 128KB buffer size
$ time cat /misc/hdfs/user/craigm/data.df >/dev/null
real 2m11.080s
user 0m0.113s
sys 0m2.180s
Perhaps an option would be to keep the HDFS file open between reads and time out the connection when not used, or something; or read more than we need and then keep it in memory? Both would overly complicate the neat code, though!
(3) If I use autofs for hdfs, then mounts will time out quickly (30 seconds), and then reconnect again on demand. Perhaps fuse_dfs.c could implement the destroy fuse operation to free up the connection to the namenode, etc.?
Cheers
Craig
Hi Craig,
I may be using an older fuse? I have:
#dfs#dfs://hadoopNN01.facebook.com:9000 /mnt/hdfs fuse allow_other,rw 0 0
For performance, it's really fast when there's zero load on the namenode, but start running a couple of jobs and it gets killed. Obviously this is because one operation may require multiple fuse calls, and thus multiple DFS calls, versus the single one for bin/hadoop dfs. I'm noticing that all the critical sections in the namenode do things like logging to the debug log IN the critical section. And since everything locks fsRoot, just one job can really hose a real-time system. I don't see anything to do here other than trying to fix these after verifying with jconsole that this is the problem. And I will add the autofs destroy function this week.
So, what do you think I should do for #1, revert to the older configuration code? (which was way nicer anyway). Do you have your bash script so I can use it too?
thanks,
pete
For #1, I think that's an fstab line, not an autofs line, perhaps? I'm not sure the fuse version will make any difference; mount will only pass through the source path, the dest path and the -o options. Would it be possible to keep all options for fuse_dfs.c in -o options? On the other hand, if a shell script is always needed to set the CLASSPATH and LD_LIBRARY_PATH options, then it is not as important how the options are set from fstab or autofs. (LD_LIBRARY_PATH could be fixed by an /etc/ld.so.conf.d entry.)
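Craig's question about keeping every fuse_dfs.c setting in the -o string can be illustrated with a small parser. This is only a sketch: the dfs= option key is a hypothetical name (fuse_dfs may not accept it), and the code parses plain strings rather than using libfuse's own fuse_opt_parse helpers. It pulls the value out of a comma-separated option list and then splits a dfs://host:port URI like the one in the autofs map above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split "dfs://host:port" into host and port.
 * Returns 0 on success, -1 on malformed input or if host does not fit.
 */
int parse_dfs_uri(const char *uri, char *host, size_t host_len, int *port) {
    const char *scheme = "dfs://";
    size_t slen = strlen(scheme);

    if (strncmp(uri, scheme, slen) != 0)
        return -1;
    const char *h = uri + slen;
    const char *colon = strrchr(h, ':');
    if (colon == NULL || colon == h)
        return -1; /* no port, or empty host */
    size_t hlen = (size_t)(colon - h);
    if (hlen >= host_len)
        return -1;

    char *end;
    long p = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || *end != '\0' || p <= 0 || p > 65535)
        return -1;

    memcpy(host, h, hlen);
    host[hlen] = '\0';
    *port = (int)p;
    return 0;
}

/*
 * Find a hypothetical "dfs=<uri>" entry in a comma-separated -o string
 * such as "rw,nodev,allow_other,dfs=dfs://namenode:9000".
 * Copies the value into out and returns 0, or returns -1 if absent.
 */
int find_dfs_option(const char *opts, char *out, size_t out_len) {
    const char *p = opts;

    while (*p != '\0') {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        if (len > 4 && strncmp(p, "dfs=", 4) == 0) {
            if (len - 4 >= out_len)
                return -1;
            memcpy(out, p + 4, len - 4);
            out[len - 4] = '\0';
            return 0;
        }
        if (comma == NULL)
            break;
        p = comma + 1; /* next option */
    }
    return -1;
}
```

With everything in -o, an fstab or autofs entry would only need the mount point and the option string, which is the property Craig is after; whether that fits the actual mount.fuse argument handling is not verified here.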
My test had no load on the namenode, and usage of the namenode CPU looks low. In contrast, the fuse_dfs CPU usage was high. I would like to profile or jconsole the fuse_dfs binary, but the getJNIEnv() method in src/c++/libhdfs/hdfsJniHelper.c could be a bit more helpful for passing arguments to the JVM initialisation. Essentially, it only fills in -Djava.class.path from the CLASSPATH, not any other arbitrary system properties or -X options etc., bah. Reported separately as
Will attach bash script shortly.
C
Shell script to set LD_LIBRARY_PATH and CLASSPATH when called from mount.fuse (which can be called from mount, and hence from autofs etc).
Craig Macdonald made changes - 20/Feb/08 02:52 PM
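The throughput figures in Craig's comparison come from dividing the 512MB file size by each run's real time. A small helper makes that arithmetic explicit and also counts how many read calls a given buffer size implies for the same file; it assumes 1MB = 2^20 bytes, and the function names are illustrative.

```c
#include <assert.h>

/* Megabytes per second for a file of size_mb MB read in `seconds` of wall-clock time. */
double throughput_mb_s(double size_mb, double seconds) {
    assert(seconds > 0.0);
    return size_mb / seconds;
}

/* Number of read() calls needed to stream size_bytes with a buf_bytes buffer (rounded up). */
unsigned long long reads_needed(unsigned long long size_bytes,
                                unsigned long long buf_bytes) {
    assert(buf_bytes > 0);
    return (size_bytes + buf_bytes - 1) / buf_bytes;
}
```

For example, 512MB over 47.280s is about 10.8 MB/s, over 48.839s about 10.5 MB/s, and over 1m41.686s (101.686s) about 5.0 MB/s; the 128KB-buffer run at 2m11.080s works out to roughly 3.9 MB/s. Going from a 32KB to a 128KB buffer cuts the read calls for that file from 16384 to 4096.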
I made some changes - mainly caching the file handle in fi->fh and it's performing much better now. I'm attaching the new one.
– pete
Newest fuse dfs, which should perform better on reads. I'm seeing this on a cluster in use:
> time cat part-00000 > /dev/null
real 1m25.078s
Pete Wyckoff made changes - 20/Feb/08 11:13 PM
Craig Macdonald made changes - 20/Feb/08 11:42 PM
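Caching the open file in fi->fh works because FUSE's struct fuse_file_info carries a 64-bit fh field that the filesystem may fill in during open and gets back on every later read and on release. The sketch below models only that field with a stand-in struct, since it does not link against libfuse or libhdfs; the sketch_* functions and struct open_dfs_file are hypothetical, and a real version would keep the hdfsFile returned by libhdfs instead of a counter.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for FUSE's struct fuse_file_info; only the fh field is modelled. */
struct mock_file_info {
    uint64_t fh;
};

/* Hypothetical open DFS file; a real one would wrap the libhdfs hdfsFile. */
struct open_dfs_file {
    const char *path;
    long opens_against_namenode; /* how many times the file was really opened */
};

/* open: do the expensive DFS open once and remember it in fi->fh. */
int sketch_open(const char *path, struct mock_file_info *fi) {
    struct open_dfs_file *f = malloc(sizeof *f);
    if (f == NULL)
        return -1;
    f->path = path;
    f->opens_against_namenode = 1;
    fi->fh = (uint64_t)(uintptr_t)f; /* pointer stored in the integer field */
    return 0;
}

/* read: recover the cached handle instead of reopening the file on every call. */
struct open_dfs_file *handle_of(const struct mock_file_info *fi) {
    return (struct open_dfs_file *)(uintptr_t)fi->fh;
}

/* release: free the handle exactly once and clear the field. */
void sketch_release(struct mock_file_info *fi) {
    free(handle_of(fi));
    fi->fh = 0;
}
```

Round-tripping the pointer through uintptr_t is the portable way to keep it in an integer field; the point of the design is that every read after open reuses the same handle, so the namenode sees one open per file rather than one per read.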
Hi Craig,
I'm setting fi->fh in open as you suggest. You're right, I didn't look at pread and tell here - good point.
– pete
I added the destroy method. Also looked, and pread should be correct.
I'm hoping to test with 0.16.x soon and see if writes work! – pete
Pete Wyckoff made changes - 21/Feb/08 12:25 AM
Version 0.2.0 includes better reads and, in theory, writes, but they won't work without Hadoop 0.16 and I can't test. Obviously, the writes have to be append-only. And I'm still not sure what the semantics are as far as block size goes.
Pete Wyckoff made changes - 21/Feb/08 12:40 AM
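Since writes have to be append-only, a write handler can reject any request whose offset does not continue exactly where the previous write ended. The sketch assumes a hypothetical per-handle counter of accepted bytes and returns -EIO for out-of-order writes, following the FUSE convention of negative errno returns (EIO rather than some other code is an assumption). It does not talk to DFS.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-handle state: bytes accepted so far, i.e. the current file length. */
struct write_state {
    long long bytes_written;
};

/*
 * Accept a write only if it starts exactly at the current end of the file.
 * Returns the number of bytes accepted, or -EIO for any other offset
 * (negative errno, as FUSE callbacks do). A real handler would forward
 * buf to the DFS write call before updating the counter.
 */
long long append_only_write(struct write_state *ws, const char *buf,
                            long long size, long long offset) {
    (void)buf; /* not forwarded anywhere in this sketch */
    if (offset != ws->bytes_written)
        return -EIO;
    ws->bytes_written += size;
    return size;
}
```

Sequential tools such as cp issue writes at increasing offsets with no gaps, so they pass this check; anything that seeks backwards to patch a file fails immediately instead of silently corrupting data.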
Should have been this one, which includes the init method.
Pete Wyckoff made changes - 21/Feb/08 12:57 AM
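The init and destroy callbacks pair up naturally for the autofs case Craig raised: in FUSE 2.x, whatever init returns is kept as the filesystem's private data and is passed to destroy at unmount. The sketch mirrors that lifecycle without libfuse. Its signatures are simplified (the real init takes a struct fuse_conn_info pointer), struct mount_state and the connection counter are hypothetical, and a real version would call hdfsConnect and hdfsDisconnect from libhdfs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-mount state; a real one would hold the hdfsFS handle. */
struct mount_state {
    const char *namenode;
    int port;
    int connected;
};

/* Counts open connections so the lifecycle can be checked. */
static int live_connections = 0;

/*
 * Like FUSE's init: runs once at mount time, and the returned pointer is
 * what FUSE later hands to destroy. (Simplified signature; a real one
 * would connect to the namenode here.)
 */
void *sketch_init(const char *namenode, int port) {
    struct mount_state *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->namenode = namenode;
    s->port = port;
    s->connected = 1;
    live_connections++;
    return s;
}

/*
 * Like FUSE's destroy: runs at unmount (for example after an autofs
 * timeout) and releases what init set up.
 */
void sketch_destroy(void *private_data) {
    struct mount_state *s = private_data;
    if (s == NULL)
        return;
    if (s->connected) {
        s->connected = 0;
        live_connections--;
    }
    free(s);
}
```

With autofs remounting on demand, each mount/unmount cycle then opens and closes exactly one connection instead of leaking one per timeout.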
Hi Pete, Definitely using the latest tar this time. Some comments:
1. Firstly, I shouldn't have deleted my last comment, though it was clearly in error as I was reading the wrong version of fuse_dfs.c. In your comments, can you say which file you've just uploaded? For posterity, the previous comment was:
2. With respect to the read speed, this is indeed a bit faster in our test setting (nearer 6MB/sec), but not yet similar to the Hadoop fs shell (about 10.5MB/sec). Fuse version 2.7.2.
# time bin/hadoop fs -cat /user/craigm/data.df > /dev/null
real 0m50.347s
user 0m16.023s
sys 0m6.644s
# time cat /misc/hdfs/user/craigm/data.df > /dev/null
real 1m31.263s
user 0m0.131s
sys 0m2.384s
I'm trying to measure the CPU taken by fuse_dfs for the same read, so we know how much CPU time it burns. Can I ask how your timing test compares to using the Hadoop fs shell on the same machine? When reading, the CPU on the client is used 45%ish, similar to the Hadoop fs shell CPU use. I feel it would be good to aim for similar performance to the Hadoop fs shell, as this seems reasonable compared to NFS in my test setting, and should scale better as the number of concurrent reads increases, given available wire bandwidth.
3. With respect to the build system, it could be clearer what --with-dfspath= is meant to point to. src/Makefile.am seems to assume that include files are at ${dfspath}/include/linux and the hdfs.so at ${dfspath}/include/shared. This isn't how the Hadoop installation is laid out. Perhaps it would be better if we could give the Hadoop installation directory as an option and take the paths from there?
4. src/Makefile.am assumes an amd64 architecture. This is the same problem I noted in my shell script about guessing the locations of the JRE shared lib files.
5 (minor). The last tar.gz had a link to aclocal.m4 in the external folder that was absolute, i.e. to your installation. It should be deleted when building the tar file.
6 (minor). Update print_usage if you're happy with the specification of filesystem options.
I made no changes to my shell script or my autofs mount for this version to work.
Cheers
Craig
Doug Cutting made changes - 21/Feb/08 06:45 PM
Pete,
I have tried everything to profile fuse_dfs. Valgrind (callgrind) doesn't play well with Sun Java, and I failed to get the GNU profiler to give any output. I wrote a patch for
Craig
Hi Rui,
It's because of the order in which FUSE calls the implementation. The file is created with an open, then closed, then opened again. So in versions before 0.16, that second open in write mode will fail. So I think as long as appends are coming in, it should work with 0.16. I haven't looked at the implementation, but hopefully it buffers things until it gets to a full block and isn't creating small blocks all over the place.
pete
ps: although we're on 0.15.3 at FB, I can't upgrade our test cluster to 0.16 yet, as we want to install HBase and need to try it there. I'm going to try applying the HBase patch that the Powerset guys nicely gave me soon, and then if all goes well, I can upgrade our test cluster to 0.16 and test things.
Hi Craig,
I see now that the buffer size for the OS reads is only 128K, and since there's no ioctl for fuse to bump it up, it's a problem. I think what we can do is create a fuse block device mount and then, using blockdev -setblocksize 128M, we should see some real speedups. Unfortunately, fuse on my dev machine isn't configured properly for this. First, there's no /dev/fuseblk, which it assumes, and even with that there I get a useless error message. I just want to verify with one of the kernel guys here how big the buffer can get before going too crazy. I'll ask one of them today and update this tomorrow.
– pete
Craig,
I should mention I tried to get fuse to do more readahead than 128K, but setting that param didn't seem to do anything. I can probably play with this tomorrow. To be honest, I don't know exactly what it means when you configure the fuse module as a block device, since you also need to specify the mount point. Is fuse under the covers just doing the block device so we can do better ioctls? I mean, there's no way to implement a real block device, since we'd have to make it look like a real filesystem. But fuse requires both the block device and the mount point.
– pete
Hi Pete,
The block stuff in fuse is appallingly documented. I have hunted the Web for info on this all afternoon, to understand it further. To be honest, the only thing I have found useful is reading the source of ntfs-3g.c at http://ntfs-3g.cvs.sourceforge.net/ntfs-3g/ntfs-3g/src/ntfs-3g.c?revision=1.106&view=markup A test I did a few days ago was to compare reading an NFS-mounted file directly vs. the same file read via NFS through a FUSE fs - http://mattwork.potsdam.edu/projects/wiki/index.php/Rofs#rofs_code_.28C.29 I don't have any objections to pretty large buffer sizes for fuse_dfs.c - HDFS is designed for large files and streaming read access. Btw, you mentioned you are re-exporting the mounted FS as NFS - have you had any issues vs. the issues described in fuse's README.NFS?
Regards
Craig
I just talked to one of our kernel guys, and he isn't 100% sure as he hasn't done that much IO stuff on Linux, but he thinks the 128K readahead may just be the maximum. We could always do something like 1MB readaheads ourselves, although that would complicate things - though not that much, since we could keep the cached data with the open file handle, so there are no dirty cache problems or garbage collection issues since we just dump it when we do the close. So, maybe that's the easiest way to go... I can probably look at that tomorrow or Thursday.
pete
Hi Pete,
Thanks a lot for the explanation! But as I heard from the HDFS team, 0.16 still does not support file appending. Supposing appending is not supported, can we still try writing files in Fuse with the workaround you described?
Thanks,
Rui
Hi Pete,
Have you had a chance to look at FUSE readaheads? I have attached a version of fuse_dfs.c I have patched, which reads 10MB chunks from DFS and caches these in a struct held in the filehandle. I'm seeing some improvement (down to 1m20s, compared to "bin/hadoop dfs -cat file > /dev/null", which takes about 50 seconds). Increasing the buffer size shows some improvement [I only did some quick tests] - I tried up to 30MB, but I don't think there's much improvement over 5-10MB. Do you think we're reaching the limit, such that the overheads of JNI are making it impossible to go any faster? I.e., where do we go from here?
Another comment I have is that the configure/makefile asks for a dfs_home. It might be easier to ask for the Hadoop home, then build the appropriate paths from there (${hadoop_home}/libhdfs and ${hadoop_home}/src/c++/libhdfs). Hadoop has no include/linux folders etc. Finally, we need a way to detect whether to use i386 or amd64 to find jvm.so.
Craig
Craig Macdonald made changes - 19/Mar/08 06:38 PM
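Craig's patch and Pete's earlier suggestion describe the same idea: do one large read from DFS, keep the bytes with the open file handle, serve the kernel's small reads from that buffer until they fall outside it, and discard the buffer on close. The sketch below is not the attached fuse_dfs.c; it is a minimal version under stated assumptions: the backing read is a hypothetical pread-style callback, an in-memory file stands in for DFS, and in fuse_dfs the struct would hang off fi->fh.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pread-style source: copy up to len bytes at off into buf, return count or <0. */
typedef long (*pread_fn)(void *ctx, char *buf, long len, long long off);

/* Read-ahead cache that would live in the per-open-file struct. */
struct readahead {
    char *buf;       /* cached chunk */
    long cap;        /* chunk size, e.g. 10MB in the patch described above */
    long len;        /* valid bytes currently in buf */
    long long start; /* file offset of buf[0] */
    long fetches;    /* number of reads issued to the backing source */
};

int ra_init(struct readahead *ra, long cap) {
    ra->buf = malloc((size_t)cap);
    if (ra->buf == NULL)
        return -1;
    ra->cap = cap;
    ra->len = 0;
    ra->start = 0;
    ra->fetches = 0;
    return 0;
}

/* Serve a read from the cached chunk, refilling it with one large read on a miss. */
long ra_read(struct readahead *ra, pread_fn src, void *ctx,
             char *out, long size, long long off) {
    if (size > ra->cap) /* bigger than the cache: read straight through */
        return src(ctx, out, size, off);
    if (off < ra->start || off + size > ra->start + ra->len) {
        long got = src(ctx, ra->buf, ra->cap, off);
        if (got < 0)
            return got;
        ra->start = off;
        ra->len = got;
        ra->fetches++;
    }
    long avail = (long)(ra->start + ra->len - off);
    long n = size < avail ? size : avail; /* short only at end of file */
    memcpy(out, ra->buf + (off - ra->start), (size_t)n);
    return n;
}

/* Called on release: the cached data simply goes away with the handle. */
void ra_free(struct readahead *ra) {
    free(ra->buf);
    ra->buf = NULL;
}

/* In-memory stand-in for a DFS file, used only to exercise the cache. */
struct mem_file {
    const char *data;
    long long size;
};

long mem_pread(void *ctx, char *buf, long len, long long off) {
    const struct mem_file *f = ctx;
    if (off >= f->size)
        return 0;
    long long left = f->size - off;
    long n = len < left ? len : (long)left;
    memcpy(buf, f->data + off, (size_t)n);
    return n;
}
```

For sequential 128KB kernel reads, a 10MB chunk means roughly one DFS read per 80 requests. Because the cache belongs to a single open handle and is dropped at close, there is no invalidation across handles to worry about, which matches Pete's reasoning for keeping it simple.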
Here's my most recent one. I will try merging Craig's read-ahead code in and then, I guess, see about getting it into contrib.
Pete Wyckoff made changes - 10/Apr/08 11:44 PM
I should have mentioned I fixed the autoconf problems and made the "protectedpaths" configurable. I guess we'll have to have a discussion about whether people like this, because I think Doug clearly doesn't.
This is by no means a completely final product, but more of a version 0.1. But it has decent autoconf, comments and README files, and has been working in production for quite a while.
Pete Wyckoff made changes - 11/Apr/08 12:54 AM
Pete Wyckoff made changes - 11/Apr/08 12:55 AM
Pete Wyckoff made changes - 11/Apr/08 12:56 AM
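Making the "protectedpaths" configurable, as mentioned above, needs little more than an exact-match check against a list before rmdir (and similar destructive calls) proceed. The sketch assumes a colon-separated list such as "/:/user:/warehouse", built from the example directories raised in this discussion; the separator and the function name is_protected are assumptions, not fuse_dfs's actual configuration format.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if `path` exactly matches one entry of a colon-separated list
 * such as "/:/user:/warehouse" (list format assumed), otherwise 0.
 */
int is_protected(const char *path, const char *protected_list) {
    size_t plen = strlen(path);
    const char *p = protected_list;

    for (;;) {
        const char *sep = strchr(p, ':');
        size_t len = sep ? (size_t)(sep - p) : strlen(p);
        if (len == plen && strncmp(p, path, len) == 0)
            return 1;
        if (sep == NULL)
            return 0;
        p = sep + 1; /* move to the next entry */
    }
}
```

An rmdir handler could return -EACCES when is_protected() is true. Only exact matches are refused, so subdirectories such as /user/craigm stay removable.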
Changed the affected versions and fixes to unknown from 0.5
Pete Wyckoff made changes - 11/Apr/08 12:57 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12379898/patch.txt against trunk revision 645773.
@author +1. The patch does not contain any @author tags.
tests included -1. The patch doesn't appear to include any new or modified tests.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new javac compiler warnings.
release audit -1. The applied patch generated 211 release audit warnings (more than the trunk's current 202 warnings).
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2201/testReport/
This message is automatically generated.
Help - I don't know what a release audit warning is?? It just lists the filenames in the release audit link.
Also, unit testing for this is pretty hard, but it can be done to some extent in the future by running each function the way fuse calls them; but these would be C unit tests anyway, which I don't know if we have support for. Do people want to comment on the feature of moving deleted files to /Trash, and also of not allowing rmdir on some "special directories", e.g., '/', '/user', '/warehouse' ...??
The release audit flags new files that don't contain the Apache license (or old files that have had it removed). In this case most of those flagged are fine to not have the Apache license, since they're automatically generated, but it probably wouldn't hurt to add it to the shell scripts.
Some automated tests would be good, e.g., an end-to-end test that starts HDFS, mounts it with fuse, and then lists and reads files through the mount. But such tests should not be run by default, since the default build does not compile C++ code, nor should it depend on fuse being installed. But it would be good to eventually configure Hudson to run these, to verify that fuse continues to show signs of life as Hadoop evolves.
So, in summary, Hudson will not generate a clean report card for this issue, since it will contain some files that don't have the Apache license, and Hudson will not, at this point, automatically run any new JUnit tests for it. But that doesn't mean that some licenses and tests shouldn't still be added before we commit the patch.
Hi Doug,
I will add the header to all the files - I think I just had it in the C file. Sure, I will add a Python script or something to drive creating a few files in DFS and then trying to ls and cat them from a mount.
pete
> I will add a Python script or something to drive creating a few files in DFS and then trying to ls and cat them from a mount.
It will be easier to integrate Java unit tests. Also, currently I don't think we require Python, so I wouldn't want to add a system dependency just to test one component. Perhaps, if you don't like Java, you could write tests in C or as bash scripts, then somehow hook them into the test-contrib target? FWIW, Hudson nightly and patch builds do run with -Dcompile.c++=yes so that tests for Pipes and libhdfs get built and run. What doesn't get built is the eclipse plugin and the native compression library (libhadoop).
Latest update - includes the license header in every file and a test/TestFuseDFS.java. Not sure how to link this into the other build.xmls to have it built and run, but I assume we don't want that right now anyway.
Pete Wyckoff made changes - 15/Apr/08 06:47 PM
Pete Wyckoff made changes - 15/Apr/08 06:53 PM
I have created
I updated the patch and am hoping this operation restarts things JIRA-wise, i.e. runs tests and emails Doug.
Pete Wyckoff made changes - 17/Apr/08 07:00 PM
Pete Wyckoff made changes - 17/Apr/08 11:11 PM
Pete Wyckoff made changes - 17/Apr/08 11:11 PM
Pete Wyckoff made changes - 17/Apr/08 11:35 PM
Pete Wyckoff made changes - 17/Apr/08 11:37 PM
Pete Wyckoff made changes - 18/Apr/08 06:23 PM
Pete Wyckoff made changes - 18/Apr/08 06:25 PM
Pete,
I haven't had time to test your latest patch, but things seem to be improving. I note your comments about exporting the fuse mount. There is a README.NFS in the fuse distribution, which concerns exporting FUSE mounts. I have copied it verbatim below from version 2.7.3 - it seems not quite mature yet.
FUSE module in official kernels (>= 2.6.14) don't support NFS exporting. In this case if you need NFS exporting capability, use the '--enable-kernel-module' configure option to compile the module from this package. And make sure, that the FUSE is not compiled into the kernel (CONFIG_FUSE_FS must be 'm' or 'n'). You need to add an fsid=NNN option to /etc/exports to make exporting a FUSE directory work. You may get ESTALE (Stale NFS file handle) errors with this. This is because the current FUSE kernel API and the userspace library cannot handle a situation where the kernel forgets about an inode which is still referenced by the remote NFS client. This problem will be addressed in a later version. In the future it planned that NFS exporting will be done solely in userspace.
Regards
C
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12380454/patch2.txt against trunk revision 645773.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 13 new or modified tests.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new javac compiler warnings.
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2277/testReport/
This message is automatically generated.
Can someone please validate that this works for them?
I finally got this to compile, after modifying plain-bootstrap.sh and src/Makefile.am. The latter has some hardwired paths to make things work for Pete. Most of the stuff in the former is stuff that's already known to Hadoop's build (JDK location, libhdfs location, etc.)
It should be possible to get this to build from the top-level build.xml, provided:
Pete, are you familiar with Ant?
Addressed Doug's concerns.
I got rid of the home/pwyckoff stuff in Makefile.am, switched all the autoconf variables to match build.xml env vars, added a compile-fusedfs target to src/contrib/build-contrib.xml, and updated the documents. So now, to build, you run: ant compile-contrib -Dfusedfs=1
Pete Wyckoff made changes - 26/Apr/08 01:47 AM
Pete Wyckoff made changes - 26/Apr/08 01:47 AM
Pete Wyckoff made changes - 26/Apr/08 01:48 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12380975/patch3.txt against trunk revision 645773.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 13 new or modified tests.
patch -1. The patch command could not apply the patch.
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2335/console
This message is automatically generated.
Hi,
Probably I'm missing something, but I don't understand how the patch mechanism works. (Actually, I'm not sure this is the appropriate way, in terms of netiquette, to find help.) Do I download and untar every file present above? Could someone suggest how I can install this software? Thanks in advance,
Maurizio
I have some minor issues. I was working on compiling on Friday afternoon, but Doug beat me to it with identical comments.
Will test new build in due course.
Maurizio - see http://en.wikipedia.org/wiki/Patch_(Unix
Fixed a few things and addressed the points Craig and Doug brought up. Changes:
1. I changed the top-level build to have a compile-contrib-fuse target that exports the right properties and then has a subant task of the build.xml in the fuse-dfs directory.
2. Fixed fuse_dfs_wrapper.sh to set env vars only if not set, and to pass all the args ala $@ to the executable.
3. Added build.xml in src/contrib/fuse-dfs.
4. I updated README and removed README.build.
Pete Wyckoff made changes - 28/Apr/08 07:00 PM
Hi Maurizio,
To apply this patch, go to the top level of your Hadoop checkout and run "patch -p0 < patch4.txt". This should apply it; then read the README in src/contrib/fuse-dfs for instructions on how to compile. Let me know if you have any problems.
– pete
Pete Wyckoff made changes - 29/Apr/08 01:35 AM
Pete Wyckoff made changes - 29/Apr/08 01:35 AM
Pete Wyckoff made changes - 29/Apr/08 05:38 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12381125/patch4.txt against trunk revision 645773.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 13 new or modified tests.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new javac compiler warnings.
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs -1. The patch appears to cause Findbugs to fail.
core tests -1. The patch failed core unit tests.
contrib tests -1. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2339/testReport/
This message is automatically generated.
This still doesn't work out of the box for me. I'll attach a new version that does.
Doug Cutting made changes - 29/Apr/08 09:14 PM
Here's a version that builds for me. I changed it to fit within the normal contrib compilation framework.
If you want to compile it alone, manually, you must first run 'ant -Dcompile.c++=1 compile-libhdfs' at the top level, then run 'ant -Dfusedfs=1' from within src/contrib/fuse-dfs. Alternatively, you can simply run 'ant compile-contrib -Dcompile.c++=1 -Dcompile.fusedfs=1' at the top level to compile it along with all other contrib modules. I have not yet tested that it runs, however.
Building this generates a lot of files that we'll need to add to the svn ignore list, and that are not removed by 'make clean'. Are all of these needed?
Doug Cutting made changes - 29/Apr/08 09:24 PM
The right version of the patch...
Doug Cutting made changes - 29/Apr/08 09:26 PM
After applying the patch you must 'chmod +x src/contrib/fuse-dfs/bootstrap.sh' before building the first time.
Doug Cutting made changes - 29/Apr/08 10:11 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12381140/HADOOP-4.patch against trunk revision 645773.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 13 new or modified tests.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new javac compiler warnings.
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs -1. The patch appears to cause Findbugs to fail.
core tests -1. The patch failed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2344/testReport/
This message is automatically generated.
Thanks, Doug. I will provide a 'make clean' that cleans up everything.
Doug,
Your patch didn't work for me unless I modified the entries that use ${basedir} in src/contrib/fuse-dfs/build.xml to append ../../../ to it. It seems that in my build, basedir is now the fuse-dfs directory, whereas before I think it pointed to the top level?
Note: the ${basedir} issue in src/contrib/fuse-dfs/build.xml that I mentioned also seems to be what made the last Hudson build fail.
Pete Wyckoff made changes - 30/Apr/08 12:51 AM
Pete Wyckoff made changes - 30/Apr/08 12:51 AM
sorry - should be patch4.txt
Pete Wyckoff made changes - 30/Apr/08 12:52 AM
Your diff above isn't against my patch. My patch, e.g., sets HADOOP_HOME to ${hadoop.home}, not to ${basedir}. In my patch, src/contrib/fuse-dfs/build.xml does not refer to ${basedir} at all, since ${basedir} is the CWD and can thus be elided in relative paths.
Pete Wyckoff made changes - 01/May/08 06:35 PM
Pete Wyckoff made changes - 01/May/08 06:35 PM
Pete Wyckoff made changes - 01/May/08 06:36 PM
Your changes to src/c++/libhdfs/Makefile break things for me. Also, you undid one of my changes to build.xml: the one making compile-libhdfs conditional on compile.c++. This is required to make compile-libhdfs an optional target, which we must do now that compile-contrib depends on it. Finally, there are still a lot of generated symlinks and files left in src/contrib/fuse-dfs after a 'make clean'.
Doug Cutting made changes - 01/May/08 08:22 PM
All I'm trying to do is add a clean: target to src/contrib/Makefile.am (this will clean up everything) and change src/contrib/build.xml to run executable bin/sh with arg value=bootstrap.sh, to avoid the chmod +x problem. I thought that was all I changed.
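Invoking bootstrap.sh as an argument to sh, rather than executing it directly, only requires that the file be readable, so it still works when the executable bit was lost while applying the patch. A minimal sketch of the idea in plain shell; the helper name run_bootstrap is hypothetical, and in build.xml the same thing would be expressed as an exec task with the shell as the executable and bootstrap.sh as its argument.

```shell
# Hypothetical helper: run bootstrap.sh from its own directory through
# /bin/sh, so the script only has to be readable, not executable.
run_bootstrap() {
  # The subshell keeps the caller's working directory unchanged.
  ( cd "$1" && /bin/sh ./bootstrap.sh )
}
```

For example, run_bootstrap src/contrib/fuse-dfs from the top of the checkout runs the script whether or not chmod +x was applied.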
Pete Wyckoff made changes - 01/May/08 09:46 PM
Pete Wyckoff made changes - 01/May/08 09:47 PM
The last patch is just Doug's.
> avoid the chmod +x problem
I wouldn't worry too much about that. The patch doesn't remember that it's executable, but subversion will. But, sure, fixing that is fine too.
> I thought that is all i changed.
Did you 'svn revert -R .' and make sure that 'svn stat' reported nothing before applying my patch and making new mods?
Yes, I think what may have happened is I uploaded an older patch.
This time I did a revert, applied
This got assigned to me by mistake. I haven't followed this JIRA closely till now.
Raghu Angadi made changes - 01/May/08 09:55 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12381278/patch6.txt against trunk revision 645773.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 13 new or modified tests.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new javac compiler warnings.
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs -1. The patch appears to cause Findbugs to fail.
core tests -1. The patch failed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2360/testReport/
This message is automatically generated.
Hi Doug,
The following is the compile error that happens when running with Hudson. I don't understand why it's having this problem. Can you look at it? Thanks for your help with this. pete
-------------------
BUILD FAILED
> I don't understand why it's having this problem.
The problem is that the "jar" and "package" targets are failing in fuse-dfs.
Doug Cutting made changes - 02/May/08 09:44 PM
Somehow you lost my if="compile.c++" addition to the compile-libhdfs target in build.xml again. I re-added that, updated the README, and added "jar" and "package" targets to make Hudson happier.
This now builds for me. I also tested it: I was able to mount an HDFS filesystem, list directories, and read files.
Doug Cutting made changes - 02/May/08 09:48 PM
Doug Cutting made changes - 02/May/08 09:48 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12381349/HADOOP-4.patch against trunk revision 645773.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 13 new or modified tests.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new javac compiler warnings.
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests -1. The patch failed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2372/testReport/
This message is automatically generated.
Comments on the latest patch:
a) Rather than use LD_LIBRARY_PATH, would it be better to set a runtime link path that uses $ORIGIN?
b) What happens if root is part of the supergroup?
@Allen
a) Not sure what you mean here. Ideally, I'd like to use fuse_dfs_wrapper.sh in my fstab/automount lines, so it should have all env vars already set, if they can be derived at build or run time.
b) This works fine. Permissions within fuse-dfs are a whole other kettle of fish, so I think I'll keep quiet until fuse-dfs is committed, then start another JIRA. It's just worth noting that if you want to share a fuse-dfs mount between multiple users, then the DFS permissions model will be broken.
> -1 but only after I build libhdfs
The confusion is that libhdfs is now conditional on compile.c++, but fuse-dfs is not, so it's possible to invoke ant at the top level in such a way that it tries to compile fuse-dfs without having compiled libhdfs. To fix this we should probably make fuse-dfs conditional on compile.c++ too. In any case, you need to specify compile.c++=1 for top-level builds of fuse-dfs to work.
> -1 I think that there should be a package target in fuse-dfs/build.xml that copies fuse-dfs stuff into $HADOOP_HOME/contrib/fuse-dfs
Good idea. The fuse-dfs package target should copy things to ${dist.dir}/contrib/${name}. Who will update the patch?
I added the package target and also made the fuse-dfs build dependent on both the compile.c++ and fusedfs properties.
I also removed aclocal.m4, as that's a generated file.
Pete Wyckoff made changes - 05/May/08 11:32 PM
Pete Wyckoff made changes - 05/May/08 11:32 PM
Pete Wyckoff made changes - 05/May/08 11:33 PM
Pete Wyckoff made changes - 05/May/08 11:33 PM
Pete Wyckoff made changes - 05/May/08 11:34 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12381459/HADOOP-4.patch against trunk revision 653638.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 13 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to cause Findbugs to fail.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2401/testReport/
This message is automatically generated.
My bad - added the if for the package target.
Pete Wyckoff made changes - 06/May/08 12:58 AM
Doug Cutting made changes - 06/May/08 11:10 PM
Here's a new version that:
The intended commit location is http://svn.apache.org/repos/asf/lucene/hadoop/trunk/contrib/fuse
Please vote on this issue.