Details

    • Release Note:
      Introduced FUSE module for HDFS. Module allows mount of HDFS as a Unix filesystem, and optionally the export of that mount point to other machines. Writes are disabled. rmdir, mv, mkdir, rm are supported, but not cp, touch, and the like. Usage information is attached to the Jira record.


      Description

      This is a FUSE module for Hadoop's HDFS.

      It allows one to mount HDFS as a Unix filesystem and optionally export
      that mount point to other machines.

      rmdir, mv, mkdir, and rm are all supported, just not cp, touch, and the like; actual writes require: https://issues.apache.org/jira/browse/HADOOP-3485

      For the most up-to-date documentation, see: http://wiki.apache.org/hadoop/MountableHDFS

      BUILDING:

      Requirements:

      1. a Linux kernel > 2.6.9, or a kernel module from FUSE - i.e., you
      compile it yourself and then modprobe it. You are better off with the
      former option if possible. (Note that for now, if you use the kernel
      with fuse included, it doesn't allow you to export this through NFS,
      so be warned. See the FUSE mailing list for more about this.)

      2. FUSE should be installed in /usr/local, or the FUSE_HOME ant
      environment variable should point to its location

      To build:

      1. in HADOOP_HOME: ant compile-contrib -Dcompile.c++=1 -Dfusedfs=1 -Dlibhdfs=1

      NOTE: for amd64 architecture, libhdfs will not compile unless you edit
      the Makefile in src/c++/libhdfs/Makefile and set OS_ARCH=amd64
      (probably the same for others too).

      --------------------------------------------------------------------------------

      CONFIGURING:

      Look at all the paths in fuse_dfs_wrapper.sh and either correct them
      or set them in your environment before running. (Note: for automount
      and mount as root, you probably cannot control the environment, so it
      is best to set them in the wrapper.)

      INSTALLING:

      1. mkdir /mnt/dfs (or wherever you want to mount it)

      2. fuse_dfs_wrapper.sh dfs://hadoop_server1.foo.com:9000 /mnt/dfs -d
      and, from another terminal, try ls /mnt/dfs

      If step 2 works, try again without debug mode, i.e., drop -d

      (Note: common problems are that libhdfs.so, libjvm.so, or libfuse.so is
      not on your LD_LIBRARY_PATH, or that your CLASSPATH does not contain
      hadoop and other required jars.)

      --------------------------------------------------------------------------------

      DEPLOYING:

      in a root shell do the following:

      1. add the following to /etc/fstab -
      fuse_dfs#dfs://hadoop_server.foo.com:9000 /mnt/dfs fuse
      allow_other,rw 0 0

      2. mount /mnt/dfs. Expect problems with not finding fuse_dfs; you will
      probably need to add it to /sbin. Also expect problems finding the
      three libraries above; add them using ldconfig.

      --------------------------------------------------------------------------------

      EXPORTING:

      Add the following to /etc/exports:

      /mnt/hdfs *.foo.com(no_root_squash,rw,fsid=1,sync)

      NOTE - you cannot export this with a FUSE module built into the kernel,
      e.g., kernel 2.6.17. For info on this, refer to the FUSE wiki.

      --------------------------------------------------------------------------------

      ADVANCED:

      You may want to ensure certain directories cannot be deleted from the
      shell until the FS has permissions. You can set this in
      src/contrib/fuse-dfs/build.xml

      1. patch6.txt
        94 kB
        Pete Wyckoff
      2. patch6.txt
        94 kB
        Pete Wyckoff
      3. patch5.txt
        94 kB
        Pete Wyckoff
      4. patch4.txt
        94 kB
        Pete Wyckoff
      5. patch4.txt
        94 kB
        Pete Wyckoff
      6. patch4.txt
        94 kB
        Pete Wyckoff
      7. patch3.txt
        97 kB
        Pete Wyckoff
      8. patch2.txt
        61 kB
        Pete Wyckoff
      9. patch.txt
        80 kB
        Pete Wyckoff
      10. patch.txt
        61 kB
        Pete Wyckoff
      11. Makefile
        0.2 kB
        Sami Siren
      12. HADOOP-4.patch
        95 kB
        Doug Cutting
      13. HADOOP-4.patch
        95 kB
        Doug Cutting
      14. HADOOP-4.patch
        95 kB
        Doug Cutting
      15. HADOOP-4.patch
        63 kB
        Pete Wyckoff
      16. HADOOP-4.patch
        63 kB
        Pete Wyckoff
      17. HADOOP-4.patch
        64 kB
        Pete Wyckoff
      18. HADOOP-4.patch
        65 kB
        Doug Cutting
      19. fuse-j-hadoopfs-03.tar.gz
        11 kB
        Anurag Sharma
      20. fuse-hadoop-0.1.1.tar.gz
        5 kB
        John Xing
      21. fuse-hadoop-0.1.0_fuse-j.2.4_hadoop.0.5.0.tar.gz
        27 kB
        Nguyen Quoc Mai
      22. fuse-hadoop-0.1.0_fuse-j.2.2.3_hadoop.0.5.0.tar.gz
        27 kB
        Nguyen Quoc Mai
      23. fuse-dfs.tar.gz
        5 kB
        Pete Wyckoff
      24. fuse-dfs.tar.gz
        5 kB
        Pete Wyckoff
      25. fuse-dfs.tar.gz
        112 kB
        Pete Wyckoff
      26. fuse-dfs.tar.gz
        112 kB
        Pete Wyckoff
      27. fuse-dfs.tar.gz
        172 kB
        Pete Wyckoff
      28. fuse_dfs.tar.gz
        21 kB
        Pete Wyckoff
      29. fuse_dfs.sh
        0.6 kB
        Craig Macdonald
      30. fuse_dfs.c
        16 kB
        Pete Wyckoff
      31. fuse_dfs.c
        23 kB
        Pete Wyckoff
      32. fuse_dfs.c
        23 kB
        Pete Wyckoff
      33. fuse_dfs.c
        23 kB
        Pete Wyckoff
      34. fuse_dfs.c
        25 kB
        Craig Macdonald

        Issue Links

          Activity

          John Xing added a comment -

          It works with new hadoop project now (tarball attached).
          The intended commit location is http://svn.apache.org/repos/asf/lucene/hadoop/trunk/contrib/fuse
          Please vote on this issue.

          Nguyen Quoc Mai added a comment -

          I changed the code for fuse-hadoop. This is a working version for mounting DFS as a Linux filesystem. This version works fine with FUSE-J.2.2.3 and HADOOP.0.5.0.

          Enjoy

          Nguyen Quoc Mai added a comment -

          I changed the code for fuse-hadoop. This is a working version for mounting DFS as a Linux filesystem. This version works fine with FUSE-J.2.4 and HADOOP.0.5.0.

          Enjoy

          Nguyen Quoc Mai added a comment -

          I changed the code for fuse-hadoop. This is a working version for mounting DFS as a Linux filesystem. This version works fine with HADOOP.0.5.0.

          Enjoy

          Doug Cutting added a comment -

          I'd like to be able to commit this to the contrib tree, but it could use a bit more polish. Unfortunately we cannot commit the fuse-j jar file, since it is released under the LGPL. What would be great is a build.xml that downloaded and built fuse-j.

          The README could also include more info on installing fuse.

          On Ubuntu I was able to do this with:

          sudo apt-get install fuse-utils libfuse2 libfuse-devel

          If you've never built things on your Ubuntu box then you might also need:

          sudo apt-get install make gcc libc6-dev

          Doug Cutting added a comment -

          There are already tools to mount WebDAV on Windows, MacOS and Linux. So HADOOP-496 will provide a more universal solution for this issue. I'd like to resolve this as a duplicate of that issue. Does anyone object?

          Sami Siren added a comment -

          Has anyone made any comparison (performance or other) of these two different approaches? It looks like the webdav version (where the dfs client is in a servlet) introduces one more network hop compared to the fuse version (where the dfs client is on the machine using the dfs).

          One way or the other I could easily satisfy my mounting needs with any one of these solutions - so hopefully one will be integrated (so it gets maintenance attention).

          Doug Cutting added a comment -

          I'm cancelling this patch to remove it from the queue of patches that we intend to imminently apply to the current trunk. This is still useful stuff, and, depending on what happens with webdav, we may still decide to integrate it, so the issue should remain open for now.

          Anurag Sharma added a comment -

          Hello,
          We posted this on HADOOP-496 and were pointed to this jira entry as a better place to post this patch. Pasting our original submission message below...

          --------------------------------------
          Hi,

          We revived the old fuse-hadoop project (a FUSE-J based plugin that lets you mount Hadoop-FS). We have tried this on a small cluster (10 nodes) and basic functionality works (mount, ls, cat,cp, mkdir, rm, mv, ...).

          The main changes include some bug fixes to FUSE-J and changing the previous fuse-hadoop implementation to enforce write-once. We found the FUSE framework to be straightforward and simple.

          We have seen several mentions of using FUSE with Hadoop, so if there is a better place to post these files, please let me know.

          Attachments to follow...

          -thanks
          --------------------------------------

          Attachments include the following:

          • fuse-j-hadoop package
          • fuse-j patch.
          Doug Cutting added a comment -

          It is unfortunate that this requires patches to the fuse-j sources. The fuse-j project does not appear to be active, so I don't see much point in trying to submit these as a patch there.

          We cannot host GPL'd code or even patches to GPL'd code at Apache. We can however have optional commands in our build scripts that download GPL'd code. So this could be structured as a contrib module whose build.xml, when a non-default build option is specified, downloads, patches and compiles fuse-j. But since the patch to fuse-j itself cannot be hosted at Apache, it might be simpler to just create a jar file for the patched version, host that somewhere, and bypass the patch and compile steps. It would be great to get this into a form that we can commit, so that it stays synchronized with the rest of Hadoop.

          Pete Wyckoff added a comment -

          I think I commented on another JIRA that I have implemented a dfs fuse module for the straight C fuse module.

          thus far I have only implemented read-only and everything works fine. It's been up for about a month with low use, but no problems.

          pete

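          To make the "straight C fuse module" idea concrete, the following is a minimal, hypothetical sketch of the read path: a FUSE read callback that opens the file through libhdfs, does a positioned read, and closes it again. It is not the attached fuse_dfs.c; the dfs_read/dfs_oper names and the single global connection are illustrative assumptions only.

          /* Hypothetical sketch only -- not the attached fuse_dfs.c. */
          #define FUSE_USE_VERSION 26
          #include <fuse.h>
          #include <hdfs.h>      /* libhdfs */
          #include <fcntl.h>
          #include <errno.h>

          static hdfsFS fs;      /* connected once at startup, e.g. hdfsConnect("hadoopnn", 9000) */

          /* Naive read: open, pread at the requested offset, close on every call. */
          static int dfs_read(const char *path, char *buf, size_t size, off_t offset,
                              struct fuse_file_info *fi)
          {
            hdfsFile file = hdfsOpenFile(fs, path, O_RDONLY, 0, 0, 0);
            if (file == NULL)
              return -ENOENT;

            tSize n = hdfsPread(fs, file, offset, buf, size);
            hdfsCloseFile(fs, file);
            return n < 0 ? -EIO : n;
          }

          static struct fuse_operations dfs_oper = {
            .read = dfs_read,
            /* .getattr, .readdir, .open, etc. omitted in this sketch */
          };
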
          Doug Cutting added a comment -

          > I think I commented on another JIRA that I have implemented a dfs fuse module for the straight C fuse module.

          Yes, you noted that in HADOOP-496. Can you post a patch that implements this?

          Pete Wyckoff added a comment -

          Yes but need a couple of days to get our build guy to make the Makefile.am less facebook-centric.

          Anurag Sharma added a comment -

          hi Doug,

          Thanks for pointing out this issue. I will remove the FUSE-J patch and try one of the other routes you suggested (to have a patched FUSE-J available), and will come back with a resolution on this very soon.

          -anurag

          Anurag Sharma added a comment -

          Hi Doug,

          I went through the license for Fuse-J and it is distributed under LGPL, do you think that would allow the Fuse-J patches to be hosted on Apache?

          (In the latter case we would still modify the submission above to be a contrib module that downloads Fuse-J, applies our patch, and builds it, except we won't have to find a place to host the patch).

          -thanks
          -anurag

          Doug Cutting added a comment -

          > I went through the license for Fuse-J and it is distributed under LGPL,

          Unfortunately, the ASF cannot host things published under LGPL either. Sorry!

          Anurag Sharma added a comment -

          hi Doug. ok :- ), we will follow one of the alternate options you suggested of hosting either the patch or the jar file ourselves, and fixing the fuse-j-hadoop package build to work with this. Will re-submit our changes soon.
          -thanks,
          -anurag

          Pete Wyckoff added a comment -

          Here's the source. I will attach the Makefiles and full tar tomorrow or Wed morning.

          Anurag Sharma added a comment - - edited

          hi,
          We re-submitted the fuse-j-hadoopfs package with the following changes (as suggested above):

          • we are hosting a patched FUSE-J on a separate server.
          • the fuse-j-hadoopfs build downloads this patched version at compile time.

          We restructured the fuse-j-hadoopfs build to be a contrib, and have tested it with the Hadoop source-tree build.

          The fuse-j-hadoopfs build is a no-op when a standard "compile" target is specified. To actually build fuse-j-hadoopfs, the user has to specify the following command line: "ant compile -Dbuild-fuse-j-hadoopfs=1".

          We still have the following todo's remaining:

          • Pick up some environment variables dynamically, so the user doesn't have to set them in our build.properties file (these do not affect the no-op build).
          • Change the 'hadoopfs_fuse_mount.sh' script to use the 'hadoop/bin' scripts, so it can automatically pick up hadoop-specific conf, jar and class files.

          The above tarball (fuse-j-hadoopfs-03.tar.gz) consists of a directory that can be placed inside "hadoop/src/contrib", please let us know if we should submit this as a patch-file instead, or if we need to make more changes...

          -thanks

          Pete Wyckoff added a comment -

          Here's the same code (w/o the closelog after the return in main) and the Makefile.am - sorry I haven't been able to get to this and make a nice autoconf. But it is easy to build:

          0. install hadoop 14.x
          1. build and install latest fuse
          2. compile fuse_dfs.c with 2 includes - one for fuse and one for hadoop's hdfs.h and their libraries as well.
          3. run it ./fuse_dfs -debug -server <hadoopnn> -port <hadoopnnport> /mnt/hdfs -d -o allow_other

          Obviously, hadoop needs to be in your class path and your ld library path needs the fuse and hdfs .sos

          When ready for production, remove the -debug (which is an option to fuse-dfs for debugging) and the -d, which is the fuse debugging option.

          On my return from vacation, I will make better docs and autoconf.

          Note again, this is read only, but it is really easy to implement writes.

          – pete

          Pete Wyckoff added a comment -

          I implemented mkdir, mv, rm and rmdir. But, since the programmatic API doesn't use the Trash feature, this is a pretty big problem.

          2 solutions:

          1. have fuse dfs rename things into /Trash
          2. make the programmatic API use the Trash. Which is more general.

          I'm just wondering why the programmatic API never used trash in the first place??

          – pete

          Doug Cutting added a comment -

          > I'm just wondering why the programmatic API never used trash in the first place??

          Because most other programmatic APIs don't. Unix 'rm' does not use the trash, nor does the unlink system call, nor does DOS command line, etc. Trash is usually only a feature of a user-interface, like a GUI and command shells designed for interactive use.

          Pete Wyckoff added a comment -

          The hadoop API is not POSIX-compliant. Wouldn't it be better to protect people from what is catastrophic data loss?

          Pete Wyckoff added a comment -

          newer version which supports mkdir, rmdir, mv, and rm. NOTE - this is still a work in progress. Any help appreciated

          Sami Siren added a comment -

          Since a) the latest patches from Pete do not depend on fuse-j and b) fuse is now part of the Linux kernel (2.6.14 and later), it would be really nice to get this into hadoop proper.

          Before that it would be nice to try this out, so I am begging for some more information on how to compile it. I downloaded fuse-dfs.tgz and there were two files, fuse_dfs.c and Makefile.am; with what commands do I generate the makefile? I tried many starting with auto* but none were successful.

          Sami Siren added a comment -

          I did some testing with the C version of this tool (and managed to write a working makefile). For some reason it only worked for me when running it with fuse debug on; has anyone else seen this? Other than that it worked great.

          Jason added a comment -

          I have seen the debug issue myself. I think it is related to fuse more than to fuse_dfs.c, as I also see that with some of my ssh mounts to some systems.

          Pete Wyckoff added a comment -

          Sami - can you attach the Makefile?

          Pete Wyckoff added a comment -

          I should have mentioned that the problem with mine is that it is generated with some custom automake macros that we use here.

          Sami Siren added a comment -

          Here's the makefile I used. One should perhaps adapt it to the model that the other c* modules are using.

          Pete Wyckoff added a comment -

          Here's a version with configure to make things easier.

          Just run bootstrap.sh after setting the right hdfs, fuse and jdk paths.

          I still haven't added a version # (next on my list) or better docs (next next on my list).

          — pete

          Pete Wyckoff added a comment -

          This includes the version # from configure.ac in fuse_dfs.c, so fuse_dfs --version prints it out, and also includes better README documentation.

          The package is pretty much complete now other than implementing writes! If someone has 0.15+ Hadoop installed and wants to work on it, that would be great. 0.15 because you can see file creates before they are closed, which is needed for the implementation. I only have 0.14.2 right now.

          This newest version should be pretty self explanatory.
          – pete

          Craig Macdonald added a comment -

          Pete,

          I have been experimenting with fuse_dfs.c and have a few questions:

          (1) I am using a previous version of fuse_dfs.c, mainly because I don't have bootstrap.sh. However, with respect to the new fuse_dfs.c option parsing - is this compatible with calling via mount.fuse, and autofs?

          This how I currently mount using an autofs map containing:

          hdfs            -fstype=fuse,rw,nodev,nonempty,noatime,allow_other  :/path/to/fuse_dfs_moutn/fuse_dfs.sh\#dfs\://namenode\:9000
          

          fuse_dfs.sh is just a shell script setting the CLASSPATH and LD_LIBRARY_PATH, and essentially just execs fuse_dfs. If I changed to the more recent version, I would probably have to put the dfs://namenode:9000 configuration into the script, I think.

          (2) Have you done any sort of performance testing? I'm experimenting with HDFS for use in a mixed environment (hadoop and non-hadoop jobs), and the throughput I see is miserable. For example, I use a test network of 8 P3-1GHz nodes, and a similar client on a 100meg network.

          Below, I compare cat-ing a 512MB file from (a) an NFS mount on the same network as the cluster nodes (b) using the hadoop frontend and (c) using the FUSE HDFS filesystem.

          # (a)
          $ time cat /mnt/tmp/data.df > /dev/null
          
          real 0m47.280s
          user 0m0.059s
          sys 0m2.476s
          
          # (b)
          $ time bin/hadoop fs -cat hdfs:///user/craigm/data.df > /dev/null
          
          real 0m48.839s
          user 0m16.256s
          sys 0m7.001s
          
          # (c)
          $ time cat /misc/hdfs/user/craigm/data.df >/dev/null
          
          real    1m41.686s
          user    0m0.135s
          sys     0m2.302s
          

          Note that the NFS and Hadoop fs -cat obtain about 10.5MB/sec, while the hdfs fuse mount (in /misc/hdfs) achieves only 5MB/sec. Is this an expected overhead for FUSE?

          I did try tuning rd_buf_size to match the size of reads that the kernel was requesting - i.e., 128KB instead of 32KB; however, this made matters worse:

          # with 128KB buffer size
          $ time cat /misc/hdfs/user/craigm/data.df >/dev/null
          
          real    2m11.080s
          user    0m0.113s
          sys     0m2.180s
          

          Perhaps an option would be to keep the HDFS file open between reads and time out the connection when not used, or something; or read more than we need and then keep it in memory? Both would overly complicate the neat code though!

          (3) If I use an autofs for hdfs, then mounts will timeout quickly (30 seconds), and then reconnect again on demand. Perhaps fuse_dfs.c can implement the destroy fuse operation to free up the connection to the namenode, etc?

          Cheers

          Craig

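          As a rough illustration of the destroy hook suggested in point (3) above, a FUSE destroy callback could release the namenode connection when the filesystem is unmounted (or an autofs mount times out). This is a hedged sketch, not the actual fuse_dfs.c; the dfs_destroy name and the global fs handle are assumptions.

          /* Hypothetical sketch: drop the DFS connection when FUSE tears the mount down. */
          #include <hdfs.h>

          static hdfsFS fs;   /* assumed global connection, set up at mount time */

          static void dfs_destroy(void *private_data)
          {
            /* Called by FUSE once on unmount. */
            if (fs != NULL) {
              hdfsDisconnect(fs);
              fs = NULL;
            }
          }

          /* Wired up via: static struct fuse_operations dfs_oper = { ..., .destroy = dfs_destroy }; */
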
          Pete Wyckoff added a comment -

          Hi Craig,

          I may be using an older fuse? I have #dfs#dfs://hadoopNN01.facebook.com:9000 /mnt/hdfs fuse allow_other,rw 0 0

          For performance, it's really fast when there's 0 load on the namenode, but start running a couple of jobs and it gets killed. Obviously because one operation may require multiple fuse calls and thus multiple dfs calls versus the single one for bin/hadoop dfs. I'm noticing that all the critical sections in the namenode do things like logging to the debug log IN the critical section. And since everything locks fsRoot, just one job can really hose a real-time system. I don't see anything to do here other than trying to fix these after verifying with jconsole this is the problem.

          And I will add the autofs destroy function this week.

          So, what do you think I should do for #1, revert to the older configuration code? (which was way nicer anyway). Do you have your bash script so I can use it too?

          thanks, pete

          Craig Macdonald added a comment - - edited

          For #1, I think that's an fstab line, not an autofs line, perhaps? I'm not sure the fuse version will make any difference; mount will only pass through the source path, the dest path and the -o options. Would it be possible to keep all options for fuse_dfs.c in -o options? On the other hand, if a shell script is always needed to set the CLASSPATH and LD_LIBRARY_PATH options, then it is not as important how the options are set from fstab or autofs. (LD_LIBRARY_PATH could be fixed by an entry in /etc/ld.so.conf.d)

          My test had no load on the namenode, and usage of the namenode CPU looks low. In contrast, the fuse_dfs CPU usage was high.

          I would like to profile or jconsole the fuse_dfs binary, but the getJNIEnv() method in src/c++/libhdfs/hdfsJniHelper.c could be a bit more helpful for passing arguments to the JVM initialisation. Essentially, it only fills in -Djava.class.path from the CLASSPATH, and does not allow any other arbitrary system properties or -X options etc. to be set. Reported separately as HADOOP-2857.

          Will attach bash script shortly.

          C

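          For reference, this is roughly what passing extra options (profiler agents, -X flags, arbitrary system properties) to an embedded JVM looks like at creation time - the kind of flexibility being asked of getJNIEnv() here and tracked in HADOOP-2857. The option strings and the LIBHDFS_OPTS environment hook are illustrative assumptions, not the current hdfsJniHelper.c behaviour.

          /* Hypothetical sketch of passing arbitrary options when creating the embedded JVM. */
          #include <jni.h>
          #include <stdlib.h>

          static JNIEnv *create_jvm_with_options(void)
          {
            JavaVMOption options[3];
            options[0].optionString = "-Djava.class.path=/path/to/hadoop/conf:/path/to/hadoop-core.jar"; /* built from CLASSPATH today */
            options[1].optionString = "-Xmx256m";             /* example -X option */
            options[2].optionString = getenv("LIBHDFS_OPTS"); /* illustrative hook; assumed, not part of hdfsJniHelper.c */

            JavaVMInitArgs vm_args;
            vm_args.version = JNI_VERSION_1_2;
            vm_args.nOptions = options[2] != NULL ? 3 : 2;
            vm_args.options = options;
            vm_args.ignoreUnrecognized = JNI_FALSE;

            JavaVM *vm;
            JNIEnv *env;
            if (JNI_CreateJavaVM(&vm, (void **)&env, &vm_args) != JNI_OK)
              return NULL;
            return env;
          }
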
          Craig Macdonald added a comment -

          Shell script to set LD_LIBRARY_PATH and CLASSPATH when called from mount.fuse (which can be called from mount, and hence from autofs etc).

          Pete Wyckoff added a comment -

          I made some changes - mainly caching the file handle in fi->fh and it's performing much better now. I'm attaching the new one.

          – pete

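          A hedged sketch of the fi->fh caching described above (illustrative only, not the attached code): open the HDFS file once in the open callback, stash the handle in fi->fh, and reuse it for every positioned read until release. hdfsPread takes an explicit offset, so no per-handle seek state is needed.

          /* Hypothetical sketch of caching the hdfsFile handle in fi->fh. */
          #define FUSE_USE_VERSION 26
          #include <fuse.h>
          #include <hdfs.h>
          #include <fcntl.h>
          #include <errno.h>
          #include <stdint.h>

          static hdfsFS fs;   /* assumed global connection */

          static int dfs_open(const char *path, struct fuse_file_info *fi)
          {
            hdfsFile file = hdfsOpenFile(fs, path, O_RDONLY, 0, 0, 0);
            if (file == NULL)
              return -ENOENT;
            fi->fh = (uint64_t)(uintptr_t)file;   /* keep the handle with the open file */
            return 0;
          }

          static int dfs_read(const char *path, char *buf, size_t size, off_t offset,
                              struct fuse_file_info *fi)
          {
            hdfsFile file = (hdfsFile)(uintptr_t)fi->fh;
            tSize n = hdfsPread(fs, file, offset, buf, size);  /* positioned read on the cached handle */
            return n < 0 ? -EIO : n;
          }

          static int dfs_release(const char *path, struct fuse_file_info *fi)
          {
            hdfsCloseFile(fs, (hdfsFile)(uintptr_t)fi->fh);
            fi->fh = 0;
            return 0;
          }
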
          Pete Wyckoff added a comment -

          The newest fuse dfs, which should perform better on reads. I'm seeing this on a cluster in use:

          > time cat part-00000 > /dev/null

          real 1m25.078s
          user 0m0.056s
          sys 0m1.514s
          > du -kh part-00000
          1.1G part-00000

          Pete Wyckoff added a comment -

          Hi Craig,

          I'm setting fi->fh in open as you suggest. You're right I didn't look at pread and tell here - good point.

          – pete

          Pete Wyckoff added a comment -

          I added the destroy method. Also looked and pread should be correct.

          I'm hoping to test with 0.16.x soon and see if writes work!

          – pete

          Pete Wyckoff added a comment -

          Version 0.2.0 includes better reads and, in theory, writes, but they won't work without hadoop 0.16 and I can't test that. Obviously, the writes have to be append only. And I'm still not sure what the semantics are as far as block size.

          Pete Wyckoff added a comment -

          This should have been the one that includes the init method.

          Craig Macdonald added a comment -

          Hi Pete,

          Definitely using the latest tar this time
          My first time using the new build system - looks good!

          Some comments:

          1. Firstly, I shouldn't have deleted my last comment - though it was clearly in error as I was reading the wrong version of fuse_dfs.c. In your comments, can you say which file you've just uploaded?

          For posterity, previous comment was:

          I will try the newer version tomorrow when @work. I note that fi->fh isn't used or set in dfs_read in your latest version. Could we set it in dfs_open for O_READONLY, and then use it if available?

          I'm not clear on the semantics of hdfsPread - does it assume that offset is after previous offset?
          If so then we need to check that the current read on a file is strictly after the previous read for a previously open FH to be of use - hdfsTell could be of use here.

          2. With respect to the read speed, this is indeed a bit faster in our test setting (nearer 6MB/sec), but not yet similar to the Hadoop fs shell (about 10.5MB/sec). Fuse version 2.7.2

          # time bin/hadoop fs -cat /user/craigm/data.df > /dev/null 
          
          real    0m50.347s
          user    0m16.023s
          sys     0m6.644s
          
          # time cat /misc/hdfs/user/craigm/data.df > /dev/null 
          
          real    1m31.263s
          user    0m0.131s
          sys     0m2.384s
          
          

          I'm trying to measure the CPU taken by fuse_dfs for the same read, so we know how much CPU time it burns.

          Can I ask how your test time compares to using the Hadoop fs shell on the same machine? When reading, the CPU on the client is used 45%ish, similar to the Hadoop fs shell CPU use.

          I feel it would be good to aim for similar performance as the Hadoop fs shell, as this seems reasonable compared to NFS in my test setting, and should scale better as the number of concurrent reads increases, given available wire bandwidth.

          3. With respect to the build system, it could be clearer what --with-dfspath= is meant to point to. src/Makefile.am seems to assume that include files are at ${dfspath}/include/linux and the hdfs.so at ${dfspath}/include/shared. This isn't how the Hadoop installation is laid out. Perhaps it would be better if we could give an option pointing to the hadoop installation and have the paths taken from there?

          4. src/Makefile.am assumes an amd64 architecture. Same problem I noted in my shell script about guessing the locations of the JRE shared lib files.

          5 (minor). The last tar.gz had a link to aclocal.m4 in the external folder that was absolute - i.e., to your installation. It should be deleted when building the tar file.

          6 (minor). Update print_usage if you're happy with the specification of filesystem options. I made no changes to my shell script or my autofs mount for this version to work.

          Cheers

          Craig

          Rui Shi added a comment -

          Hi Pete,

          Could you please explain in more detail why write cannot be implemented before 0.16, and how we expect write to work with 0.16?

          Thanks a lot!

          Rui

          Craig Macdonald added a comment -

          Pete,

          I have tried everything to profile fuse_dfs. Valgrind (callgrind) doesn't play with Sun Java, and I failed to get the GNU profiler to give any output. I wrote a patch for HADOOP-2857, but using this I can't get any stack traces from the Java profiler - it's as if no Java code is run!

          Craig

          Pete Wyckoff added a comment -

          Hi Rui,

          It's because of the order in which FUSE calls the implementation. The file is created with an open and then closed and then opened again.

          So in <16, that second open in write mode will fail. I have tried hacking things so that the code doesn't do the close on the file in between, and caches the filehandle and so fakes the second open, and everything works fine.

          So, I think as long as appends are coming in, it should work with 16. I haven't looked at the implementation, but hopefully it buffers things till it gets to a full block and isn't creating small blocks all over the place. If it does, it's easy enough to buffer in fuse (although 128MB is a big buffer).

          pete

          ps although we're on 15.3 at FB, I can't upgrade our test cluster to 16 yet as we want to install HBASE and need to try it there. I'm gonna try applying the hbase patch that the powerset guys nicely gave me soon and then if all goes well, I can upgrade our test cluster to 16 and test things.

          Pete Wyckoff added a comment -

          Hi Craig,

          I see now that the buffer size for the OS reads is only 128K and since there's no ioctl for fuse to bump it up, it's a problem. I think what we can do is create a fuse block device mount and then using blockdev -setblocksize 128M, we should see some real speedups.

          Unfortunately, fuse on my dev machine isn't configured properly for this. First, there's no /dev/fuseblk, which it assumes, and even with that there I get a useless error message.

          I just want to verify with one of the kernel guys here how big the buffer can get before going too crazy. I'll ask one of them today and update this tomorrow.

          – pete

          Pete Wyckoff added a comment -

          Craig,

          I should mention I tried to get fuse to do more readahead than 128K, but setting that param didn't seem to do anything. I can probably play with this tomorrow. To be honest, I don't know exactly what it means when you configure fuse module as a block device since you also need to specify the mount point. Is fuse under the covers just doing the block device so we can do better ioctl? I mean there's no way to implement a real block device since we'd have to make it look like a real filesystem. But, fuse requires both the block device and the mount point.

          – pete

          Craig Macdonald added a comment - - edited

          Hi Pete,

          The block stuff in fuse is appallingly documented. I have hunted the Web for info on this all afternoon, to understand it further. To be honest, the only thing I have found useful is reading the source of ntfs-3g.c at http://ntfs-3g.cvs.sourceforge.net/ntfs-3g/ntfs-3g/src/ntfs-3g.c?revision=1.106&view=markup

          A test I did a few days ago was comparing reading an NFS mounted file directly vs. the same file read via NFS via a FUSE fs - http://mattwork.potsdam.edu/projects/wiki/index.php/Rofs#rofs_code_.28C.29 (ROFS, the Read-Only Filesystem). Speed results were fairly comparable between NFS & NFS+ROFS, so it suggests that FUSE doesn't add too much overhead to IO. Hence we can only suspect that the problem is either (a) the JNI interface, or (b) the size of the reads we're performing. A simple C tool could be written to exclude (a).

          I don't have any objections to pretty large buffer sizes for fuse_dfs.c - HDFS is designed for large files, and streaming read access.

          Btw, you mentioned you are re-exporting the mounted FS as NFS - have you had any issues vs the issues described in fuse's README.NFS?

          Regards

          Craig

          Pete Wyckoff added a comment -

          I just talked to one of our kernel guys and he isn't 100% sure as he hasn't done that much IO stuff on Linux but thinks the 128K readahead may just be the maximum.

We could always do, say, 1MB readaheads ourselves, although that would complicate things - though not that much, since we could keep the cached data with the open file handle, so there are no dirty-cache problems or garbage-collection issues: we just dump it when we do the close. So maybe that's the easiest way to go... I can probably look at that tomorrow or Thursday.

          pete

          Rui Shi added a comment -

          Hi Pete,

          Thanks a lot for the explanation!

But as I heard from the HDFS team, 0.16 still does not support file appending. Supposing appending is not supported, can we still try writing files in FUSE using the workaround you described?

          Thanks,

          Rui

          Craig Macdonald added a comment -

          Hi Pete,

Have you had a chance to look at FUSE readaheads? I have attached a version of fuse_dfs.c I have patched, which reads 10MB chunks from DFS and caches these in a struct held in the file handle.
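A minimal sketch of this kind of per-handle readahead cache, assuming a helper struct and a 10MB chunk (the names, sizes and error handling below are illustrative, not taken from the attached patch):

/* Illustrative per-open-file readahead cache for the FUSE read path.
 * Large chunks are fetched with hdfsPread and served piecemeal to the
 * 128K FUSE read requests; the buffer is freed when the file is closed. */
#include <string.h>
#include <sys/types.h>
#include "hdfs.h"

#define RA_CHUNK (10 * 1024 * 1024)   /* assumed readahead chunk size */

typedef struct {
  hdfsFS fs;
  hdfsFile file;
  char *buf;          /* RA_CHUNK bytes, allocated when the file is opened */
  off_t buf_start;    /* file offset of buf[0] */
  size_t buf_len;     /* valid bytes currently in buf */
} dfs_fh;

/* Serve a FUSE read from the cache, refilling it when the requested
 * offset falls outside the cached window. Returns bytes copied, or -1. */
static int cached_read(dfs_fh *fh, char *dst, size_t size, off_t offset) {
  if (offset < fh->buf_start ||
      offset >= fh->buf_start + (off_t)fh->buf_len) {
    tSize got = hdfsPread(fh->fs, fh->file, offset, fh->buf, RA_CHUNK);
    if (got < 0)
      return -1;                       /* caller maps this to -EIO */
    fh->buf_start = offset;
    fh->buf_len = (size_t)got;
  }
  size_t avail = fh->buf_len - (size_t)(offset - fh->buf_start);
  size_t n = size < avail ? size : avail;
  memcpy(dst, fh->buf + (offset - fh->buf_start), n);
  return (int)n;
}

Because the cache lives and dies with the open file handle, there are no dirty-cache or eviction questions: it is simply dropped at close, as discussed earlier in the thread.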

I'm seeing some improvement (down to 1m 20s, compared to "bin/hadoop dfs -cat file > /dev/null" which takes about 50 seconds). Increasing the buffer size shows some improvement [I only did some quick tests] - I tried up to 30MB, but I don't think there's much improvement beyond 5-10MB.

Do you think we're reaching the limit where the overheads of JNI make it impossible to go any faster? I.e., where do we go from here?

Another comment I have is that the configure/makefile asks for a dfs_home. It might be easier to ask for Hadoop home, then build the appropriate paths from there (${hadoop_home}/libhdfs and ${hadoop_home}/src/c++/libhdfs). Hadoop has no include/linux folders etc. Finally, we need a way to detect whether to use i386 or amd64 to find jvm.so.

          Craig

          Pete Wyckoff added a comment -

Here's my most recent one. I will try merging Craig's readahead code in and then I guess see about getting it into contrib.

          Pete Wyckoff added a comment -

I should have mentioned I fixed the autoconf problems and made the "protectedpaths" configurable. I guess we'll have to have a discussion about whether people like this, because I think Doug clearly doesn't.

          Pete Wyckoff added a comment -

This is by no means a completely final product but more of a version 0.1. But it has decent autoconf, comments and README files, and it has been working in production for quite a while.

          Pete Wyckoff added a comment -

          First checkin.

          Pete Wyckoff added a comment -

          Changed the affected versions and fixes to unknown from 0.5

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12379898/patch.txt
          against trunk revision 645773.

          @author +1. The patch does not contain any @author tags.

          tests included -1. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit -1. The applied patch generated 211 release audit warnings (more than the trunk's current 202 warnings).

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2201/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2201/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2201/artifact/trunk/build/test/checkstyle-errors.html
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2201/artifact/trunk/current/releaseAuditDiffWarnings.txt
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2201/console

          This message is automatically generated.

          Pete Wyckoff added a comment -

Help - I don't know what a release audit warning is?? It just lists the filenames in the release audit link.

Also, unit testing for this is pretty hard, but it can be done to some extent in the future by calling each function the way FUSE calls it; these would be C unit tests anyway, which I don't know if we have support for.

Do people want to comment on the feature of moving deleted files to /Trash and also of not allowing rmdir on some "special directories", e.g., '/', '/user', '/warehouse', ...??
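For concreteness, the behaviour being asked about amounts to roughly the sketch below (illustrative only, with an assumed path list and helper names; a real version would also have to create the intermediate /Trash directories before renaming):

/* Sketch of a delete guard for the FUSE unlink/rmdir handlers:
 * refuse to remove a few protected paths, and move everything else
 * into /Trash via rename instead of deleting it outright. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include "hdfs.h"

static const char *const protected_paths[] = { "/", "/Trash", "/user", NULL };

static int is_protected(const char *path) {
  for (int i = 0; protected_paths[i] != NULL; i++)
    if (strcmp(path, protected_paths[i]) == 0)
      return 1;
  return 0;
}

static int move_to_trash(hdfsFS fs, const char *path) {
  if (is_protected(path))
    return -EACCES;                    /* never remove protected paths */

  char trash_path[4096];
  snprintf(trash_path, sizeof(trash_path), "/Trash%s", path);
  if (hdfsRename(fs, path, trash_path) != 0)
    return -EIO;
  return 0;
}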

          Doug Cutting added a comment -

          The release audit flags new files that don't contain the Apache license (or old files that have had it removed). In this case most of those flagged are fine to not have the Apache license, since they're automatically generated stuff, but it probably wouldn't hurt to add it to the shell scripts.

          Some automated tests would be good, e.g., an end-to-end test that starts HDFS, mounts it with fuse, and then lists and reads files through the mount. But such tests should not be run by default, since the default build does not compile C++ code, nor should it depend on fuse being installed. But it would be good to eventually configure Hudson to run these, to verify that fuse continues to show signs of life as Hadoop evolves.

          So, in summary, Hudson will not generate a clean report card for this issue, since it will contain some files that don't have the Apache license, and Hudson will not, at this point, automatically run any new JUnit tests for it. But that doesn't mean that some licenses and tests shouldn't still be added before we commit the patch.

          Pete Wyckoff added a comment -

          Hi Doug,

          I will add the header to all the files - think I just had it in the C file.

Sure, I will add a Python script or something to drive creating a few files in DFS and then trying to ls and cat them from a mount.

          pete

          Doug Cutting added a comment -

          > I will add a Python script or something to drive creating a few files in DFS sand then trying to ls and cat them from a mount.

          It will be easier to integrate Java unit tests. Also, currently I don't think we require Python, so I wouldn't want to add a system dependency just to test one component. Perhaps, if you don't like Java, you could write tests in C or as bash scripts, then somehow hook them into the test-contrib target?

          Nigel Daley added a comment -

          FWIW, Hudson nightly and patch builds do run with -Dcompile.c++=yes so that tests for Pipes and libhdfs get built and run. What doesn't get built is the eclipse plugin and the native compression library (libhadoop).

          Pete Wyckoff added a comment -

Latest update - includes the license header in every file and a test/TestFuseDFS.java. Not sure how to link this into the other build.xmls to have it built and run, but I assume we don't want that right now anyway.

          Craig Macdonald added a comment -

I have created HADOOP-3264 noting the fact that permissions/owner/group-owner get/set isn't supported in libhdfs, and would be useful for fuse-hdfs. Not a blocker for this JIRA, but related all the same.
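As a sketch of where that gap shows up in fuse-dfs, a getattr backed by libhdfs can currently fill in size, kind and mtime from hdfsGetPathInfo, but has to invent ownership and permission bits (the wrapper function name and the hard-coded mode bits below are assumptions for illustration, not the code attached to either JIRA):

/* Sketch: fill a struct stat from what libhdfs exposes today.
 * Owner/group/permissions are not available until HADOOP-3264, so
 * everything is reported as world-readable. */
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include "hdfs.h"

static int dfs_getattr_sketch(hdfsFS fs, const char *path, struct stat *st) {
  hdfsFileInfo *info = hdfsGetPathInfo(fs, path);
  if (info == NULL)
    return -ENOENT;

  memset(st, 0, sizeof(*st));
  st->st_size  = info->mSize;
  st->st_mtime = info->mLastMod;
  st->st_mode  = (info->mKind == kObjectKindDirectory)
                   ? (S_IFDIR | 0555) : (S_IFREG | 0444);
  st->st_nlink = 1;

  hdfsFreeFileInfo(info, 1);
  return 0;
}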

          Pete Wyckoff added a comment -

I updated the patch and am hoping this operation restarts things JIRA-wise - i.e., runs the tests and emails Doug.

          Pete Wyckoff added a comment -

          newest patch.

          Pete Wyckoff added a comment -

incremented patch #

          Craig Macdonald added a comment -

          Pete,

I haven't had time to test your latest patch, but things seem to be improving. I note your comments about exporting the FUSE mount. There is a README.NFS in the FUSE distribution, which concerns exporting FUSE mounts. I have copied it verbatim below from version 2.7.3 - seems not quite mature yet.

          FUSE module in official kernels (>= 2.6.14) don't support NFS
          exporting.  In this case if you need NFS exporting capability, use the
          '--enable-kernel-module' configure option to compile the module from
          this package.  And make sure, that the FUSE is not compiled into the
          kernel (CONFIG_FUSE_FS must be 'm' or 'n').
          
          You need to add an fsid=NNN option to /etc/exports to make exporting a
          FUSE directory work.
          
          You may get ESTALE (Stale NFS file handle) errors with this.  This is
          because the current FUSE kernel API and the userspace library cannot
          handle a situation where the kernel forgets about an inode which is
          still referenced by the remote NFS client.  This problem will be
          addressed in a later version.
          
          In the future it planned that NFS exporting will be done solely in
          userspace.
          

          Regards

          C

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12380454/patch2.txt
          against trunk revision 645773.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 13 new or modified tests.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2277/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2277/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2277/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2277/console

          This message is automatically generated.

          Owen O'Malley added a comment -

          Can someone please validate that this works for them?

          Doug Cutting added a comment -

          I finally got this to compile, after modifying plain-bootstrap.sh and src/Makefile.am. The latter has some hardwired paths to make things work for Pete. Most of the stuff in the former is stuff that's already known to Hadoop's build (JDK location, libhdfs location, etc.)

          It should be possible to get this to build from the top-level build.xml, provided:

          • one specifies a -Dcompile.fuse=true or somesuch option
          • one has the appropriate unix environment (g++ installed, libfuse-dev installed, etc.)
            The environmental requirements should be described as best as possible in the README. The build should be dependent on libhdfs.

          Pete, are you familiar with Ant?

          Pete Wyckoff added a comment -

          Addressed doug's concerns.

I got rid of the home/pwyckoff stuff in Makefile.am, switched all the autoconf variables to match the build.xml env vars, added a compile-fusedfs target to src/contrib/build-contrib.xml, and updated the documents.

          So, now to build, you:

          ant compile-contrib -Dfusedfs=1

          Pete Wyckoff added a comment -

          patch2.txt old news

          Pete Wyckoff added a comment -

          patch3.txt

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12380975/patch3.txt
          against trunk revision 645773.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 13 new or modified tests.

          patch -1. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2335/console

          This message is automatically generated.

          Maurizio added a comment - - edited

          Hi,
probably I'm missing something, but I don't understand how the patch mechanism works. (Actually, I'm not sure this is the appropriate way, in terms of netiquette, to ask for help.)
Do I download and untar every file attached above?
Could someone suggest how I can install this software?

          thanks in advance

          Maurizio

          Craig Macdonald added a comment - - edited

          I have some minor issues. I was working on compiling on Friday afternoon, but Doug beat me to it with identical comments

          • I see that the ant script calls the make file, with the correct env vars overridden. In that case, do we need the bootstrap, automake, configure etc? Why not a simpler build system like that used by libhdfs etc? The makefile contains references to the system that configure was run on.
          • build-contrib.xml in patch3.txt has the wrong path.
          • I think it would be better if the built fuse_dfs module was placed in $HADOOP_HOME/contrib/fuse-dfs, in a similar manner to libhdfs, etc
          • If we're keeping configure et al:
            • README.BUILD refers to bootstrap.sh, while the script is plain_bootstrap.sh
• the configure script doesn't identify the JARCH for non-64bit platforms - see config.log for my platform
              configure:1377: checking target system type
              configure:1391: result: i686-pc-linux-gnu
              

but JARCH is unset; it should be i386. I presume this works OK from the ant script?

          • fuse_dfs_wrapper.sh has some issues - perhaps you could base this more closely on the fuse_dfs.sh I attached previously to this JIRA?
• various variables have to be written in by hand, e.g. JAVA_HOME; OS_ARCH has already been identified by configure/ant, so why can't they be set in the script?
• fuse_dfs should be called with "$@" instead of $1 $2. However, if I use "$@" and call the wrapper script via mount, then mount adds -o ro to the end, and fuse_dfs can't handle this.
• the classpath is hard-coded - why can't you build the whole classpath automatically from the HADOOP_HOME path?

          Will test new build in due course.

          Maurizio - see http://en.wikipedia.org/wiki/Patch_(Unix) and apply the patch to a recent release of Hadoop. It requires FUSE.

          Pete Wyckoff added a comment -

          Fixed a few changes and addressed the points Craig and Doug brought up.

          Changes:

          1. I changed the top-level build to have a compile-contrib-fuse target that exports the right properties and then has a subant task of the build.xml in the fuse-dfs directory.
NOTE: I couldn't use the normal compile-contrib path, as that path doesn't inherit properties to the subant call. I think it's much easier for the million properties set in the top-level build.xml to be available in the subant call rather than trying to re-create them. And since this is a build, I needed the arch-related ones.

          2. fixed fuse_dfs_wrapper.sh to set env vars only if not set and to pass all the args ala $@ to the executable

          3. added build.xml in src/contrib/fuse-dfs
It (A) calls bootstrap.sh to get a correct Makefile and then (B) calls the Makefile to build fuse_dfs.
NOTE: I left all the autoconf stuff because doing something like libhdfs, where there's only a Makefile, would cause bad builds for environments unlike mine - I end up having to edit the Makefile for libhdfs to set OS_ARCH=amd64.

          4. I updated README and I removed README.build

          Pete Wyckoff added a comment -

          Hi Maurizio,

To apply this patch, go to the top level of your Hadoop checkout and do "patch -p0 < patch4.txt".

This should apply it; then read the README in src/contrib/fuse-dfs for instructions on how to compile. Let me know if you have any problems.

          – pete

          Pete Wyckoff added a comment -

          patch4.txt

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12381125/patch4.txt
          against trunk revision 645773.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 13 new or modified tests.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs -1. The patch appears to cause Findbugs to fail.

          core tests -1. The patch failed core unit tests.

          contrib tests -1. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2339/testReport/
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2339/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2339/console

          This message is automatically generated.

          Doug Cutting added a comment -

          This still doesn't work out of the box for me. I'll attach a new version that does.

          Doug Cutting added a comment -

          Here's a version that builds for me. I changed it to fit within the normal contrib compilation framework.

If you want to compile it alone, manually, then you must first run 'ant -Dcompile.c++=1 compile-libhdfs' at the root, then run 'ant -Dfusedfs=1' from src/contrib/fuse-dfs, or you can simply run 'ant compile-contrib -Dcompile.c++=1 -Dcompile.fusedfs=1' at the top level to compile it along with all other contrib modules.

          I have not yet tested that it runs, however.

          Building this generates a lot of files that we'll need to add to the svn ignore list, and that are not removed by 'make clean'. Are all of these needed?

          Doug Cutting added a comment -

          The right version of the patch...

          Doug Cutting added a comment -

          After applying the patch you must 'chmod +x src/contrib/fuse-dfs/bootstrap.sh' before building the first time.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12381140/HADOOP-4.patch
          against trunk revision 645773.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 13 new or modified tests.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs -1. The patch appears to cause Findbugs to fail.

          core tests -1. The patch failed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2344/testReport/
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2344/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2344/console

          This message is automatically generated.

          Pete Wyckoff added a comment -

Thanks Doug. I will provide a 'make clean' that cleans up everything.

          Pete Wyckoff added a comment -

          Doug,

Your patch didn't seem to work for me unless I modified the entries using ${basedir} in src/contrib/fuse-dfs/build.xml to append ../../../ to it. It seems in my build, basedir is now the fuse-dfs dir whereas before, I think, it was pointing to the top level??

          I'm also uploading patch5.txt which includes a clean: target that does all the clean up in the Makefile in src/contrib/fuse-dfs.

          But, note I had to include the below difference to src/contrib/fuse-dfs/build.xml (other than that it's identical to your patch).

          thanks, pete



--- fuse-dfs/build.xml	2008-04-29 17:41:55.000000000 -0700
+++ fuse-dfs.new/build.xml	2008-04-29 17:37:47.000000000 -0700
@@ -22,10 +22,9 @@
 <!-- fuse-dfs targets. -->
 <!-- ================================================================== -->
 <target name="compile" if="fusedfs">
-  <property name="fuse-dfs.dir" value="${basedir}/src/contrib/fuse-dfs/"/>
+  <property name="fuse-dfs.dir" value="${basedir}"/>
   <exec dir="${fuse-dfs.dir}" executable="${fuse-dfs.dir}/bootstrap.sh">
   </exec>
-
   <exec dir="${fuse-dfs.dir}" executable="make">
     <env key="OS_NAME" value="${os.name}"/>
     <env key="OS_ARCH" value="${os.arch}"/>
@@ -29,11 +28,10 @@
   <exec dir="${fuse-dfs.dir}" executable="make">
     <env key="OS_NAME" value="${os.name}"/>
     <env key="OS_ARCH" value="${os.arch}"/>
-    <env key="HADOOP_HOME" value="${basedir}"/>
+    <env key="HADOOP_HOME" value="${basedir}/../../.."/>
     <env key="PROTECTED_PATHS" value="/,/Trash,/user"/>
     <env key="PACKAGE_VERSION" value="0.1.0"/>
     <env key="FUSE_HOME" value="/usr/local"/>
-    <env key="LIBHDFS_BUILD_DIR" value="${basedir}/src/c++/libhdfs"/>
   </exec>
 </target>
 </project>

          Pete Wyckoff added a comment -

Note, the comment I made about the ${basedir} in src/contrib/fuse-dfs/build.xml seems to be the thing that also made the last Hudson build fail.

          Pete Wyckoff added a comment -

          sorry - should be patch4.txt

          Doug Cutting added a comment -

Your diff above isn't to my patch. My patch, e.g., sets HADOOP_HOME to ${hadoop.home}, not to ${basedir}. In my patch, src/contrib/fuse-dfs/build.xml does not refer to ${basedir} at all, since ${basedir} is the CWD and can thus be elided in relative paths.

          Pete Wyckoff added a comment -

          fixes the cleanup problem

          Doug Cutting added a comment -

Your changes to src/c++/libhdfs/Makefile break things for me. Also, you undid one of my changes to build.xml, making compile-libhdfs conditional on compile.c++. This is required to make compile-libhdfs an optional target, which we must do now that compile-contrib depends on it. Finally, there are still a lot of generated symlinks and files left in src/contrib/fuse-dfs after a 'make clean'.

          Pete Wyckoff added a comment -

All I'm trying to do is add:

          clean:
          rm -rf autom4te.cache config.guess config.log config.status config.sub configure depcomp src/.deps install-sh Makefile.in src/Makefile.in src/Makefile missing Makefile src/fuse_dfs.o src/fuse_dfs

          to src/contrib/Makefile.am (this will clean up everything)

          and change src/contrib/build.xml to do executable bin/sh with arg value=bootstrap.sh to avoid the chmod +x problem.

I thought that was all I changed.

          Pete Wyckoff added a comment -

          patch6.txt

          Pete Wyckoff added a comment -

          this is it

          Pete Wyckoff added a comment -

The last patch is just Doug's HADOOP-4 plus the two changes above, and removes the make clean from bootstrap.sh and configure.ac.

          Doug Cutting added a comment -

          > avoid the chmod +x problem

          I wouldn't worry too much about that. The patch doesn't remember that it's executable, but subversion will. But, sure, fixing that is fine too.

          > I thought that is all i changed.

          Did you 'svn revert -R .' and make sure that 'svn stat' reported nothing before applying my patch and making new mods?

          Pete Wyckoff added a comment -

          Yes, I think what may have happened is I uploaded an older patch.

          This time I did a revert, applied HADOOP-4 and just edited those files and created the patch.

          Raghu Angadi added a comment -

This got assigned to me by mistake. I haven't followed this JIRA closely till now.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12381278/patch6.txt
          against trunk revision 645773.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 13 new or modified tests.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs -1. The patch appears to cause Findbugs to fail.

          core tests -1. The patch failed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2360/testReport/
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2360/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2360/console

          This message is automatically generated.

          Pete Wyckoff added a comment -

          Hi Doug,

          The following is the compile error that happens when running with Hudson - I don't understand why it's having this problem. Can you look at it? Thanks for your help with this.

          pete

          -------------------

          BUILD FAILED
          /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/build.xml:780: The following error occurred while executing this line:
          /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/src/contrib/build.xml:39: The following error occurred while executing this line:
          /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/src/contrib/build-contrib.xml:157: /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/build/contrib/fuse-dfs/classes not found.

          Doug Cutting added a comment -

          > I don't understand why it's having this problem.

          The problem is that the "jar" and "package" targets are failing in fuse-dfs.

          Doug Cutting added a comment -

Somehow you lost my 'if="compile.c++"' addition to the compile-libhdfs target in build.xml again. I re-added that, updated the README, and added "jar" and "package" targets to make Hudson happier.

This now builds for me. I also tested it: I was able to mount an HDFS filesystem, list directories, and read files.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12381349/HADOOP-4.patch
          against trunk revision 645773.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 13 new or modified tests.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests -1. The patch failed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2372/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2372/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2372/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2372/console

          This message is automatically generated.

          Craig Macdonald added a comment -

          Comments on the latest patch:

          • +1 It compiles for me from ant
          • -1 but only after I build libhdfs or alter fuse-dfs/src/Makefile.am to refer to $(HADOOP_HOME)/libhdfs and not $(HADOOP_HOME)/build/libhdfs
          • -1 fuse_dfs_wrapper.sh: LD_LIBRARY_PATH should be set after JAVA_HOME, and LD_LIBRARY_PATH should contain $HADOOP_HOME/libhdfs (see the sketch after this list)
          • -1 I think there should be a package target in fuse-dfs/build.xml that copies the fuse-dfs stuff into $HADOOP_HOME/contrib/fuse-dfs - that should be the one place where fuse-dfs can always be found in a Hadoop installation
          • Note: when I run my fuse-dfs mount as root, I have to turn dfs.permissions off. root doesn't have any permissions in our DFS setup, so everything gets permission denied. In future we should alter libhdfs so that it has a "super-user API" - i.e. if I access file X as user Y, am I permitted to access it? See also HADOOP-3264, which deals with supporting permissions in libhdfs.
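          To make the LD_LIBRARY_PATH point concrete, here is a rough sketch of the ordering being asked for in fuse_dfs_wrapper.sh; the default paths and the amd64 JVM directory are illustrative assumptions, not the shipped values:

            # fuse_dfs_wrapper.sh (sketch): set JAVA_HOME/HADOOP_HOME first, then build
            # LD_LIBRARY_PATH from them so libhdfs.so, libjvm.so and libfuse.so are all found.
            export JAVA_HOME=${JAVA_HOME:-/usr/java/default}          # assumed location
            export HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop}      # assumed location
            export LD_LIBRARY_PATH=$HADOOP_HOME/libhdfs:$JAVA_HOME/jre/lib/amd64/server:/usr/local/lib:$LD_LIBRARY_PATH
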
          Allen Wittenauer added a comment -

          a) Rather than use LD_LIBRARY_PATH, would it be better to set a runtime link path that used $ORIGIN? (See the sketch below.)

          b) What happens if root is part of the super group?
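          Regarding (a), a minimal sketch of what a $ORIGIN-based runtime link path could look like; the link line mirrors the one fuse-dfs already uses, but the -rpath entries and the relative layout are assumptions, not what the current Makefile.am does:

            # Embed a relative run-time search path in the binary instead of relying on
            # LD_LIBRARY_PATH; the dynamic loader expands $ORIGIN at run time to the
            # directory holding fuse_dfs. Single quotes stop the shell expanding it.
            gcc -Wall -O3 fuse_dfs.o -o fuse_dfs \
                -L"$HADOOP_HOME/build/libhdfs" -lhdfs \
                -L/usr/local/lib -lfuse \
                -L"$JAVA_HOME/jre/lib/amd64/server" -ljvm \
                -Wl,-rpath,'$ORIGIN' -Wl,-rpath,'$ORIGIN/../libhdfs'
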

          Craig Macdonald added a comment -

          @Allen

          a) Not sure what you mean here. Ideally, I'd like to use fuse_dfs_wrapper.sh in my fstab/automount lines, so it should have all env vars already set, if they can be derived at build or run time.

          b) This works fine. Permissions within fuse-dfs are a whole other kettle of fish, so I think I'll keep quiet until fuse-dfs is committed, then start another JIRA. It's just worth noting that if you want to share a fuse-dfs mount between multiple users, then the DFS permissions model will be broken.

          Doug Cutting added a comment -

          > -1 but only after I build libhdfs

          The confusion is that libhdfs is now conditioned on compile.c++, but fuse-dfs is not, so it's possible to invoke ant at the top level in such a way that it tries to compile fuse-dfs without having compiled libhdfs. To fix this we should probably make fuse-dfs conditional on compile.c++ too. In any case, you need to specify compile.c++=1 for top-level builds of fuse-dfs to work.
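          In shell terms, a sketch of the two cases (target name shown as the usual contrib build; the exact property set shifts later in this thread):

            ant compile-contrib                   # may try to build fuse-dfs without ever compiling libhdfs
            ant compile-contrib -Dcompile.c++=1   # compiles libhdfs first, so fuse-dfs can link against it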

          > -1 I think there should be a package target in fuse-dfs/build.xml that copies the fuse-dfs stuff into $HADOOP_HOME/contrib/fuse-dfs

          Good idea. The fuse-dfs package target should copy things to ${dist.dir}/contrib/${name}.

          Who will update the patch?

          Pete Wyckoff added a comment -

          I added the package target and also made the fuse-dfs build dependent on both the compile.c++ and fusedfs properties.

          I also removed aclocal.m4, as that's a generated file.

          Pete Wyckoff added a comment -

          HADOOP-4.patch

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12381459/HADOOP-4.patch
          against trunk revision 653638.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 13 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to cause Findbugs to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2401/testReport/
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2401/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2401/console

          This message is automatically generated.

          Pete Wyckoff added a comment -

          My bad - added the "if" condition to the package target.

          Doug Cutting added a comment -

          Here's a new version that:

          • includes the README in releases
          • doesn't overload "compile.c++" for libhdfs, but adds a new "libhdfs" property, since "compile.c++" is used by Hudson on Solaris, and libhdfs doesn't (yet) compile on Solaris.
          • includes libhdfs & fuse-dfs in releases if libhdfs=1 and fusedfs=1 (see the build sketch after this list)
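          Putting the properties from this thread together, a release-style build would look something like the following; the ls path is only illustrative, since the actual location depends on ${dist.dir}:

            ant package -Dcompile.c++=1 -Dlibhdfs=1 -Dfusedfs=1
            ls build/hadoop-*/contrib/fuse-dfs    # fuse-dfs should land under the contrib area of the dist
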
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12381541/HADOOP-4.patch
          against trunk revision 653906.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 13 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2415/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2415/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2415/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2415/console

          This message is automatically generated.

          Doug Cutting added a comment -

          Any reason for me not to commit this now?

          Craig Macdonald added a comment -

          Sorry, will review over the weekend. Thesis writing this week...

          Craig Macdonald added a comment -

          Minor notes:

          • I had to recompile with the ant property -Dlibhdfs=1 set to get this to compile - perhaps a warning message would be desirable if the required properties are not set?
          • Concerning the note on compiling libhdfs for 64bit arch:
            NOTE: for amd64 architecture, libhdfs will not compile unless you edit
            the Makefile in src/c++/libhdfs/Makefile and set OS_ARCH=amd64
            (probably the same for others too).
            

            In the editing of src/c++/libhdfs/Makefile it is sufficient to just change -m32 to -m64 in the CPPFLAGS and LDFLAGS lines, until HADOOP-3344 is ready (see the sketch after this list).
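          One way to make that edit (a sketch; double-check the result before building):

            sed -i 's/-m32/-m64/g' src/c++/libhdfs/Makefile
            grep -n 'm64' src/c++/libhdfs/Makefile    # confirm the CPPFLAGS and LDFLAGS lines now carry -m64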

          Otherwise, good to be committed.

          C

          Pete Wyckoff added a comment -

          I made the change to the README that Craig suggested.

          Doug Cutting added a comment -

          > I made the change to the README that Craig suggested.

          Yes, but it looks like you started not from my patch but from an older version...

          Pete Wyckoff added a comment -

          Yes, it looks like I was assuming the JIRA attachment numbering would be chronological. I think how to set the architecture to 64-bit is a pretty trivial difference that would get sorted out either way, so I recommend we just commit the latest good patch and move on from there.

          – pete

          Doug Cutting added a comment -

          I just committed this. Thanks, Pete!

          Craig Macdonald added a comment -

          Excellent. Thanks for your good work, Pete & Doug. Doug or someone, can you create a contrib/fuse-dfs component in JIRA for any future issues?

          Cheers

          Owen O'Malley added a comment -

          done!

          Doug Cutting added a comment -

          Since we have the component, we might as well use it!

          Tsz Wo Nicholas Sze added a comment -

          Some Facebook-related stuff was found in the patch and got into trunk. For example, search for "facebook" in src/contrib/fuse-dfs/configure.ac.
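          A one-liner to spot the leftovers (illustrative, not prescriptive):

            grep -n -i facebook src/contrib/fuse-dfs/configure.ac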

          Doug Cutting added a comment -

          > Some Facebook-related stuff was found in the patch and got into trunk.

          Do you want to file another issue for this?

          Tsz Wo Nicholas Sze added a comment -

          Sure, created HADOOP-3476

          Hudson added a comment -

          Integrated in Hadoop-trunk #509 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/509/ )

          chosuan added a comment - edited

          Fuse-dfs make error!

          Help me please.

          The make error message is:
          ==================================================================================================
          [root@chosuan2 fuse-dfs]# make
          Making all in .
          make[1]: Entering directory `/root/fuse/fuse-dfs'
          make[1]: Leaving directory `/root/fuse/fuse-dfs'
          Making all in src
          make[1]: Entering directory `/root/fuse/fuse-dfs/src'
          gcc -Wall -O3 -L/hdfs/shared -lhdfs -L/usr/local/lib -lfuse -L/usr/jdk1.6.0_06/jre/lib/amd64/server -ljvm -o fuse_dfs fuse_dfs.o
          /usr/bin/ld: skipping incompatible /usr/jdk1.6.0_06/jre/lib/amd64/server/libhdfs.so when searching for -lhdfs
          /usr/bin/ld: cannot find -lhdfs
          collect2: ld returned 1 exit status
          make[1]: *** [fuse_dfs] error 1
          make[1]: Leaving directory `/root/fuse/fuse-dfs/src'
          make: *** [all-recursive] error 1
          ==================================================================================================

          Craig Macdonald added a comment -

          chosuan:

          libhdfs isn't built 64-bit by default. You need to rebuild it 64-bit, making sure to remove -m32 from the Makefile.

          See HADOOP-3344 for more details.
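          A quick diagnostic sketch for the error above - the link line pulls in a 64-bit JVM, so the libhdfs.so it finds has to be 64-bit too. The paths come from chosuan's output; the rebuild commands are one possibility, not the only route:

            file /usr/jdk1.6.0_06/jre/lib/amd64/server/libjvm.so     # expect: ELF 64-bit
            file /usr/jdk1.6.0_06/jre/lib/amd64/server/libhdfs.so    # the one ld skipped - likely ELF 32-bit
            sed -i 's/-m32/-m64/g' src/c++/libhdfs/Makefile          # drop the 32-bit flags (see HADOOP-3344)
            ant compile-libhdfs -Dlibhdfs=1                          # rebuild, then point -L/hdfs/shared at the new libhdfs.so
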

          Robert Chansler added a comment -

          Moved the usage info from the release note to the description so that the release notes file can be generated automatically with modest-sized notes for each item.

          Congratulations for getting this in!

          Rita M added a comment -

          Looking at this autofs map entry:

          hdfs -fstype=fuse,rw,nodev,nonempty,noatime,allow_other :/path/to/fuse_dfs_moutn/fuse_dfs.sh#dfs\://namenode\:9000

          Does this still work with a recent version of autofs? Can someone please confirm?
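          For what it's worth, a sketch of where such an entry could sit; the map file names, mount point and timeout are assumptions on my part, not something from this issue:

            # /etc/auto.master (assumed): hand a directory to an indirect map
            #   /mnt   /etc/auto.hdfs   --timeout=600
            # /etc/auto.hdfs (assumed) then holds the 'hdfs' line quoted above, so that
            # accessing /mnt/hdfs triggers the fuse_dfs mount.
            service autofs reload    # or: /etc/init.d/autofs reload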


            People

            • Assignee: Pete Wyckoff
            • Reporter: John Xing
            • Votes: 4
            • Watchers: 11
