Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      Introduce write support for Fuse; requires Linux kernel 2.6.15 or better.

Description

      1. dfs_write should return the #of bytes written and not 0 (see the sketch after this list)
      2. implement dfs_flush
      3. uncomment/fix dfs_create
      4. fix the flags argument passed to libhdfs openFile to get around the bug in HADOOP-3723
      5. Since I am adding a write unit test, I noticed the unit tests are in the wrong directory - should be in contrib/fuse-dfs/src/test and not contrib/fuse-dfs/test
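
      For items 1 and 2, here is a minimal sketch of the intended behavior. It assumes libhdfs's hdfsWrite/hdfsFlush, a global connected hdfsFS named fs, and the convention of carrying the open hdfsFile in fi->fh; the names and error codes are illustrative, not the literal patch.

      #define FUSE_USE_VERSION 26
      #include <fuse.h>
      #include <errno.h>
      #include <stdint.h>
      #include <hdfs.h>

      extern hdfsFS fs;   // assumed: the connected filesystem handle

      static int dfs_write(const char *path, const char *buf, size_t size,
                           off_t offset, struct fuse_file_info *fi)
      {
        hdfsFile file = (hdfsFile)(uintptr_t)fi->fh;
        tSize written = hdfsWrite(fs, file, buf, size);
        if (written < 0)
          return -EIO;
        return (int)written;   // item 1: report the bytes written, not 0
      }

      static int dfs_flush(const char *path, struct fuse_file_info *fi)
      {
        hdfsFile file = (hdfsFile)(uintptr_t)fi->fh;
        if (file != NULL && hdfsFlush(fs, file) != 0)
          return -EIO;         // item 2: flush delegates to libhdfs
        return 0;
      }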

Attachments

      1. patch1.txt
        2 kB
        Pete Wyckoff
      2. patch2.txt
        28 kB
        Pete Wyckoff
      3. patch3.txt
        24 kB
        Pete Wyckoff
      4. patch4.txt
        25 kB
        Pete Wyckoff
      5. TEST-TestFuseDFS.txt
        17 kB
        Pete Wyckoff

Activity

          Doug Cutting added a comment -

          This sounds reasonable to me.

          Chuang Liu added a comment -

          Hi:

          Is it possible to use struct fuse_file_info * to pass the file handle created in the fuse 'create' call to the following I/O calls like fuse 'write'? That way, you would not need to close the file in fuse 'create()' and reopen it in the following fuse 'write()' call.

          For example, for a linux 'cp' command, the fuse call sequence is like the following:

          static int create(const char *, mode_t, struct fuse_file_info *);
          static int write(const char *, const char *, size_t, off_t, struct fuse_file_info *);
          static int flush(...);
          static int release(...);

          The data in struct fuse_file_info * is passed along the call sequence, so we could use it to store the hadoop file handle or whatever info the following I/O calls need. Thanks.

          Chuang
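
          A minimal sketch of this suggestion, continuing the assumptions from the sketch under the description (global hdfsFS fs; illustrative error handling, not the committed code):

          #include <fcntl.h>
          #include <sys/types.h>

          static int dfs_create(const char *path, mode_t mode,
                                struct fuse_file_info *fi)
          {
            hdfsFile h = hdfsOpenFile(fs, path, O_WRONLY, 0, 0, 0);
            if (h == NULL)
              return -EIO;
            // fuse hands this same fi to the write/flush/release calls for
            // this open file, so the handle can be reused without reopening.
            fi->fh = (uint64_t)(uintptr_t)h;
            return 0;
          }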

          Pete Wyckoff added a comment -

          Good question - the problem is that in some situations, fuse seems to do something more like:

          create
          release
          open
          write
          flush
          release.

          So, although we already store the file pointer in the fuse_file_info structure, there's nothing we can do if fuse tells us to close the file.

          And note, the fuse_file_info pointer passed into open doesn't include the original fuse_file_info data. So, even not closing it on the first release won't help - unless we cached the file handle ourselves.

          If you know how we can get around fuse doing this, it would be very helpful.

          – pete
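
          For the record, one hypothetical way to do the caching Pete mentions (never part of any patch here) would be a small path-keyed table that release() parks handles in and open() claims them back from:

          #include <string.h>
          #include <limits.h>
          #include <pthread.h>

          #define CACHE_SLOTS 64
          static struct { char path[PATH_MAX]; hdfsFile h; } cache[CACHE_SLOTS];
          static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

          // Called from open(): reclaim a handle previously parked for path.
          static hdfsFile cache_take(const char *path)
          {
            hdfsFile h = NULL;
            pthread_mutex_lock(&cache_lock);
            for (int i = 0; i < CACHE_SLOTS; i++) {
              if (cache[i].h != NULL && strcmp(cache[i].path, path) == 0) {
                h = cache[i].h;
                cache[i].h = NULL;   // the slot is free again
                break;
              }
            }
            pthread_mutex_unlock(&cache_lock);
            return h;
          }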

          Chuang Liu added a comment -

          Could you please clarify the situations in which this call sequence happens? Could it be fuse version related? I used FUSE 2.6.

          Also, in fuse.h, the description of fuse "create" says "If this method is not implemented or under linux kernel versions earlier than 2.6.15, the mknod() and open() methods will be called instead". It might be interesting to try getting rid of 'create' and moving the logic in that function to 'open' so that the file gets closed just once.

          Chuang
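
          A sketch of what dropping 'create' would look like at registration time; the dfs_mknod, dfs_open, and dfs_release names are assumed here, as only dfs_create/dfs_write/dfs_flush appear elsewhere in this issue:

          static struct fuse_operations dfs_oper = {
            .mknod   = dfs_mknod,    // called in place of create on the fallback path
            .open    = dfs_open,     // open-for-write logic would live here
            .write   = dfs_write,
            .flush   = dfs_flush,
            .release = dfs_release,
            // .create deliberately left unset, so fuse falls back to mknod() + open()
          };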

          Craig Macdonald added a comment -

          Is HADOOP-1700 really so far away that we need to hack around it? There are only two pre-req JIRAs remaining.

          Hong Tang added a comment -

          Why would you consider it a hack? Well, maybe it's true in the sense that writing FUSE for HDFS is a hack, because you will never be able to perform random writes.

          In any case, my take is that by not depending on appends, the FUSE-based mounter can be used with current and older versions of HDFS, which benefits users who prefer to hold off and wait when new features come out.

          BTW, I do not believe this should be a major effort, and I am reasonably comfortable saying this because my old company has an internal product very similar to HDFS, and we got it working with FUSE flawlessly.

          -Hong

          Pete Wyckoff added a comment -

          My first cut at fixing writes, and also at cleaning up deletions by moving things to trash (when the flag for that is enabled).

          Pete Wyckoff added a comment -

          I don't know what I was thinking before about writes not working - must have been confused. There was a bug (the write not returning the #of bytes written), which I have fixed, and I also enabled create, write, mknod and flush.

          I'm including this now, but will wait to officially submit a patch until I have a unit test.

          Pete Wyckoff added a comment -

          Either FUSE or Java's FileOutputStream is causing the close to happen right after the create - I don't know which. Here's the output from fuse running in debug mode:

          [junit] unique: 1, opcode: INIT (26), nodeid: 0, insize: 56
          [junit] INIT: 7.8
          [junit] flags=0x00000003
          [junit] max_readahead=0x00020000
          [junit] INIT: 7.8
          [junit] flags=0x00000001
          [junit] max_readahead=0x00020000
          [junit] max_write=0x00020000
          [junit] unique: 1, error: 0 (Success), outsize: 40
          [junit] unique: 2, opcode: LOOKUP (1), nodeid: 1, insize: 50
          [junit] LOOKUP /hello.txt
          [junit] unique: 2, error: -2 (No such file or directory), outsize: 16
          [junit] unique: 3, opcode: MKNOD (8), nodeid: 1, insize: 58
          [junit] MKNOD /hello.txt
          [junit] in dfs_create! NODEID: 2
          [junit] dfs_release unique: 3, error: 0 (Success), outsize: 136
          [junit] unique: 4, opcode: OPEN (14), nodeid: 2, insize: 48
          [junit] unique: 4, error: 0 (Success), outsize: 32
          [junit] OPEN[12151664] flags: 0x8401 /hello.txt
          [junit] unique: 5, opcode: WRITE (16), nodeid: 2, insize: 75
          [junit] WRITE[12151664] 11 bytes to 0
          [junit] WRITE[12151664] 11 bytes
          [junit] unique: 5, error: 0 (Success), outsize: 24
          [junit] unique: 6, opcode: FLUSH (25), nodeid: 2, insize: 64
          [junit] FLUSH[12151664]
          [junit] unique: 6, error: 0 (Success), outsize: 16
          [junit] unique: 7, opcode: RELEASE (18), nodeid: 2, insize: 64
          [junit] RELEASE[12151664] flags: 0x8401

          And here's the java code creating the file:

          // create the file (triggers the LOOKUP, MKNOD and OPEN calls above)
          File file = new File(mpoint, "hello.txt");
          FileOutputStream f = new FileOutputStream(file);
          String s = "hello world";
          f.write(s.getBytes());   // the 11-byte WRITE
          f.flush();               // FLUSH
          f.close();               // RELEASE

          Pete Wyckoff added a comment -

          Ignore the above - I now see that open will only do what we want on kernels >= 2.6.15, which I was not using when running the above test.

          Pete Wyckoff added a comment -

          This is the patch for supporting writes and adding a write unit test.

          Pete Wyckoff added a comment -

          run the unit tests with:

          ant test-contrib -Dfusedfs=1 -Dlibhdfs=1 -Dtestcase=TestFuseDFS

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12385690/patch2.txt
          against trunk revision 676069.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 36 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2838/console

          This message is automatically generated.

          Pete Wyckoff added a comment -

          Another try - the problem was that I had svn moved fuse-dfs/test to fuse-dfs/src/test. This time I svn rm'ed and svn add'ed.

          Pete Wyckoff added a comment -

          patch3.txt

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12385996/patch3.txt
          against trunk revision 676772.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 26 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2862/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2862/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2862/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2862/console

          This message is automatically generated.

          Pete Wyckoff added a comment -

          Can one of you do a code review on the patch, so we can then ask Doug or someone to commit this?

          thx, pete

          Pete Wyckoff added a comment -

          Fixing the description to more accurately reflect what this issue is about (now that I know that on kernels >= 2.6.15, fuse will not try opening, closing, and then re-opening the file if the dfs_create function is there).

          Pete Wyckoff added a comment -

          I just looked at the code again and see that instead of returning size from dfs_write, it should return the #of bytes actually written. And instead of forcing those two to be the same, the error condition should just check that the #of bytes written is > 0.

          I will supply a new patch.

          Pete Wyckoff added a comment -

          Uploading a new one that includes a check for O_RDWR and allows for partial writes (i.e., it does not force the #of bytes requested to equal the #of bytes actually written).
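
          A sketch of the open-side check, under the same assumptions as the earlier sketches; the partial-write change means dfs_write errors only when nothing at all was written, rather than whenever written != size:

          #include <fcntl.h>

          static int dfs_open(const char *path, struct fuse_file_info *fi)
          {
            // HDFS files can be open for reading or for writing, but not both.
            if ((fi->flags & O_ACCMODE) == O_RDWR)
              return -EINVAL;

            int flags = ((fi->flags & O_ACCMODE) == O_WRONLY) ? O_WRONLY : O_RDONLY;
            hdfsFile h = hdfsOpenFile(fs, path, flags, 0, 0, 0);
            if (h == NULL)
              return -EIO;
            fi->fh = (uint64_t)(uintptr_t)h;
            return 0;
          }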

          Zheng Shao added a comment -

          +1

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12386360/patch4.txt
          against trunk revision 677781.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 26 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2898/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2898/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2898/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2898/console

          This message is automatically generated.

          Pete Wyckoff added a comment -

          Doug or Dhruba, I think this is ready to be committed as it passes QA and Zheng reviewed it.

          thanks, pete

          dhruba borthakur added a comment -

          I just committed this. Thanks Pete!

          Pete Wyckoff added a comment -

          just for documentation, here's the output of the unit tests.

          Hudson added a comment -

          Integrated in Hadoop-trunk #581 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/ )

People

    • Assignee:
      Pete Wyckoff
    • Reporter:
      Pete Wyckoff
    • Votes: 0
    • Watchers: 3


Time Tracking

    • Original Estimate: 48h
    • Remaining Estimate: 48h
    • Time Spent: Not Specified
