Hadoop HDFS / HDFS-860

fuse-dfs truncate behavior causes issues with scp

    Details

    • Type: Wish
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0-alpha
    • Component/s: fuse-dfs
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Tags:
      fuse-dfs truncate scp

      Description

      After scp finishes writing a file, it calls truncate to set the file's length to the number of bytes it has written (i.e., if the file is X bytes, it calls truncate(X)).

      This fails on the current fuse-dfs.
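The write-then-truncate pattern described above can be sketched in C. This is a minimal illustration and not scp's actual source; the function name `write_then_truncate` is hypothetical. It writes a buffer and then calls ftruncate with the number of bytes written, which changes nothing on a local filesystem but still has to be accepted by whatever filesystem backs the path.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path, then truncate the file
 * to exactly len bytes. Returns 0 on success, -1 on any failure. */
int write_then_truncate(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t written = 0;
    while (written < len) {
        ssize_t n = write(fd, data + written, len - written);
        if (n < 0) {
            close(fd);
            return -1;
        }
        written += (size_t)n;
    }

    /* The file is already len bytes long, so this should not change it.
     * A filesystem that rejects truncate makes this call fail anyway. */
    int rc = ftruncate(fd, (off_t)len);
    close(fd);
    return rc;
}
```

Pointing the path at a local directory and then at a fuse-dfs mount would be one way to compare the two behaviors.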

      1. HDFS-860.patch
        0.5 kB
        Brian Bockelman
      2. hdfs-860.txt
        0.7 kB
        Eli Collins

        Activity

        Brian Bockelman added a comment -

        Attaching a simple patch to get around this problem - silently suppress the error if you call truncate with non-zero size.

        This patch should be considered carefully; for our local community, the benefit (scp can be used to copy files onto a remote HDFS mount) outweighs the cost (breaking error codes for the truncate call).

        I primarily wanted to get this issue and patch documented for others to potentially use (and to make sure it has proper licensing).
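The trade-off can be sketched as follows. This is not the attached patch: `truncate_sketch` and its `suppress_error` flag are hypothetical, the real handling for size 0 is omitted, and the use of ENOTSUP as the rejection code is an assumption. The point is only that the workaround reports success for a non-zero size while leaving the file unchanged.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical model of a truncate handler that only supports size 0.
 * suppress_error == 0: reject non-zero sizes (assumed original behavior).
 * suppress_error != 0: silently report success for non-zero sizes. */
int truncate_sketch(off_t size, int suppress_error)
{
    if (size == 0)
        return 0; /* supported case; the actual work is omitted here */

    /* Non-zero size: the file is never modified in either branch. */
    return suppress_error ? 0 : -ENOTSUP;
}
```

With suppression on, a caller asking to shrink a file to a smaller non-zero size is also told it succeeded, which is the cost mentioned above.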

        Eli Collins added a comment -

        +1 Looks good to me. Verified that the patch fixes the scp issue on my host. Also updated the revived fuse-dfs test to cover truncate.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12429047/HDFS-860.patch
        against trunk revision 899456.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/98/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/98/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/98/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/98/console

        This message is automatically generated.

        Tom White added a comment -

        Brian,

        Another solution would be to return 0 if the size to truncate to is the same as the file's size. This would cover the scp case, without breaking error codes, no?

        BTW what OS are you using? I haven't been able to reproduce this on CentOS or Ubuntu, as scp doesn't seem to be calling truncate.
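A minimal sketch of this alternative, under stated assumptions: the function name `truncate_same_size_sketch` is hypothetical, the caller is assumed to already know the file's current size, and ENOTSUP is assumed as the rejection code. How the current size would be obtained is not addressed here.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical check: succeed only when the truncate would be a no-op.
 * known_size is whatever size the caller believes the file has. */
int truncate_same_size_sketch(off_t requested, off_t known_size)
{
    if (requested == 0)
        return 0; /* supported case; the actual work is omitted here */
    if (requested == known_size)
        return 0; /* nothing to do, so reporting success is accurate */
    return -ENOTSUP; /* a real change in length is still rejected */
}
```

Unlike unconditional suppression, a request that would actually change the length still returns an error.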

        Jakob Homan added a comment -

        Canceling the patch to get feedback from Brian on whether Tom's suggestion is workable.

        Brian Bockelman added a comment -

        Hi,

        Jakob - thanks for the reminder.

        Tom:
        1) You can see 'scp' calling truncate by downloading from a remote server to a mounted FUSE HDFS instance, like so:

        [brian@red ~]$ strace scp brian-test:/tmp/hello_world /mnt/hadoop/dropfiles/test_scp 2>&1 | grep truncate
        ftruncate(3, 13) = 0

        2) IIRC, I tried your suggestion, but the size of the file in the namenode isn't updated until close() is called, right? [Actually, now that I say that out loud, I now suppose we can take advantage of the single-thread-writer rule and just track the number of bytes in the client? That seems doable upon 30 seconds of reflection at 10PM...]
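The idea of tracking the byte count in the client could look roughly like this. Everything here is a hypothetical sketch: the `open_file` struct and both function names are invented for illustration and are not fuse-dfs structures, and ENOTSUP is assumed as the rejection code. It relies on there being a single writer per open file, so the client's own count is the true length.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-open-file state kept on the client side. */
struct open_file {
    off_t bytes_written; /* highest offset written so far */
};

/* Record a write of len bytes at offset, extending the tracked length. */
void record_write(struct open_file *f, off_t offset, size_t len)
{
    off_t end = offset + (off_t)len;
    if (end > f->bytes_written)
        f->bytes_written = end;
}

/* Accept a truncate only when it matches the tracked length (a no-op). */
int truncate_open_file(const struct open_file *f, off_t size)
{
    return size == f->bytes_written ? 0 : -ENOTSUP;
}
```

For example, after `record_write(&f, 0, 6)` a truncate to 6 would succeed and a truncate to 3 would still be rejected.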

        Eli Collins added a comment -

        Same patch but for trunk.

        Eli Collins added a comment -

        Details of how this fails on Linux:

        ~ $ scp temp localhost:/mnt/fuse-dfs/user/eli/
        temp                                          100%    6     0.0KB/s   00:00    
        scp: /mnt/fuse-dfs/user/eli//temp: truncate: Operation not supported
        
        write[139718171864544] 6 bytes to 0 flags: 0x8001
           write[139718171864544] 6 bytes to 0
           unique: 168, success, outsize: 24
        unique: 169, opcode: SETATTR (4), nodeid: 7, insize: 128
        truncate /user/eli/temp 6
           unique: 169, error: -95 (Operation not supported), outsize: 16
        unique: 170, opcode: FLUSH (25), nodeid: 7, insize: 64
        

        Brian is right about #2: the hdfsFileInfo obtained by libhdfs reports size == 0 because the file has not been closed yet, and there is no interface for getting the file's size as seen by the client. The attached patch gets scp to a fuse mount working for me, so I'm going to commit it per my earlier +1. I filed HDFS-3431 to improve truncate per the comments.

        Eli Collins added a comment -

        I've committed this to trunk and merged to branch-2. Thanks Brian!

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #2329 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2329/)
        HDFS-860. fuse-dfs truncate behavior causes issues with scp. Contributed by Brian Bockelman (Revision 1339413)

        Result = SUCCESS
        eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1339413
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/fuse-dfs/src/fuse_impls_truncate.c
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #2255 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2255/)
        HDFS-860. fuse-dfs truncate behavior causes issues with scp. Contributed by Brian Bockelman (Revision 1339413)

        Result = SUCCESS
        eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1339413
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/fuse-dfs/src/fuse_impls_truncate.c
        Colin Patrick McCabe added a comment -

        "...the size of the file in the namenode isn't updated until close() is called, right? [Actually, now that I say that out loud, I now suppose we can take advantage of the single-thread-writer rule and just track the number of bytes in the client? That seems doable upon 30 seconds of reflection at 10PM...]"

        Is there a JIRA for this? A lot of applications might be calling fstat on the file descriptor that they have open (and are writing to), and giving them the wrong size is... unfortunate.
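To make the concern concrete, here is a sketch of an application that writes to a file and calls fstat on the descriptor it still has open. The function name `size_seen_by_writer` is hypothetical. On a local filesystem the reported size reflects the bytes written so far; per the discussion above, a mount that only learns the size at close() could report something different at this point.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and return the size that
 * fstat reports while the file is still open, or -1 on failure. */
off_t size_seen_by_writer(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* A short write is treated as a failure to keep the sketch small. */
    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    struct stat st;
    off_t size = (fstat(fd, &st) == 0) ? st.st_size : (off_t)-1;
    close(fd);
    return size;
}
```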

        Eli Collins added a comment -

        We have HdfsDataInputStream#getVisibleLength, but it's not plumbed through to libhdfs.

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #2272 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2272/)
        HDFS-860. fuse-dfs truncate behavior causes issues with scp. Contributed by Brian Bockelman (Revision 1339413)

        Result = ABORTED
        eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1339413
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/fuse-dfs/src/fuse_impls_truncate.c
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1048 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1048/)
        HDFS-860. fuse-dfs truncate behavior causes issues with scp. Contributed by Brian Bockelman (Revision 1339413)

        Result = SUCCESS
        eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1339413
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/fuse-dfs/src/fuse_impls_truncate.c
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1082 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1082/)
        HDFS-860. fuse-dfs truncate behavior causes issues with scp. Contributed by Brian Bockelman (Revision 1339413)

        Result = SUCCESS
        eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1339413
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/fuse-dfs/src/fuse_impls_truncate.c

          People

          • Assignee: Brian Bockelman
          • Reporter: Brian Bockelman
          • Votes: 0
          • Watchers: 9
