Hadoop HDFS / HDFS-466

hdfs_write infinite loop when dfs fails and cannot write files > 2 GB

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      1. hdfs_write does not check the hdfsWrite return code, so a -1 return is ignored.
      2. hdfs_write uses an int for the overall file length.
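
      As an editor's sketch (not code from the patch; names are simplified, types come from libhdfs's hdfs.h), the failure mode looks roughly like this:

          #include "hdfs.h"   /* libhdfs: hdfsWrite() returns tSize, -1 on error */

          /* Simplified sketch of the buggy pattern described above. */
          void buggy_copy(hdfsFS fs, hdfsFile out, const char *buf,
                          int fileTotalSize,   /* bug 2: int caps the file at 2 GB */
                          tSize bufferSize) {
              int written = 0;
              while (written < fileTotalSize) {
                  tSize ret = hdfsWrite(fs, out, buf, bufferSize);
                  written += ret;              /* bug 1: ret == -1 is never checked,
                                                  so a failing DFS loops forever */
              }
          }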

      Attachments

      1. HADOOP-4619.txt
        2 kB
        Pete Wyckoff
      2. HADOOP-4619.txt
        2 kB
        Pete Wyckoff
      3. HADOOP-4619.txt
        2 kB
        Pete Wyckoff
      4. HADOOP-4619.txt
        2 kB
        Pete Wyckoff
      5. HADOOP-4619.txt
        1 kB
        Pete Wyckoff
      6. HADOOP-4619.txt
        1 kB
        Pete Wyckoff
      7. HADOOP-4619.txt
        1 kB
        Pete Wyckoff

        Activity

        Tom White added a comment -

        I've just committed this. Thanks Pete!

        Eli Collins added a comment -

        lgtm.

        Tested the latest patch:

        • Ran hdfs_write temp $((1024*1024*1024*3)) $((1024*1024)) on trunk and confirmed that it created a 3 GB file
        • Ran the unit tests on trunk though they're blocked by HDFS-940 (libhdfs test uses UnixUserGroupInformation)
        • Applied to branch 20 and ran the unit tests there which passed
        Craig Macdonald added a comment -

        Anything required to get this patch moving?

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12394183/HADOOP-4619.txt
        against trunk revision 719393.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3613/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3613/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3613/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3613/console

        This message is automatically generated.

        Brian Bockelman added a comment -

        +1 on the patch now.

        I suspect it's best that we follow the correct Unix way here; things were put there for a reason. Raghu, if you want large-file support on your 32-bit box, you need to compile with -D_FILE_OFFSET_BITS=64 in CFLAGS.
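
        (Editor's sketch, for reference: a minimal program to confirm the flag's effect. Compile it with and without -D_FILE_OFFSET_BITS=64 on a 32-bit box.)

            #include <stdio.h>
            #include <sys/types.h>

            /* Prints 4 on 32-bit Linux by default and 8 when compiled
             * with -D_FILE_OFFSET_BITS=64. */
            int main(void) {
                printf("sizeof(off_t) = %zu\n", sizeof(off_t));
                return 0;
            }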

        Pete Wyckoff added a comment -

        It is still an improvement: it doubles the size of the files that can be written on 32-bit machines, from 2 GB to 4 GB.
        It would be nice if we could easily detect overflow, but there is no OFF_MAX.
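
        (Editor's sketch: one common workaround, assuming a two's-complement signed off_t, is to derive the bound from the type's width. Not part of the patch.)

            #include <limits.h>     /* CHAR_BIT */
            #include <sys/types.h>  /* off_t */

            /* Largest value a signed off_t can hold, assuming two's complement:
             * (2^(N-2) - 1) * 2 + 1 == 2^(N-1) - 1, computed without overflow. */
            #define OFF_T_MAX ((((off_t)1 << (sizeof(off_t) * CHAR_BIT - 2)) - 1) * 2 + 1)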

        Raghu Angadi added a comment -

        If 32-bit int is ok for you then off_t is just fine.

        Pete Wyckoff added a comment -

        off64_t is not defined on my box.

        I guess the right way to do this might be to check for these typedefs through autoconf, but I don't want to do that for this patch...
        Suggestions?
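
        (Editor's sketch of one possibility, offered as an assumption rather than the patch's approach: fall back in the preprocessor when the typedef is missing.)

            #include <sys/types.h>

            /* Hypothetical fallback: _LARGEFILE64_SOURCE exposes off64_t on
             * glibc; otherwise use long long, which C99 guarantees to be at
             * least 64 bits. */
            #if defined(_LARGEFILE64_SOURCE)
            typedef off64_t file_offset_t;
            #else
            typedef long long file_offset_t;
            #endif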

        Raghu Angadi added a comment -

        > I can't think of a system where off_t would still be a 32-bit int...
        on my 32-bit Linux it is.

        Raghu Angadi added a comment -

        I think what you need is off64_t. Sorry, I should have commented earlier.

        Pete Wyckoff added a comment -

        One more: this changes long long to off_t. Hudson hasn't run on this yet, so we should be OK.

        Brian Bockelman added a comment -

        Hey Pete,

        I think Hudson will kill us if we post another patch but ....

        There are two issues:
        1) Size of the file to create: this should be off_t (look at the struct defined in stat.h: http://www.opengroup.org/onlinepubs/000095399/basedefs/sys/stat.h.html)
        2) Possible size of the writes returned by hdfsWrite; libhdfs defines this to be tSize. Currently, it's an unsigned int32 (I think), but it really should be ssize_t.

        We can't fix (2) without fixing libhdfs. We can fix (1); however, I can't think of a system where off_t would still be a 32-bit int...
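
        (Editor's sketch of the distinction: off_t is what stat(2) reports a file's size in, while ssize_t is what write(2) returns. The file path below is only an example.)

            #include <stdio.h>
            #include <sys/stat.h>
            #include <unistd.h>

            int main(void) {
                struct stat st;
                if (stat("/etc/hosts", &st) == 0)   /* any existing file */
                    printf("st_size (an off_t): %lld\n", (long long)st.st_size);
                printf("sizeof(ssize_t) = %zu\n", sizeof(ssize_t));
                return 0;
            }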

        Pete Wyckoff added a comment -

        Answering my own question: going to long long so we can write > 4 GB on 32-bit machines.

        Pete Wyckoff added a comment -

        Good catch - attaching a new patch that checks the size against SIZE_MAX rather than ULONG_MAX.

        This does mean, however, that the size that can be written on a 32-bit machine is only 4 GB. Should we change from size_t to long long in this case?
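
        (A hedged sketch of that check, assuming C99's <stdint.h>; the function and variable names are illustrative, not from the patch.)

            #include <stdint.h>     /* SIZE_MAX (C99) */
            #include <stdio.h>

            /* Reject a requested size that cannot fit in size_t; SIZE_MAX,
             * not ULONG_MAX, is the correct bound for size_t values. */
            int size_fits(unsigned long long requested) {
                if (requested > SIZE_MAX) {
                    fprintf(stderr, "size %llu exceeds SIZE_MAX\n", requested);
                    return 0;
                }
                return 1;
            }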

        Raghu Angadi added a comment -

        size_t is a 32-bit unsigned int. Is that ok?

        Pete Wyckoff added a comment -

        Re-submitting for Hudson.

        Pete Wyckoff added a comment -

        Also tested the buffer size and the file size to ensure no overflow.

        Pete Wyckoff added a comment -

        man write shows:

               ssize_t write(int fd, const void *buf, size_t count);
        

        So you are correct - it should be size_t. I will fix that. We should probably also change hdfs.h's tSize to be size_t if it is defined, else int64 - but that should be a separate JIRA and not something to go into 0.19.1.

        i will submit a new patch for this.

        – pete

        Brian Bockelman added a comment -

        Update: I'm not sure if it's better to have off_t or ssize_t; I always get the two confused.

        Brian

        Brian Bockelman added a comment -

        Hey Pete,

        Reviewed the patch: why did you use "long long" for the file size instead of type "off_t"?

        Other than that, patch looks good (and it's important not to have bad examples!)

        Brian

        Pete Wyckoff added a comment -

        I think this is a pretty big bug and a poor model for users of libhdfs writing their own write code, so it should go into 0.19.1 and 0.20.

        Pete Wyckoff added a comment -

        There is no Java code in the patch, so the -1 for javadoc isn't a problem with this patch.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12393544/HADOOP-4619.txt
        against trunk revision 712344.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3557/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3557/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3557/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3557/console

        This message is automatically generated.

        Pete Wyckoff added a comment -

        A slightly cleaner patch.

        Pete Wyckoff added a comment -

        1. Added a check for the hdfsWrite return value being less than 0 and, if so, break out of the loop.
        2. Changed the tSize size variable to long long.

        No new unit test, as it is not practical to write a > 2 GB file in a test or to make DFS fail after open and during a write.

        Also, this is just a sample program, not the library itself.
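
        (Editor's sketch of the fix as described, reconstructed from this comment rather than from the attached diff; names are illustrative.)

            #include <stdio.h>
            #include "hdfs.h"   /* libhdfs */

            int fixed_copy(hdfsFS fs, hdfsFile out, const char *buf,
                           long long fileTotalSize, tSize bufferSize) {
                long long written = 0;          /* change 2: 64-bit total */
                while (written < fileTotalSize) {
                    tSize ret = hdfsWrite(fs, out, buf, bufferSize);
                    if (ret < 0) {              /* change 1: stop on error */
                        fprintf(stderr, "hdfsWrite failed\n");
                        return -1;
                    }
                    written += ret;
                }
                return 0;
            }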


          People

          • Assignee: Pete Wyckoff
          • Reporter: Pete Wyckoff
          • Votes: 0
          • Watchers: 4
