Hadoop Common
  1. Hadoop Common
  2. HADOOP-3025

ChecksumFileSystem needs to support the new delete method

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.17.0
    • Fix Version/s: 0.17.0
    • Component/s: fs
    • Labels:
      None

      Description

      The method FileSystem.delete(path) has been deprecated in favor of the new method delete(path, recursive). Temporary files gets created in the MapReduce framework and when the time for deletion comes, they are deleted via delete(path, recursive). This doesn't delete the associated checksum files. This has a big impact when the FileSystem is the InMemoryFileSystem, where space is at a premium and wasting space here might hurt the performance of MapReduce jobs overall. One solution to this problem is to implement the method delete(path, recursive) in the ChecksumFileSystem but is there is a reason why it was left out as part of HADOOP-771?

      1. HADOOP_3025_4.patch
        7 kB
        Mahadev konar
      2. HADOOP_3025_3.patch
        6 kB
        Mahadev konar
      3. HADOOP_3025_2.patch
        1 kB
        Mahadev konar
      4. HADOOP_3025_1.patch
        1 kB
        Mahadev konar

        Issue Links

          Activity

          Hide
          Mahadev konar added a comment -

          this is a bug introduced by my patch to HADOOP-771. I will fix it as soon as possible.

          Show
          Mahadev konar added a comment - this is a bug introduced by my patch to HADOOP-771 . I will fix it as soon as possible.
          Hide
          Mahadev konar added a comment -

          attaching patch for this. I havent added a unit test. Will add shortly.. mukund can you try with this patch?

          Show
          Mahadev konar added a comment - attaching patch for this. I havent added a unit test. Will add shortly.. mukund can you try with this patch?
          Hide
          Mahadev konar added a comment -

          this patch gets rid of an unnecessary exception thrown by delete.

          Show
          Mahadev konar added a comment - this patch gets rid of an unnecessary exception thrown by delete.
          Hide
          Mahadev konar added a comment -

          this patch includes a test... and remves windows lien delimiters in the file also...

          Show
          Mahadev konar added a comment - this patch includes a test... and remves windows lien delimiters in the file also...
          Hide
          Tsz Wo Nicholas Sze added a comment -
          • I think it is better to change FilterFileSystem.delete(Path f) (or even FileSystem.delete(Path f)) to make it calling FilterFileSystem.delete(f, true), instead of changing ChecksumFileSystem.delete(Path f)
          • In ChecksumFileSystem.delete(Path f, boolean recursive), if f is a directory, it calls fs.delete(f, recursive). I think the checksum files won't be deleted.
          • We need a test for deleting a tree for testing the recursive parameter.
          • In RawInMemoryFileSystem.delete(Path f, boolean recursive), the recursive parameter is ignored.
          Show
          Tsz Wo Nicholas Sze added a comment - I think it is better to change FilterFileSystem.delete(Path f) (or even FileSystem.delete(Path f)) to make it calling FilterFileSystem.delete(f, true), instead of changing ChecksumFileSystem.delete(Path f) In ChecksumFileSystem.delete(Path f, boolean recursive), if f is a directory, it calls fs.delete(f, recursive). I think the checksum files won't be deleted. We need a test for deleting a tree for testing the recursive parameter. In RawInMemoryFileSystem.delete(Path f, boolean recursive), the recursive parameter is ignored.
          Hide
          Mahadev konar added a comment -
          • makes sense for fileter filesystem to have it ...
          • this would work in our case since checkssum is in the same dir and file..
          • ill add a test case
          • raw in memory does not have an idea of directories ..
          Show
          Mahadev konar added a comment - makes sense for fileter filesystem to have it ... this would work in our case since checkssum is in the same dir and file.. ill add a test case raw in memory does not have an idea of directories ..
          Hide
          Mahadev konar added a comment -

          here is a patch with nicholas's comments incorporated.

          Show
          Mahadev konar added a comment - here is a patch with nicholas's comments incorporated.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          +1

          Show
          Tsz Wo Nicholas Sze added a comment - +1
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12378157/HADOOP_3025_4.patch
          against trunk revision 619744.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 4 new or modified tests.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1993/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1993/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1993/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1993/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12378157/HADOOP_3025_4.patch against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 4 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1993/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1993/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1993/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1993/console This message is automatically generated.
          Hide
          dhruba borthakur added a comment -

          I just committed this. Thanks Mahadev!

          Show
          dhruba borthakur added a comment - I just committed this. Thanks Mahadev!
          Hide
          Hudson added a comment -
          Show
          Hudson added a comment - Integrated in Hadoop-trunk #435 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/435/ )

            People

            • Assignee:
              Mahadev konar
              Reporter:
              Devaraj Das
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development