Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: None
    • Labels:
      None

      Description

      There are two inefficiencies in the Trash functionality right now that have caused some problems for us.
      First if you configured your trash interval to be one day (24 hours) that means that you store 2 days worth of data eventually. The Current and the previous timestamp that will not be deleted until the end of the interval.
      And another problem is accumulating a lot of data in Trash before the Emptier wakes up. If there are a couple of million files trashed and the Emptier does deletion on HDFS the NameNode will freeze until everything is removed. (this particular problem hopefully will be addressed with HDFS-1143).

      My proposal is to have two configuration intervals. One for deleting the trashed data and another for checkpointing. This way for example for intervals of one day and one hour we will only store 25 hours of data instead of 48 right now and the deletions will be happening in smaller chunks every hour of the day instead of a huge deletion at the end of the day now.

      1. HADOOP-6761.5.patch
        11 kB
        Dmytro Molkov
      2. HADOOP-6761.4.patch
        11 kB
        Dmytro Molkov
      3. HADOOP-6761.3.patch
        11 kB
        Dmytro Molkov
      4. HADOOP-6761.2.patch
        11 kB
        Dmytro Molkov
      5. HADOOP-6761.patch
        4 kB
        Dmytro Molkov

        Activity

        Hide
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk #346 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/346/)

        Show
        Hudson added a comment - Integrated in Hadoop-Common-trunk #346 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/346/ )
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #255 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/255/)

        Show
        Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #255 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/255/ )
        Hide
        dhruba borthakur added a comment -

        I just committed this. Thanks Dmytro!

        Show
        dhruba borthakur added a comment - I just committed this. Thanks Dmytro!
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12444743/HADOOP-6761.5.patch
        against trunk revision 944521.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/526/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/526/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/526/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/526/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12444743/HADOOP-6761.5.patch against trunk revision 944521. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/526/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/526/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/526/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/526/console This message is automatically generated.
        Hide
        Dmytro Molkov added a comment -

        This should fix the javac warning

        Show
        Dmytro Molkov added a comment - This should fix the javac warning
        Hide
        Doug Cutting added a comment -

        This looks good to me, except for the compiler warning Hudson notes.

        Show
        Doug Cutting added a comment - This looks good to me, except for the compiler warning Hudson notes.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12444737/HADOOP-6761.4.patch
        against trunk revision 944521.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        -1 javac. The applied patch generated 1018 javac compiler warnings (more than the trunk's current 1017 warnings).

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/523/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/523/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/523/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/523/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12444737/HADOOP-6761.4.patch against trunk revision 944521. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 1018 javac compiler warnings (more than the trunk's current 1017 warnings). +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/523/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/523/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/523/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/523/console This message is automatically generated.
        Hide
        Dmytro Molkov added a comment -

        Addressing all the comments

        Show
        Dmytro Molkov added a comment - Addressing all the comments
        Hide
        Doug Cutting added a comment -

        Overall this looks good. Some comments:

        • the unit test takes a full minute. might we change it to only take 10 seconds or so? that would require changing both parameters to floats, which i don't think is unreasonable.
        • when the emptier interval is misconfigured, shouldn't we print a warning?
        • the new config parameter might better be called 'checkpoint' rather than 'check', since checking and checkpointing mean very different things for a filesystem.
        Show
        Doug Cutting added a comment - Overall this looks good. Some comments: the unit test takes a full minute. might we change it to only take 10 seconds or so? that would require changing both parameters to floats, which i don't think is unreasonable. when the emptier interval is misconfigured, shouldn't we print a warning? the new config parameter might better be called 'checkpoint' rather than 'check', since checking and checkpointing mean very different things for a filesystem.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12444715/HADOOP-6761.3.patch
        against trunk revision 944521.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        -1 javac. The applied patch generated 1018 javac compiler warnings (more than the trunk's current 1017 warnings).

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/522/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/522/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/522/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/522/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12444715/HADOOP-6761.3.patch against trunk revision 944521. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 1018 javac compiler warnings (more than the trunk's current 1017 warnings). +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/522/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/522/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/522/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/522/console This message is automatically generated.
        Hide
        Dmytro Molkov added a comment -

        Submitting to Hudson, can someone review the code please?

        Show
        Dmytro Molkov added a comment - Submitting to Hudson, can someone review the code please?
        Hide
        Dmytro Molkov added a comment -

        To have backward compatible behaviour the default value of the emptier interval should be 0, then the trash value interval is used for checkpointing.

        Show
        Dmytro Molkov added a comment - To have backward compatible behaviour the default value of the emptier interval should be 0, then the trash value interval is used for checkpointing.
        Hide
        Dmytro Molkov added a comment -

        This patch has a unittest and the modifications to core-default.

        The idea of the test is to run Emptier and keep deleting files until we have enough checkpoints that the old ones are getting deleted.
        This tests that we are creating multiple checkpoints for these values of the intervals and that we are deleting the older checkpoints correctly.

        I am not including modifications to the documentation here since documentation is in different project (HDFS) and it will require a different jira.

        Show
        Dmytro Molkov added a comment - This patch has a unittest and the modifications to core-default. The idea of the test is to run Emptier and keep deleting files until we have enough checkpoints that the old ones are getting deleted. This tests that we are creating multiple checkpoints for these values of the intervals and that we are deleting the older checkpoints correctly. I am not including modifications to the documentation here since documentation is in different project (HDFS) and it will require a different jira.
        Hide
        dhruba borthakur added a comment -

        Ok, sounds good.

        The current description says "The current default policy is to delete files from /trash that are more than 6 hours old. In the future, this policy will be configurable through a well defined interface." This is old information anyways and can be updated. I agree.

        Show
        dhruba borthakur added a comment - Ok, sounds good. The current description says "The current default policy is to delete files from /trash that are more than 6 hours old. In the future, this policy will be configurable through a well defined interface." This is old information anyways and can be updated. I agree.
        Hide
        Ravi Phulari added a comment -

        Dhruba, I am fine with adding completely new section titled "Trash behavior" and describing HDFS trash in that.
        But the Space Reclamation in HDFS Architecture Guide (hdfs_design.html) already talks about Trash behavior and user facing details of trash along with data retention policy.

        I think adding trash section on user guide will be duplication of existing section from hdfs design.

        Show
        Ravi Phulari added a comment - Dhruba, I am fine with adding completely new section titled "Trash behavior" and describing HDFS trash in that. But the Space Reclamation in HDFS Architecture Guide (hdfs_design.html) already talks about Trash behavior and user facing details of trash along with data retention policy. I think adding trash section on user guide will be duplication of existing section from hdfs design.
        Hide
        Doug Cutting added a comment -

        > Can anyone think of a clean and nice way to override these so that we can have a quick test that tests everything?

        One possibility might be to interpret the configuration values as floating point doubles so that you can compatibly configure sub-minute intervals.

        Show
        Doug Cutting added a comment - > Can anyone think of a clean and nice way to override these so that we can have a quick test that tests everything? One possibility might be to interpret the configuration values as floating point doubles so that you can compatibly configure sub-minute intervals.
        Hide
        dhruba borthakur added a comment -

        Hi Ravi, the HDFS Architecture Guide (hdfs_design.html) does not actually list the various configuration parameters for HDFS; that document is very high level and we would like to keep it that way, isn't it?

        One could add a section titled "Trash behavour" in the user guide (hdfs_user_guide.html), would you be ok with that?

        Show
        dhruba borthakur added a comment - Hi Ravi, the HDFS Architecture Guide (hdfs_design.html) does not actually list the various configuration parameters for HDFS; that document is very high level and we would like to keep it that way, isn't it? One could add a section titled "Trash behavour" in the user guide (hdfs_user_guide.html), would you be ok with that?
        Hide
        Ravi Phulari added a comment -

        Thanks Dymtro, Along with adding description to default conf this information should be documented in Space Reclamation section of HDFS Architecture Guide (hdfs_design.html) page.

        Show
        Ravi Phulari added a comment - Thanks Dymtro, Along with adding description to default conf this information should be documented in Space Reclamation section of HDFS Architecture Guide (hdfs_design.html) page.
        Hide
        Dmytro Molkov added a comment -

        Sure Ravi, what is it that you want me to modify? I can add this new parameter to the default conf with description. Would that be enough? Or is there an actual trash documentation that you want me to update?

        Show
        Dmytro Molkov added a comment - Sure Ravi, what is it that you want me to modify? I can add this new parameter to the default conf with description. Would that be enough? Or is there an actual trash documentation that you want me to update?
        Hide
        Ravi Phulari added a comment -

        Dymtro , Could you please include user facing documentation in this patch.

        Show
        Ravi Phulari added a comment - Dymtro , Could you please include user facing documentation in this patch.
        Hide
        Dmytro Molkov added a comment -

        The fix itself is very simple. Please see patch attached. Have two configuration parameters instead of one. The rest of the code should work as is. Since we only delete something that has a timestamp older than now - interval. Which would mean that only the 25-th hour would get deleted in my example of 24 hours retention and 1 hour checkpointing.
        And checkpointing happens every time the Emptier starts. So that part should work fine too.

        The only problem is writing the test for this one. Since the Trash and Emptier have minute long granularity and checkpoint format is the timestamp up to the minute this test will need to run for a couple of minutes to finish.
        Can anyone think of a clean and nice way to override these so that we can have a quick test that tests everything?

        Thanks

        Show
        Dmytro Molkov added a comment - The fix itself is very simple. Please see patch attached. Have two configuration parameters instead of one. The rest of the code should work as is. Since we only delete something that has a timestamp older than now - interval. Which would mean that only the 25-th hour would get deleted in my example of 24 hours retention and 1 hour checkpointing. And checkpointing happens every time the Emptier starts. So that part should work fine too. The only problem is writing the test for this one. Since the Trash and Emptier have minute long granularity and checkpoint format is the timestamp up to the minute this test will need to run for a couple of minutes to finish. Can anyone think of a clean and nice way to override these so that we can have a quick test that tests everything? Thanks
        Hide
        Doug Cutting added a comment -

        +1 When I originally implemented this I elected to use a single parameter only to keep things simpler. Logically they can be distinct.

        Show
        Doug Cutting added a comment - +1 When I originally implemented this I elected to use a single parameter only to keep things simpler. Logically they can be distinct.

          People

          • Assignee:
            Dmytro Molkov
            Reporter:
            Dmytro Molkov
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development