Hadoop Map/Reduce
MAPREDUCE-1892

RaidNode can allow layered policies more efficiently

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: contrib/raid
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      The RaidNode policy file can have layered policies that can cover a file more than once. To avoid processing a file multiple times (for RAIDing), RaidNode maintains a list of processed files that is used to avoid duplicate processing attempts.

      This is problematic in that a large number of processed files could cause the RaidNode to run out of memory.

      This task proposes a better method of detecting processed files. The method is based on the observation that a more selective policy will have a better match with a file name than a less selective one. Specifically, the more selective policy will have a longer common prefix with the file name.

      So to detect if a file has already been processed, the RaidNode only needs to maintain a list of processed policies and compare the lengths of the common prefixes. If the file has a longer common prefix with one of the processed policies than with the current policy, it can be assumed to be processed already.
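
      The check described above can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patch: the class name LayeredPolicySketch and the methods components, commonPrefixLength and alreadyProcessed are hypothetical, and policies are represented only by their source path strings. The sketch also assumes that prefixes are compared by whole path components rather than by raw characters, so that a shared "/" or a partially matching directory name does not count as a match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names and structure are assumptions, not the patch's code.
final class LayeredPolicySketch {

  /** Splits a path such as "/a/b/c" into its non-empty components. */
  static List<String> components(String path) {
    List<String> parts = new ArrayList<>();
    for (String p : path.split("/")) {
      if (!p.isEmpty()) {
        parts.add(p);
      }
    }
    return parts;
  }

  /** Number of leading path components that the two paths share. */
  static int commonPrefixLength(String a, String b) {
    List<String> pa = components(a);
    List<String> pb = components(b);
    int limit = Math.min(pa.size(), pb.size());
    int i = 0;
    while (i < limit && pa.get(i).equals(pb.get(i))) {
      i++;
    }
    return i;
  }

  /**
   * Returns true if some already-processed policy shares a longer prefix
   * with the file than the current policy does, in which case the file is
   * assumed to have been handled by that more selective policy.
   */
  static boolean alreadyProcessed(String file, String currentPolicyPath,
                                  List<String> processedPolicyPaths) {
    int current = commonPrefixLength(file, currentPolicyPath);
    for (String processed : processedPolicyPaths) {
      if (commonPrefixLength(file, processed) > current) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Assumed setup: the more selective policy has already been processed.
    List<String> processed = Arrays.asList("/user/warehouse/logs");
    String current = "/user/warehouse";
    // prints "true": covered by the more selective policy
    System.out.println(
        alreadyProcessed("/user/warehouse/logs/part-0", current, processed));
    // prints "false": only the current policy covers this file
    System.out.println(
        alreadyProcessed("/user/warehouse/tables/part-0", current, processed));
  }
}
```

      Memory use in this sketch grows with the number of policies rather than with the number of files, which is the point of the proposal. The sketch follows the description literally, so it relies on more selective policies being processed before less selective ones.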

      1. MAPREDUCE-1892.patch
        32 kB
        Ramkumar Vadali
      2. MAPREDUCE-1892.patch
        32 kB
        Ramkumar Vadali

        Activity

        Ramkumar Vadali created issue -
        Ramkumar Vadali made changes -
        Field: Summary
        Original Value: RaidNode can identify processed files with lesser memory usage
        New Value: RaidNode can allow layered policies more efficiently

        Field: Description
        Original Value: The RaidNode policy file can have policies that can cover a file more than once. (The remaining text is the same as in the Description above.)
        New Value: The RaidNode policy file can have layered policies that can cover a file more than once. (The remaining text is the same as in the Description above.)
        Ramkumar Vadali made changes -
        Assignee Ramkumar Vadali [ rvadali ]
        Ramkumar Vadali added a comment -

        This patch implements the proposal described.

        TEST RESULTS:

        ant test-patch:

             [exec] +1 overall.
             [exec]
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec]
             [exec]     +1 tests included.  The patch appears to include 6 new or modified tests.
             [exec]
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec]
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec]
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec]
             [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
             [exec]
             [exec]     +1 system tests framework.  The patch passed system tests framework compile.
             [exec]
             [exec]
             [exec]
             [exec]
             [exec] ======================================================================
             [exec] ======================================================================
             [exec]     Finished build.
             [exec] ======================================================================
             [exec] ======================================================================
             [exec]
             [exec]
        
        BUILD SUCCESSFUL
        Total time: 16 minutes 14 seconds
        

        ant test under src/contrib/raid:

        test-junit:
            [junit] WARNING: multiple versions of ant detected in path for junit
            [junit]          jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
            [junit]      and jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
            [junit] Running org.apache.hadoop.hdfs.TestRaidDfs
            [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 42.868 sec
            [junit] Running org.apache.hadoop.raid.TestBlockFixer
            [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 135.269 sec
            [junit] Running org.apache.hadoop.raid.TestDirectoryTraversal
            [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 8.923 sec
            [junit] Running org.apache.hadoop.raid.TestErasureCodes
            [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 24.949 sec
            [junit] Running org.apache.hadoop.raid.TestGaloisField
            [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.397 sec
            [junit] Running org.apache.hadoop.raid.TestHarIndexParser
            [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.052 sec
            [junit] Running org.apache.hadoop.raid.TestRaidFilter
            [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.476 sec
            [junit] Running org.apache.hadoop.raid.TestRaidHar
            [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 69.123 sec
            [junit] Running org.apache.hadoop.raid.TestRaidNode
            [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 466.8 sec
            [junit] Running org.apache.hadoop.raid.TestRaidPurge
            [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 108.928 sec
            [junit] Running org.apache.hadoop.raid.TestRaidShell
            [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 25.628 sec
        
        test:
        
        BUILD SUCCESSFUL
        Total time: 15 minutes 6 seconds
        
        
        Ramkumar Vadali made changes -
        Attachment MAPREDUCE-1892.patch [ 12458583 ]
        Ramkumar Vadali made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Scott Chen added a comment -
        +    List<PolicyInfo> allPolicies = null;
        

        We can remove this field because it is not used.

        +1 Looks good to me.

        Ramkumar Vadali added a comment -

        Removed unused field.

        This patch passes ant test and ant test-patch.

        Ramkumar Vadali made changes -
        Attachment MAPREDUCE-1892.patch [ 12458660 ]
        Scott Chen added a comment -

        I just committed this. Thanks Ram.

        Scott Chen made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Fix Version/s 0.22.0 [ 12314184 ]
        Resolution Fixed [ 1 ]
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #527 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/527/)
        MAPREDUCE-1892. RaidNode can allow layered policies more efficiently.
        (Ramkumar Vadali via schen)

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #643 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk/643/)

        Konstantin Shvachko made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition                   Time In Source Status  Execution Times  Last Executor        Last Execution Date
        Open -> Patch Available      130d 22h 47m           1                Ramkumar Vadali      01/Nov/10 21:04
        Patch Available -> Resolved  21h 27m                1                Scott Chen           02/Nov/10 18:31
        Resolved -> Closed           404d 11h 47m           1                Konstantin Shvachko  12/Dec/11 06:19

          People

          • Assignee: Ramkumar Vadali
          • Reporter: Ramkumar Vadali
          • Votes: 0
          • Watchers: 4
