Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: contrib/raid
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The RaidNode currently iterates over the directory structure to figure out which files to RAID. With millions of files, this can take a long time - especially if some files are already RAIDed and the RaidNode needs to look at parity files / parity file HARs to determine if the file needs to be RAIDed.

      The directory traversal is encapsulated inside the class DirectoryTraversal, which examines one file at a time, using the caller's thread.

      My proposal is to make this multi-threaded as follows:

      • use a pool of threads inside DirectoryTraversal
      • The caller's thread is used to retrieve directories, and each new directory is assigned to a thread in the pool. The worker thread examines all the files in the directory.
      • If there are sub-directories, those are added back as work items to the pool.
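
The proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch: it walks a local file system through java.nio.file instead of HDFS, and the class name ParallelTraversalSketch, the method filteredFiles and the filter parameter are hypothetical names chosen for the example. A counter of outstanding directories tells the caller when the whole tree has been examined.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical sketch: one work item per directory, processed by a thread pool.
final class ParallelTraversalSketch {
    private final ExecutorService pool;
    private final Predicate<Path> filter;
    private final Queue<Path> selected = new ConcurrentLinkedQueue<>();
    // Directories submitted to the pool but not yet fully examined.
    private final AtomicInteger pendingDirs = new AtomicInteger();
    private final CountDownLatch done = new CountDownLatch(1);

    private ParallelTraversalSketch(int numThreads, Predicate<Path> filter) {
        this.pool = Executors.newFixedThreadPool(numThreads);
        this.filter = filter;
    }

    /** Returns every non-directory entry under root that passes the filter. */
    static List<Path> filteredFiles(Path root, int numThreads, Predicate<Path> filter)
            throws InterruptedException {
        ParallelTraversalSketch t = new ParallelTraversalSketch(numThreads, filter);
        try {
            t.submit(root);
            t.done.await(); // wait until no directory is outstanding
        } finally {
            t.pool.shutdownNow();
        }
        return new ArrayList<>(t.selected);
    }

    private void submit(Path dir) {
        // Count the directory before queueing it, so the counter cannot
        // reach zero while work is still outstanding.
        pendingDirs.incrementAndGet();
        pool.execute(() -> scan(dir));
    }

    private void scan(Path dir) {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                    submit(entry); // sub-directory goes back to the pool
                } else if (filter.test(entry)) {
                    selected.add(entry);
                }
            }
        } catch (IOException | DirectoryIteratorException e) {
            // Sketch only: an unreadable directory is skipped.
        } finally {
            if (pendingDirs.decrementAndGet() == 0) {
                done.countDown();
            }
        }
    }
}
```

The number of threads is the knob to watch: against HDFS, every worker would be issuing listing calls at the same time, so a larger pool means proportionally more concurrent requests.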

      Comments?

      1. MAPREDUCE-2167.patch
        7 kB
        Ramkumar Vadali
      2. MAPREDUCE-2167.4.patch
        8 kB
        Ramkumar Vadali
      3. MAPREDUCE-2167.3.patch
        6 kB
        Ramkumar Vadali
      4. MAPREDUCE-2167.2.patch
        6 kB
        Ramkumar Vadali

        Activity

        dhruba borthakur added a comment -

        This approach looks fine to me. The changes you mention are all encapsulated inside DirectoryTraversal.java, aren't they?
        The only drawback is that if you bump the number of threads up too high, the load on the NN would increase dramatically.

        Ramkumar Vadali added a comment -

        Yes, the changes will be restricted to DirectoryTraversal.java.

        Ramkumar Vadali added a comment -

        This patch implements the following fix:

        • the signature of getFilteredFiles() does not change
        • the caller's thread is used to get the next directories
        • a thread pool is used to process the files in the directory
        Scott Chen added a comment -

        We can use a BlockingQueue to make the patch simpler.
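
The comment does not say how the queue would be used, so the following is just one possible reading, sketched on a local file system rather than HDFS. The caller's thread discovers directories and puts them on a bounded BlockingQueue; worker threads take directories off the queue and run the per-file check. The names QueueTraversalSketch, filteredFiles and the filter parameter are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Predicate;
import java.util.stream.Stream;

// Hypothetical sketch: caller produces directories, workers consume them.
final class QueueTraversalSketch {
    // Sentinel telling a worker that no more directories will arrive.
    private static final Path DONE = Paths.get("");

    static List<Path> filteredFiles(Path root, int numWorkers, Predicate<Path> filter)
            throws IOException, InterruptedException {
        // Bounded queue: the caller blocks when workers fall behind.
        BlockingQueue<Path> dirs = new ArrayBlockingQueue<>(64);
        Queue<Path> selected = new ConcurrentLinkedQueue<>();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < numWorkers; i++) {
            Thread worker = new Thread(() -> consume(dirs, selected, filter));
            worker.start();
            workers.add(worker);
        }
        try (Stream<Path> walk = Files.walk(root)) {
            // The caller's thread is the only one that discovers directories.
            Iterator<Path> it = walk.filter(Files::isDirectory).iterator();
            while (it.hasNext()) {
                dirs.put(it.next());
            }
        } finally {
            for (int i = 0; i < numWorkers; i++) {
                dirs.put(DONE); // one sentinel per worker
            }
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return new ArrayList<>(selected);
    }

    private static void consume(BlockingQueue<Path> dirs, Queue<Path> selected,
                                Predicate<Path> filter) {
        try {
            // Identity comparison is intended: DONE is a unique object.
            for (Path dir = dirs.take(); dir != DONE; dir = dirs.take()) {
                try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                    for (Path entry : entries) {
                        if (Files.isRegularFile(entry) && filter.test(entry)) {
                            selected.add(entry);
                        }
                    }
                } catch (IOException | DirectoryIteratorException e) {
                    // Sketch only: an unreadable directory is skipped.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```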

        Ramkumar Vadali added a comment -

        Using a semaphore now to track the active threads. The logic is much simpler now.
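
The patch itself is not reproduced here, so the following is only a guess at the general pattern of pairing a thread pool with a semaphore of "slots". The caller acquires a slot before handing work to the pool and each worker releases its slot when it finishes. Acquiring every slot at the end is then a simple way to wait until no worker is active. SlotsSketch and runAll are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a semaphore tracks how many pool threads are busy.
final class SlotsSketch {
    /** Runs all tasks on numThreads workers; returns how many completed normally. */
    static int runAll(List<Runnable> tasks, int numThreads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        Semaphore slots = new Semaphore(numThreads); // one permit per idle worker
        AtomicInteger completed = new AtomicInteger();
        try {
            for (Runnable task : tasks) {
                slots.acquire(); // blocks while every worker is busy
                pool.execute(() -> {
                    try {
                        task.run();
                        completed.incrementAndGet();
                    } finally {
                        slots.release(); // worker is idle again
                    }
                });
            }
            // Holding all permits means no task is still running.
            slots.acquire(numThreads);
        } finally {
            pool.shutdown();
        }
        return completed.get();
    }
}
```

Because the caller blocks on the semaphore rather than queueing work without limit, at most numThreads items are in flight at any moment.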

        Scott Chen added a comment -

        +1 Looks good to me.
        Just one more thing, can you add some comments explaining the motivation of using the semaphore?
        It is confusing when you are using both the thread pool and semaphore.

        Ramkumar Vadali added a comment -

        Added a comment explaining the use of the slots semaphore.

        Scott Chen added a comment -

        Thanks Ram. I will commit it once hudson returns 0.

        Ramkumar Vadali added a comment -

        Fixed a broken test.

        TEST RESULTS:

        ant test-patch has the same number of failures as a clean checkout

             [exec] -1 overall.
             [exec]
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec]
             [exec]     +1 tests included.  The patch appears to include 4 new or modified tests.
             [exec]
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec]
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec]
             [exec]     -1 findbugs.  The patch appears to introduce 13 new Findbugs warnings.
             [exec]
             [exec]     -1 release audit.  The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings).
             [exec]
             [exec]     +1 system test framework.  The patch passed system test framework compile.
             [exec]
             [exec]
             [exec]
             [exec]
             [exec] ======================================================================
             [exec] ======================================================================
             [exec]     Finished build.
             [exec] ======================================================================
             [exec] ======================================================================
             [exec]
             [exec]
        

        ant test succeeds:

        
        
        test-junit:
            [junit] WARNING: multiple versions of ant detected in path for junit
            [junit]          jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
            [junit]      and jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
            [junit] Running org.apache.hadoop.hdfs.TestRaidDfs
            [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 47.071 sec
            [junit] Running org.apache.hadoop.raid.TestBlockFixer
            [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 124.583 sec
            [junit] Running org.apache.hadoop.raid.TestDirectoryTraversal
            [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 9.337 sec
            [junit] Running org.apache.hadoop.raid.TestErasureCodes
            [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 24.481 sec
            [junit] Running org.apache.hadoop.raid.TestGaloisField
            [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.392 sec
            [junit] Running org.apache.hadoop.raid.TestHarIndexParser
            [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.052 sec
            [junit] Running org.apache.hadoop.raid.TestRaidFilter
            [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.485 sec
            [junit] Running org.apache.hadoop.raid.TestRaidHar
            [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 71.136 sec
            [junit] Running org.apache.hadoop.raid.TestRaidNode
            [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 471.072 sec
            [junit] Running org.apache.hadoop.raid.TestRaidPurge
            [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 107.828 sec
            [junit] Running org.apache.hadoop.raid.TestRaidShell
            [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 25.714 sec
        
        test:
        
        BUILD SUCCESSFUL
        Total time: 15 minutes 6 seconds
        
        Scott Chen added a comment -

        I just committed this. Thanks Ram.

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #540 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/540/)

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #643 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk/643/)


          People

          • Assignee:
            Ramkumar Vadali
            Reporter:
            Ramkumar Vadali
          • Votes:
            0
            Watchers:
            6
