Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1568

TrackerDistributedCacheManager should clean up cache in a background thread

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Hide
      MAPREDUCE-1568. TrackerDistributedCacheManager should clean up cache in a background thread. (Scott Chen via zshao)
      Show
      MAPREDUCE-1568 . TrackerDistributedCacheManager should clean up cache in a background thread. (Scott Chen via zshao)

      Description

      Right now the TrackerDistributedCacheManager do the clean up with the following code path:

      TaskRunner.run() -> 
      TrackerDistributedCacheManager.setup() ->
      TrackerDistributedCacheManager.getLocalCache() -> 
      TrackerDistributedCacheManager.deleteCache()
      

      The deletion of the cache files can take a long time and it should not be done by a task. We suggest that there should be a separate thread checking and clean up the cache files.

      1. MAPREDUCE-1568-v3.txt
        17 kB
        Scott Chen
      2. MAPREDUCE-1568-v3.1.txt
        17 kB
        Scott Chen
      3. MAPREDUCE-1568-v2.txt
        15 kB
        Scott Chen
      4. MAPREDUCE-1568-v2.1.txt
        16 kB
        Scott Chen
      5. MAPREDUCE-1568.txt
        6 kB
        Scott Chen

        Issue Links

          Activity

          Hide
          Scott Chen added a comment -

          In this patch:

          1. Add a Runnable called CacheFileCleanTask in TrackerDistributedCacheManager for deleting a collection of cache files.
          2. Use this to launch a thread to perform the file deletion in TrackerDistributedCacheManager.deleteCache().

          Show
          Scott Chen added a comment - In this patch: 1. Add a Runnable called CacheFileCleanTask in TrackerDistributedCacheManager for deleting a collection of cache files. 2. Use this to launch a thread to perform the file deletion in TrackerDistributedCacheManager.deleteCache().
          Hide
          Zheng Shao added a comment -

          +1

          Show
          Zheng Shao added a comment - +1
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12442627/MAPREDUCE-1568.txt
          against trunk revision 936166.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/129/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/129/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/129/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/129/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12442627/MAPREDUCE-1568.txt against trunk revision 936166. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/129/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/129/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/129/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/129/console This message is automatically generated.
          Hide
          Amareshwari Sriramadasu added a comment -

          I propose we should remove the deletion call from getLocalCache() itself. There should be a separate cleanup thread in TrackerDistributedCacheManager which monitors the disk space and starts deleting whenever disk goes high or number of subdirs increase.
          I don't think starting a thread for each deletion (currently done in the patch) is scalable, there could be race for deletion if two threads started by two getLocalCache() calls starts deleteCache().
          Thoughts?

          Show
          Amareshwari Sriramadasu added a comment - I propose we should remove the deletion call from getLocalCache() itself. There should be a separate cleanup thread in TrackerDistributedCacheManager which monitors the disk space and starts deleting whenever disk goes high or number of subdirs increase. I don't think starting a thread for each deletion (currently done in the patch) is scalable, there could be race for deletion if two threads started by two getLocalCache() calls starts deleteCache(). Thoughts?
          Hide
          Scott Chen added a comment -

          Hey Amareshwari,

          deleteCache will first get the global lock of all cache and put the one needs with zero reference count in toBeDeleted (this is done by you guys in MAPREDUCE-1098). And the asynchronous deletion will start from there.

          When the deletion condition is valid, only one task will get the global lock and after it comes out of the global lock the deletion condition will no longer valid. So there cannot be two threads deleting same set of cache at the same moment.

            private void deleteCache(Configuration conf) throws IOException {
              Collection<CacheStatus> toBeDeleted = new LinkedList<CacheStatus>();
              synchronized (cachedArchives) {  // Global lock of all caches
              // Find cache Status with refcount of zero and put them in to toBeDeleted
              }
          
              // do the deletion asynchronously, after releasing the global lock
              ...
              cacheFileCleaner.start();
            }
          

          A separate cleanup thread is another option. I think that will work fine as well. But that will require more change. I think the good thing about the current patch is that it is simple and safe.

          Show
          Scott Chen added a comment - Hey Amareshwari, deleteCache will first get the global lock of all cache and put the one needs with zero reference count in toBeDeleted (this is done by you guys in MAPREDUCE-1098 ). And the asynchronous deletion will start from there. When the deletion condition is valid, only one task will get the global lock and after it comes out of the global lock the deletion condition will no longer valid. So there cannot be two threads deleting same set of cache at the same moment. private void deleteCache(Configuration conf) throws IOException { Collection<CacheStatus> toBeDeleted = new LinkedList<CacheStatus>(); synchronized (cachedArchives) { // Global lock of all caches // Find cache Status with refcount of zero and put them in to toBeDeleted } // do the deletion asynchronously, after releasing the global lock ... cacheFileCleaner.start(); } A separate cleanup thread is another option. I think that will work fine as well. But that will require more change. I think the good thing about the current patch is that it is simple and safe.
          Hide
          Scott Chen added a comment -

          Amareshwari, Actually, I do not have a strong preference on whether to use another cleanup thread or not. I think it is more elegant because getLocalCache() will actually just do get local cache instead of checking and start the cleaning thread. But on the other hand the current change is simple and safe.

          What do you think?

          Show
          Scott Chen added a comment - Amareshwari, Actually, I do not have a strong preference on whether to use another cleanup thread or not. I think it is more elegant because getLocalCache() will actually just do get local cache instead of checking and start the cleaning thread. But on the other hand the current change is simple and safe. What do you think?
          Hide
          Amareshwari Sriramadasu added a comment -

          I agree that the current change is simple and safe, but I don't see any value getting added by the patch. The file deletion is already asynchronous in a sense, after MAPREDUCE-1302, which uses MRAsyncDiskService for deletion. Now, the new thread is added just for iterating over the deleteSet.
          I think this jira should remove the deletion logic from getLocalCache so that Task will just localize files it needs and the deletion is taken care by TrackerDIstributedCacheManager. Also, with security coming in, deletion of the distributed cache should not be in Task's space.

          Show
          Amareshwari Sriramadasu added a comment - I agree that the current change is simple and safe, but I don't see any value getting added by the patch. The file deletion is already asynchronous in a sense, after MAPREDUCE-1302 , which uses MRAsyncDiskService for deletion. Now, the new thread is added just for iterating over the deleteSet. I think this jira should remove the deletion logic from getLocalCache so that Task will just localize files it needs and the deletion is taken care by TrackerDIstributedCacheManager. Also, with security coming in, deletion of the distributed cache should not be in Task's space.
          Hide
          Scott Chen added a comment -

          Amareshwari,

          The reason we made this change is because we saw that even with MRAsyncDiskService, the task which performs the deletion can still get 600 second timeout.

          I agree with your point. The deletion of distributed cache should not be in Task's space. I will work on it and upload the patch soon. Thanks for your help.

          Show
          Scott Chen added a comment - Amareshwari, The reason we made this change is because we saw that even with MRAsyncDiskService, the task which performs the deletion can still get 600 second timeout. I agree with your point. The deletion of distributed cache should not be in Task's space. I will work on it and upload the patch soon. Thanks for your help.
          Hide
          Scott Chen added a comment -

          I have made the change based on Amareshwari's suggestion.

          1. Create a class called BaseDir and move the check and cleanup cache into this class.

          2. Create a thread called cleanup thread. It periodically check if any of the base directories requires needs cleanup and cleans it.

          3. The cleanup thread will clean only the base directory which exceeds the limit. In the old case, if one directory is full, it will cleanup all the caches no mater which base directory it is in.

          Show
          Scott Chen added a comment - I have made the change based on Amareshwari's suggestion. 1. Create a class called BaseDir and move the check and cleanup cache into this class. 2. Create a thread called cleanup thread. It periodically check if any of the base directories requires needs cleanup and cleans it. 3. The cleanup thread will clean only the base directory which exceeds the limit. In the old case, if one directory is full, it will cleanup all the caches no mater which base directory it is in.
          Hide
          Scott Chen added a comment -

          I have changed the title and description of this JIRA to fit our current idea.

          Show
          Scott Chen added a comment - I have changed the title and description of this JIRA to fit our current idea.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12442905/MAPREDUCE-1568-v2.txt
          against trunk revision 938249.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/142/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/142/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/142/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/142/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12442905/MAPREDUCE-1568-v2.txt against trunk revision 938249. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/142/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/142/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/142/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/142/console This message is automatically generated.
          Hide
          Scott Chen added a comment -

          Fix the warning generate in findbugs.
          The failed test is not related. There is a corresponding JIRA: MAPREDUCE-1727.

          Show
          Scott Chen added a comment - Fix the warning generate in findbugs. The failed test is not related. There is a corresponding JIRA: MAPREDUCE-1727 .
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12442919/MAPREDUCE-1568-v2.1.txt
          against trunk revision 938324.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/145/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/145/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/145/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/145/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12442919/MAPREDUCE-1568-v2.1.txt against trunk revision 938324. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/145/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/145/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/145/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/145/console This message is automatically generated.
          Hide
          Amareshwari Sriramadasu added a comment -

          Functionally looks good. Some comments on the patch:

          • Shall we rename the class BaseDir to something like BaseDirManager?
          • cacheStatus.baseDir is being accessed under global lock in BaseDir.checkAndCleanup() method, every-where else it is accessed under CacheStatus lock. I think it is no harm. Can you also check this once and add a comment?
          • Though not related to this issue, can you add a comment for CacheStatus.refCount saying it should be always accessed under the global cachedArchives lock.
          • catch(InterruptedException) block in CleanupThread can do a return instead of ignoring the exception.
          • I think cleanup thread should catch all Exceptions (not just IOException) because thread should never crash, similar to CleanupQueue.
          • Test case can call baseDir.checkAndCleanup() inline to avoid timing issues, what do you think?
          Show
          Amareshwari Sriramadasu added a comment - Functionally looks good. Some comments on the patch: Shall we rename the class BaseDir to something like BaseDirManager? cacheStatus.baseDir is being accessed under global lock in BaseDir.checkAndCleanup() method, every-where else it is accessed under CacheStatus lock. I think it is no harm. Can you also check this once and add a comment? Though not related to this issue, can you add a comment for CacheStatus.refCount saying it should be always accessed under the global cachedArchives lock. catch(InterruptedException) block in CleanupThread can do a return instead of ignoring the exception. I think cleanup thread should catch all Exceptions (not just IOException) because thread should never crash, similar to CleanupQueue. Test case can call baseDir.checkAndCleanup() inline to avoid timing issues, what do you think?
          Hide
          Amareshwari Sriramadasu added a comment -

          Also, Can you run linux task controller tests and update the results here?

          Show
          Amareshwari Sriramadasu added a comment - Also, Can you run linux task controller tests and update the results here?
          Hide
          Scott Chen added a comment -

          Thanks for the comment. Your comments are very helpful. I wil update the patch soon.

          Show
          Scott Chen added a comment - Thanks for the comment. Your comments are very helpful. I wil update the patch soon.
          Hide
          Scott Chen added a comment -

          Hi Amareshwari, I have a stupid question.

          Though not related to this issue, can you add a comment for CacheStatus.refCount saying it should be always accessed under the global cachedArchives lock.

          I also notice this in the code. It is guarded by cachedArchives everywhere in the code. But why locking the individual cachestatus is not enough?

          Show
          Scott Chen added a comment - Hi Amareshwari, I have a stupid question. Though not related to this issue, can you add a comment for CacheStatus.refCount saying it should be always accessed under the global cachedArchives lock. I also notice this in the code. It is guarded by cachedArchives everywhere in the code. But why locking the individual cachestatus is not enough?
          Hide
          Scott Chen added a comment -

          cacheStatus.baseDir is being accessed under global lock in BaseDir.checkAndCleanup() method, every-where else it is accessed under CacheStatus lock. I think it is no harm. Can you also check this once and add a comment?

          What do you think if we make localizedBaseDir immutable by

          final Path localizedBaseDir;
          

          Then we can avoid this problem.

          Show
          Scott Chen added a comment - cacheStatus.baseDir is being accessed under global lock in BaseDir.checkAndCleanup() method, every-where else it is accessed under CacheStatus lock. I think it is no harm. Can you also check this once and add a comment? What do you think if we make localizedBaseDir immutable by final Path localizedBaseDir; Then we can avoid this problem.
          Hide
          Scott Chen added a comment -

          Sorry for the stupid question. I think I figure it out. It is our choice to pick the lock for a field. In this case we pick the global lock instead of the individual lock.

          I will add comments on each field in CacheStatus to indicate which fields are guarded by cachedArchieves and which are guarded by this. And the rest I will make them immutable.

          Show
          Scott Chen added a comment - Sorry for the stupid question. I think I figure it out. It is our choice to pick the lock for a field. In this case we pick the global lock instead of the individual lock. I will add comments on each field in CacheStatus to indicate which fields are guarded by cachedArchieves and which are guarded by this. And the rest I will make them immutable.
          Hide
          Scott Chen added a comment -

          Here's the results of running TestLinuxTaskController

          Testsuite: org.apache.hadoop.mapred.TestLinuxTaskController
          Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.149 sec 
          ------------- Standard Output ---------------
          2010-04-27 14:21:31,692 INFO  mapred.ClusterWithLinuxTaskController (ClusterWithLinuxTaskController.java:isTaskExecPathPassed(264)) - Invalid taskcontroller-path : ${taskcontroller-path}
          2010-04-27 14:21:31,695 INFO  mapred.ClusterWithLinuxTaskController (ClusterWithLinuxTaskController.java:isTaskExecPathPassed(264)) - Invalid taskcontroller-path : ${taskcontroller-path}
          ------------- ---------------- ---------------
          
          Testcase: testTaskControllerGroup took 0.135 sec 
          
          Show
          Scott Chen added a comment - Here's the results of running TestLinuxTaskController Testsuite: org.apache.hadoop.mapred.TestLinuxTaskController Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.149 sec ------------- Standard Output --------------- 2010-04-27 14:21:31,692 INFO mapred.ClusterWithLinuxTaskController (ClusterWithLinuxTaskController.java:isTaskExecPathPassed(264)) - Invalid taskcontroller-path : ${taskcontroller-path} 2010-04-27 14:21:31,695 INFO mapred.ClusterWithLinuxTaskController (ClusterWithLinuxTaskController.java:isTaskExecPathPassed(264)) - Invalid taskcontroller-path : ${taskcontroller-path} ------------- ---------------- --------------- Testcase: testTaskControllerGroup took 0.135 sec
          Hide
          Scott Chen added a comment -

          Made the change based on Amareshwari's suggestions.

          Shall we rename the class BaseDir to something like BaseDirManager?

          That's a very good name. I have changed it.

          cacheStatus.baseDir is being accessed under global lock in BaseDir.checkAndCleanup() method, every-where else it is accessed under CacheStatus lock. I think it is no harm. Can you also check this once and add a comment?

          I have made baseDir immutable

          Though not related to this issue, can you add a comment for CacheStatus.refCount saying it should be always accessed under the global cachedArchives lock.

          I have add comments on all fields in CacheStatus to explain how to access them.

          catch(InterruptedException) block in CleanupThread can do a return instead of ignoring the exception.
          I think cleanup thread should catch all Exceptions (not just IOException) because thread should never crash, similar to CleanupQueue.

          Now it catches all exceptions, logs them and keeps running.

          Test case can call baseDir.checkAndCleanup() inline to avoid timing issues, what do you think?

          I feel that it is better to test the whole CleanupThread. That way we can make sure the thread actually does the right thing.

          Show
          Scott Chen added a comment - Made the change based on Amareshwari's suggestions. Shall we rename the class BaseDir to something like BaseDirManager? That's a very good name. I have changed it. cacheStatus.baseDir is being accessed under global lock in BaseDir.checkAndCleanup() method, every-where else it is accessed under CacheStatus lock. I think it is no harm. Can you also check this once and add a comment? I have made baseDir immutable Though not related to this issue, can you add a comment for CacheStatus.refCount saying it should be always accessed under the global cachedArchives lock. I have add comments on all fields in CacheStatus to explain how to access them. catch(InterruptedException) block in CleanupThread can do a return instead of ignoring the exception. I think cleanup thread should catch all Exceptions (not just IOException) because thread should never crash, similar to CleanupQueue. Now it catches all exceptions, logs them and keeps running. Test case can call baseDir.checkAndCleanup() inline to avoid timing issues, what do you think? I feel that it is better to test the whole CleanupThread. That way we can make sure the thread actually does the right thing.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12443005/MAPREDUCE-1568-v3.txt
          against trunk revision 938576.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/149/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/149/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/149/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/149/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12443005/MAPREDUCE-1568-v3.txt against trunk revision 938576. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/149/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/149/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/149/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/149/console This message is automatically generated.
          Hide
          Amareshwari Sriramadasu added a comment -

          I also notice this in the code. It is guarded by cachedArchives everywhere in the code. But why locking the individual cachestatus is not enough?

          This was the decision made in MAPREDUCE-1098 when we separated global lock and individual cacheStatus lock to avoid race between deleteCache and getLocalCache.

          Looked at the patch. I just have one comment:
          In the testcase, checkCacheDeletion waits for 5 minutes. It seems very high. This might result into test timeouts because checkCacheDeletion is done twice in the test. I would say 30 second wait is good enough for failing the test, because thread sleep time is set to 100 milliseconds. To be strict, 200msec wait is also fine.

          Other changes in the patch look very good.

          Show
          Amareshwari Sriramadasu added a comment - I also notice this in the code. It is guarded by cachedArchives everywhere in the code. But why locking the individual cachestatus is not enough? This was the decision made in MAPREDUCE-1098 when we separated global lock and individual cacheStatus lock to avoid race between deleteCache and getLocalCache. Looked at the patch. I just have one comment: In the testcase, checkCacheDeletion waits for 5 minutes. It seems very high. This might result into test timeouts because checkCacheDeletion is done twice in the test. I would say 30 second wait is good enough for failing the test, because thread sleep time is set to 100 milliseconds. To be strict, 200msec wait is also fine. Other changes in the patch look very good.
          Hide
          Scott Chen added a comment -

          Thanks again for the comment.

          Looked at the patch. I just have one comment:
          In the testcase, checkCacheDeletion waits for 5 minutes. It seems very high. This might result into test timeouts because checkCacheDeletion is done twice in the test. I would say 30 second wait is good enough for failing the test, because thread sleep time is set to 100 milliseconds. To be strict, 200msec wait is also fine.

          Good point. I have made the change on the time constants in the test.

          Show
          Scott Chen added a comment - Thanks again for the comment. Looked at the patch. I just have one comment: In the testcase, checkCacheDeletion waits for 5 minutes. It seems very high. This might result into test timeouts because checkCacheDeletion is done twice in the test. I would say 30 second wait is good enough for failing the test, because thread sleep time is set to 100 milliseconds. To be strict, 200msec wait is also fine. Good point. I have made the change on the time constants in the test.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12443105/MAPREDUCE-1568-v3.1.txt
          against trunk revision 938805.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/154/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/154/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/154/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/154/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12443105/MAPREDUCE-1568-v3.1.txt against trunk revision 938805. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/154/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/154/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/154/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/154/console This message is automatically generated.
          Hide
          Amareshwari Sriramadasu added a comment -

          +1 Patch looks good to me.

          Show
          Amareshwari Sriramadasu added a comment - +1 Patch looks good to me.
          Hide
          Amareshwari Sriramadasu added a comment -

          Here's the results of running TestLinuxTaskController

          Sorry, I meant all Linux task controller tests here and by passing taskcontroller-path and taskcontroller-ugi. I understand that running these tests is difficult until MAPREDUCE-1429 is fixed.
          I ran all linux task controller tests with the patch (both as tt user and some other user), all tests passed.

          Show
          Amareshwari Sriramadasu added a comment - Here's the results of running TestLinuxTaskController Sorry, I meant all Linux task controller tests here and by passing taskcontroller-path and taskcontroller-ugi. I understand that running these tests is difficult until MAPREDUCE-1429 is fixed. I ran all linux task controller tests with the patch (both as tt user and some other user), all tests passed.
          Hide
          Scott Chen added a comment -

          Sorry, I meant all Linux task controller tests here and by passing taskcontroller-path and taskcontroller-ugi. I understand that running these tests is difficult until MAPREDUCE-1429 is fixed.
          I ran all linux task controller tests with the patch (both as tt user and some other user), all tests passed.

          Sorry, I did not know about this. Thanks for the help.

          I will ask Zheng to see if he can commit this patch.

          Show
          Scott Chen added a comment - Sorry, I meant all Linux task controller tests here and by passing taskcontroller-path and taskcontroller-ugi. I understand that running these tests is difficult until MAPREDUCE-1429 is fixed. I ran all linux task controller tests with the patch (both as tt user and some other user), all tests passed. Sorry, I did not know about this. Thanks for the help. I will ask Zheng to see if he can commit this patch.
          Hide
          Scott Chen added a comment -

          And thank you very much for helping me on the patch, Amareshwari.
          I think the quality of the patch has been improved a lot compare to the original one.

          Show
          Scott Chen added a comment - And thank you very much for helping me on the patch, Amareshwari. I think the quality of the patch has been improved a lot compare to the original one.
          Hide
          Zheng Shao added a comment -

          Committed. Thanks Scott!

          Show
          Zheng Shao added a comment - Committed. Thanks Scott!

            People

            • Assignee:
              Scott Chen
              Reporter:
              Scott Chen
            • Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development