Hadoop Map/Reduce / MAPREDUCE-2495

The distributed cache cleanup thread has no monitoring to check whether it has died

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.20.204.0, 0.23.0
    • Component/s: distributed-cache
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      The cleanup thread in the distributed cache handles IOExceptions and similar errors correctly. To be more defensive, it would be good to monitor the thread and regularly check that it is still alive, so that a dead cleanup thread cannot let the distributed cache fill the entire disk on the node.

      1. MAPREDUCE-2495-v4.patch
        5 kB
        Robert Joseph Evans
      2. MAPREDUCE-2495-20.20X-V4.patch
        5 kB
        Robert Joseph Evans
      3. MAPREDUCE-2495-v3.patch
        5 kB
        Robert Joseph Evans
      4. MAPREDUCE-2495-20.20X-V3.patch
        5 kB
        Robert Joseph Evans
      5. MAPREDUCE-2495-v2.patch
        5 kB
        Robert Joseph Evans
      6. MAPREDUCE-2495-20.20X-V2.patch
        5 kB
        Robert Joseph Evans
      7. MAPREDUCE-2495-20.20X-V1.patch
        5 kB
        Robert Joseph Evans
      8. MAPREDUCE-2495-v1.patch
        5 kB
        Robert Joseph Evans

        Activity

        Owen O'Malley added a comment -

        Hadoop 0.20.204.0 was just released.

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #690 (See https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk/690/)
        MAPREDUCE-2495. exit() the TaskTracker when the distributed cache cleanup
        thread dies. Contributed by Robert Joseph Evans

        cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1127361
        Files :

        • /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/filecache/TrackerDistributedCacheManager.java
        • /hadoop/mapreduce/trunk/CHANGES.txt
        • /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapreduce/filecache/TestTrackerDistributedCacheManager.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #698 (See https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/698/)
        MAPREDUCE-2495. exit() the TaskTracker when the distributed cache cleanup
        thread dies. Contributed by Robert Joseph Evans

        cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1127361
        Files :

        • /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/filecache/TrackerDistributedCacheManager.java
        • /hadoop/mapreduce/trunk/CHANGES.txt
        • /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapreduce/filecache/TestTrackerDistributedCacheManager.java
        Chris Douglas added a comment -

        +1

        I committed this. Thanks, Robert!

        Robert Joseph Evans added a comment -

        Tests no longer sleep

        Robert Joseph Evans added a comment -

        Chris indicated as a side comment in a different conversation that the sleeps in the tests are not very good, so I am reworking the tests to avoid using sleep.
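
One common way to avoid sleeping in a test is to have the background work signal a latch and let the test wait on it with a timeout. The following is only a minimal sketch of that idea; the class name `LatchInsteadOfSleep` and the method `runAndAwait` are hypothetical, and the thread does not say how the actual tests in the patch were reworked.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; not taken from the MAPREDUCE-2495 patch.
class LatchInsteadOfSleep {

    /**
     * Runs the task on a background thread and waits until it has finished
     * or the timeout expires. Returns true only if the task finished in time.
     */
    static boolean runAndAwait(Runnable task, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                task.run();
            } finally {
                // Signal completion even if the task throws.
                done.countDown();
            }
        });
        worker.setDaemon(true);
        worker.start();
        try {
            // Returns as soon as the worker signals, instead of sleeping a fixed time.
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the caller.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The test then returns as soon as the work is done and fails only if the timeout is exceeded, so the timeout can be generous without slowing down the normal case.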

        Robert Joseph Evans added a comment -

        Just like before.
        The contrib test issues are with RAID, and appear to be a known issue.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12479790/MAPREDUCE-2495-v3.patch
        against trunk revision 1124553.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/274//testReport/
        Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/274//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/274//console

        This message is automatically generated.

        Robert Joseph Evans added a comment -

        Incorporated Owen's comments.

        Robert Joseph Evans added a comment -

        OK, I will have an updated patch shortly. Just to clarify, you want the code to look like this:

        } catch (InterruptedException e) {
          LOG.info("Cleanup...", e);
          // To force us to exit cleanly
          running = false;
        }

        Owen O'Malley added a comment -

        This looks good, except that you shouldn't do a shutdown for InterruptedException. Those are only thrown when another thread is trying to do a clean shutdown. Just log the exception at info level and exit nicely.

        Robert Joseph Evans added a comment -

        The contrib test issues are with RAID, and appear to be a known issue.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12479597/MAPREDUCE-2495-v2.patch
        against trunk revision 1104687.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/262//testReport/
        Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/262//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/262//console

        This message is automatically generated.

        Robert Joseph Evans added a comment -

        Changed to exit the TaskTracker when an unexpected exception is thrown.

        Robert Joseph Evans added a comment -

        Will add in new patches incorporating new comments

        Robert Joseph Evans added a comment -

        Looks good, should have an updated patch shortly.

        Owen O'Malley added a comment -

        There are lots of places where we do it wrong, but in general HDFS is better.

        For an example, look at the ReplicationMonitor in http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java

        Robert Joseph Evans added a comment -

        Sorry, I should have been a bit more verbose. I am fine with catching a Throwable and shutting down the server. I am just not completely sure how to shut down the TaskTracker appropriately. I will look through the code for an example, but a pointer would be helpful.

        Robert Joseph Evans added a comment -

        What is the proper way to shut down the server?

        Owen O'Malley added a comment -

        This doesn't match the approach we use in other places.

        All threads in the servers should have catch clauses for Throwable that log and then shut down the server.
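
A minimal sketch of the pattern described above, under stated assumptions: the class name `GuardedServerThread` and its constructor parameters are hypothetical, `java.util.logging` stands in for whatever logger the server uses, and the shutdown action is passed in as a `Runnable` because the appropriate way to stop a real server is specific to that server.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; class and parameter names are illustrative only.
class GuardedServerThread extends Thread {
    private static final Logger LOG = Logger.getLogger("GuardedServerThread");

    private final Runnable work;      // one iteration of the thread's job
    private final Runnable shutdown;  // assumption: caller supplies the server shutdown action
    private volatile boolean running = true;

    GuardedServerThread(Runnable work, Runnable shutdown) {
        this.work = work;
        this.shutdown = shutdown;
    }

    /** Asks the loop to stop after the current iteration. */
    void stopRunning() {
        running = false;
    }

    @Override
    public void run() {
        try {
            while (running) {
                work.run();
            }
        } catch (Throwable t) {
            // Anything unexpected: log it, then bring the whole server down
            // rather than letting this thread die silently.
            LOG.log(Level.SEVERE, "Server thread died unexpectedly; shutting down", t);
            shutdown.run();
        }
    }
}
```

Passing the shutdown action in also makes the behavior easy to exercise in a unit test without terminating the JVM.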

        Robert Joseph Evans added a comment -

        Please ignore the previous comment; the patch it is complaining about is not for trunk but for the 20 security branch.

        The following is from the 20 security branch:

        [exec] -1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] +1 tests included. The patch appears to include 3 new or modified tests.
        [exec]
        [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec] -1 Eclipse classpath. The patch causes the Eclipse classpath to differ from the contents of the lib directories.

        The javadocs issue is wrong, as both of them generated 6 warnings, and the Eclipse issue is a known issue.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12479465/MAPREDUCE-2495-20.20X-V1.patch
        against trunk revision 1103993.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/257//console

        This message is automatically generated.

        Robert Joseph Evans added a comment -

        Attaching patch for the 20.20X security line too.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12479365/MAPREDUCE-2495-v1.patch
        against trunk revision 1103921.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/250//testReport/
        Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/250//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/250//console

        This message is automatically generated.

        Robert Joseph Evans added a comment -

        Added a simple patch that verifies the cleanup thread is still running whenever cache archives are added.
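
The kind of check described here can be sketched with `Thread.isAlive()`. This is an illustration only: the class `CleanupLivenessCheck` and its methods are hypothetical names, not the code in the attached patch.

```java
// Hypothetical sketch; names are illustrative, not taken from the patch.
class CleanupLivenessCheck {
    private final Thread cleanupThread;

    CleanupLivenessCheck(Thread cleanupThread) {
        this.cleanupThread = cleanupThread;
    }

    /** Throws if the cleanup thread has terminated or was never started. */
    void ensureCleanupThreadAlive() {
        if (!cleanupThread.isAlive()) {
            throw new IllegalStateException(
                "Cache cleanup thread " + cleanupThread.getName() + " is not running");
        }
    }

    /** Stand-in for the code path that adds a cache archive. */
    void addCacheArchive(String name) {
        // Verify the cleaner is alive before letting the cache grow.
        ensureCleanupThreadAlive();
        // ... localizing the archive would happen here ...
    }
}
```

Checking on every add keeps the check cheap and ties it to the moment the cache grows, which is when a dead cleaner matters.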


          People

          • Assignee: Robert Joseph Evans
          • Reporter: Robert Joseph Evans
          • Votes: 0
          • Watchers: 5
