Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1699

JobHistory shouldn't be disabled for any reason

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.2
    • Fix Version/s: 0.20.203.0
    • Component/s: jobtracker
    • Labels:
      None

      Description

      Recently we have had issues with JobTracker silently disabling job-history and starting to keep all completed jobs in memory. This leads to OOM on the JobTracker. We should never do this.

      1. mapred-1699-8.patch
        58 kB
        Krishna Ramachandran
      2. mapred-1699-7.patch
        54 kB
        Krishna Ramachandran
      3. mapred-1699-5.patch
        54 kB
        Krishna Ramachandran
      4. mapred-1699-3.patch
        11 kB
        Krishna Ramachandran
      5. mapred-1699-2.patch
        14 kB
        Krishna Ramachandran
      6. mapred-1699-1.patch
        13 kB
        Krishna Ramachandran
      7. mapred-1699.patch
        8 kB
        Krishna Ramachandran

        Activity

        Hide
        Allen Wittenauer added a comment -

        We just hit this as well. Trying to build stats for capacity planning... and 'lo, we see no job stats.

        Show
        Allen Wittenauer added a comment - We just hit this as well. Trying to build stats for capacity planning... and 'lo, we see no job stats.
        Hide
        Sharad Agarwal added a comment -

        This has been fixed by MAPREDUCE-157 in 0.21

        Show
        Sharad Agarwal added a comment - This has been fixed by MAPREDUCE-157 in 0.21
        Hide
        Chris Douglas added a comment -

        It would be nice if this were fixed in the 0.20 branch

        Show
        Chris Douglas added a comment - It would be nice if this were fixed in the 0.20 branch
        Hide
        Krishna Ramachandran added a comment -

        Patch for Y! distribution security branch

        With this change, if job history location is mis-configured JobTracker will throw an exception and fail.

        A similar fix in place for trunk (part of https://issues.apache.org/jira/browse/MAPREDUCE-157)

        This patch also addresses

        https://issues.apache.org/jira/browse/MAPREDUCE-1778

        If CompletedJobStatusStore is active & location is badly configured JobTracker will not start

        Show
        Krishna Ramachandran added a comment - Patch for Y! distribution security branch With this change, if job history location is mis-configured JobTracker will throw an exception and fail. A similar fix in place for trunk (part of https://issues.apache.org/jira/browse/MAPREDUCE-157 ) This patch also addresses https://issues.apache.org/jira/browse/MAPREDUCE-1778 If CompletedJobStatusStore is active & location is badly configured JobTracker will not start
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12445602/mapred-1699.patch
        against trunk revision 947758.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/207/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12445602/mapred-1699.patch against trunk revision 947758. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/207/console This message is automatically generated.
        Hide
        Krishna Ramachandran added a comment -

        the patch is for 20.1xxS not for commit here

        Show
        Krishna Ramachandran added a comment - the patch is for 20.1xxS not for commit here
        Hide
        Krishna Ramachandran added a comment -

        attached revision with test cases
        not for commit in the trunk
        fix for earlier release

        Show
        Krishna Ramachandran added a comment - attached revision with test cases not for commit in the trunk fix for earlier release
        Hide
        Chris Douglas added a comment -
        • It looks like MAPREDUCE-1778 is included in the latest patch
        • JobHistory can still be disabled through JobInfo::logSubmitted
        Show
        Chris Douglas added a comment - It looks like MAPREDUCE-1778 is included in the latest patch JobHistory can still be disabled through JobInfo::logSubmitted
        Hide
        Krishna Ramachandran added a comment -

        Yes fix for 1778 was included based on discussions with Arun

        oops I missed logSubmitted

        In my latest revision I catch IOException and rethrow JobHistory::JobInfo::logSubmitted

        Show
        Krishna Ramachandran added a comment - Yes fix for 1778 was included based on discussions with Arun oops I missed logSubmitted In my latest revision I catch IOException and rethrow JobHistory::JobInfo::logSubmitted
        Hide
        Krishna Ramachandran added a comment -

        I have revised the patch to address history being disabled from JobInfo.logSubmitted

        Show
        Krishna Ramachandran added a comment - I have revised the patch to address history being disabled from JobInfo.logSubmitted
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12446441/mapred-1699-2.patch
        against trunk revision 951832.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/225/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12446441/mapred-1699-2.patch against trunk revision 951832. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/225/console This message is automatically generated.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12446441/mapred-1699-2.patch
        against trunk revision 951832.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/226/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12446441/mapred-1699-2.patch against trunk revision 951832. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/226/console This message is automatically generated.
        Hide
        Krishna Ramachandran added a comment -

        Modified per Chris's comments

        Removed changes related to CompletedJobStatusStore
        Fix logSubmitted - will throw an error if logging fails

        Fix for earlier release not for commit in the trunk

        Show
        Krishna Ramachandran added a comment - Modified per Chris's comments Removed changes related to CompletedJobStatusStore Fix logSubmitted - will throw an error if logging fails Fix for earlier release not for commit in the trunk
        Hide
        Krishna Ramachandran added a comment -

        Revised patch based on Chris review

        Show
        Krishna Ramachandran added a comment - Revised patch based on Chris review
        Hide
        Arun C Murthy added a comment -

        This needs more work...

        1. Remove JobHistory.disableHistory
        2. JobHistory.logSubmitted shouldn't throw an IOException, failure to log job-history shouldn't fail a job. We probably need a map from JobID -> disabled which should be checked everywhere in-lieu of JobHistory.disableHistory
        Show
        Arun C Murthy added a comment - This needs more work... Remove JobHistory.disableHistory JobHistory.logSubmitted shouldn't throw an IOException, failure to log job-history shouldn't fail a job. We probably need a map from JobID -> disabled which should be checked everywhere in-lieu of JobHistory.disableHistory
        Hide
        Krishna Ramachandran added a comment -

        yes - this is good. I have added a map and logSubmitted will initialize in case of an error for that job and use that for subsequent logging (failed, killed etc.,)

        Will update patch once testing is completed

        Am not able to entirely take out disableHistory because of this code (incorrect javadoc - there is no param - jobID)

        /**

        • Logs history meta-info to the history file. This needs to be called once
        • per history file.
        • @param jobId job id, assigned by jobtracker.
          */
          static void logMetaInfo(ArrayList<PrintWriter> writers){
          if (!disableHistory){
          if (null != writers)
          Unknown macro: { JobHistory.log(writers, RecordTypes.Meta, new Keys[] {Keys.VERSION}, new String[] {String.valueOf(VERSION)}); }

          }
          }

        also there are couple of (public) methods that use this flag. though I do not see where they are called

        /**

        • Returns history disable status. by default history is enabled so this
        • method returns false.
        • @return true if history logging is disabled, false otherwise.
          */
          public static boolean isDisableHistory() { return disableHistory; }

        /**

        • Enable/disable history logging. Default value is false, so history
        • is enabled by default.
        • @param disableHistory true if history should be disabled, false otherwise.
          */
          public static void setDisableHistory(boolean disableHistory) { JobHistory.disableHistory = disableHistory; }
        Show
        Krishna Ramachandran added a comment - yes - this is good. I have added a map and logSubmitted will initialize in case of an error for that job and use that for subsequent logging (failed, killed etc.,) Will update patch once testing is completed Am not able to entirely take out disableHistory because of this code (incorrect javadoc - there is no param - jobID) /** Logs history meta-info to the history file. This needs to be called once per history file. @param jobId job id, assigned by jobtracker. */ static void logMetaInfo(ArrayList<PrintWriter> writers){ if (!disableHistory){ if (null != writers) Unknown macro: { JobHistory.log(writers, RecordTypes.Meta, new Keys[] {Keys.VERSION}, new String[] {String.valueOf(VERSION)}); } } } also there are couple of (public) methods that use this flag. though I do not see where they are called /** Returns history disable status. by default history is enabled so this method returns false. @return true if history logging is disabled, false otherwise. */ public static boolean isDisableHistory() { return disableHistory; } /** Enable/disable history logging. Default value is false, so history is enabled by default. @param disableHistory true if history should be disabled, false otherwise. */ public static void setDisableHistory(boolean disableHistory) { JobHistory.disableHistory = disableHistory; }
        Hide
        Krishna Ramachandran added a comment -

        In trunk I believe, all job state loggings are handled in jobHistory.logEvent(jobId) {

        /**

        • Method to log the specified event
        • @param event The event to log
        • @param id The Job ID of the event
          */
          public void logEvent(HistoryEvent event, JobID id) {
          try
          Unknown macro: { final MetaInfo mi = fileMap.get(id); if (mi != null) { mi.writeEvent(event); } }

          catch (IOException e)

          { LOG.error("Error Logging event, " + e.getMessage()); }

          }
          }

        if logging fails just log a message. This is a lot cleaner - disableHistory is never used

        Show
        Krishna Ramachandran added a comment - In trunk I believe, all job state loggings are handled in jobHistory.logEvent(jobId) { /** Method to log the specified event @param event The event to log @param id The Job ID of the event */ public void logEvent(HistoryEvent event, JobID id) { try Unknown macro: { final MetaInfo mi = fileMap.get(id); if (mi != null) { mi.writeEvent(event); } } catch (IOException e) { LOG.error("Error Logging event, " + e.getMessage()); } } } if logging fails just log a message. This is a lot cleaner - disableHistory is never used
        Hide
        Krishna Ramachandran added a comment -

        Arun
        I took out references to disableHistory

        I do not believe we need a separate structure to store "disabled_state" for each jobId.
        In jobHistory.logSubmitted, writer (entry point for history logging) is initialized for each jobID

        If fileManager.addWriter(jobId, writer) fails, we log an error in catch block (just like trunk)

        During subsequent writing of all log events (started/failed/killed/completed/inited/ogInfo ..) we check for

        fileManager.getWriter(jobId) != null

        (will be null if logSubmitted failed to initialize)

        This should be sufficient and is similar to what you proposed (jobId->disabled)

        I am modifying the patch accordingly

        Let me know!

        Show
        Krishna Ramachandran added a comment - Arun I took out references to disableHistory I do not believe we need a separate structure to store "disabled_state" for each jobId. In jobHistory.logSubmitted, writer (entry point for history logging) is initialized for each jobID If fileManager.addWriter(jobId, writer) fails, we log an error in catch block (just like trunk) During subsequent writing of all log events (started/failed/killed/completed/inited/ogInfo ..) we check for fileManager.getWriter(jobId) != null (will be null if logSubmitted failed to initialize) This should be sufficient and is similar to what you proposed (jobId->disabled) I am modifying the patch accordingly Let me know!
        Hide
        Krishna Ramachandran added a comment -

        Revised (from Arun's review)

        if logSubmitted writer fails we remove the entry from fileManager list (if not null)

        Patch for an earlier release. Not for commit in trunk

        Show
        Krishna Ramachandran added a comment - Revised (from Arun's review) if logSubmitted writer fails we remove the entry from fileManager list (if not null) Patch for an earlier release. Not for commit in trunk
        Hide
        Krishna Ramachandran added a comment -

        Revised

        if JobHistory.log() fails for a job, the writer (PrintWriter) for that will be removed from FileManager to prevent repeated failed attempts

        Show
        Krishna Ramachandran added a comment - Revised if JobHistory.log() fails for a job, the writer (PrintWriter) for that will be removed from FileManager to prevent repeated failed attempts
        Hide
        Chris Douglas added a comment -

        This was not fixed in the 0.20 branch. There are two options:

        1) If the problem does not exist in trunk, please change resolution to "won't fix"
        2) Reopen the bug, optionally attaching a patch for the Apache 0.20 branch

        Show
        Chris Douglas added a comment - This was not fixed in the 0.20 branch. There are two options: 1) If the problem does not exist in trunk, please change resolution to "won't fix" 2) Reopen the bug, optionally attaching a patch for the Apache 0.20 branch
        Hide
        Todd Lipcon added a comment -

        Does this still need to be put in trunk for 22?

        Show
        Todd Lipcon added a comment - Does this still need to be put in trunk for 22?
        Hide
        Arun C Murthy added a comment -

        Fixed in 0.20.203, not relevant for MR2.

        Show
        Arun C Murthy added a comment - Fixed in 0.20.203, not relevant for MR2.

          People

          • Assignee:
            Krishna Ramachandran
            Reporter:
            Arun C Murthy
          • Votes:
            0 Vote for this issue
            Watchers:
            12 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development