Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.205.0
    • Component/s: tasktracker
    • Labels: None

      Description

      Tracks improvements to MR-2413. See this comment.

      Attachments

    1. mapreduce-2928-3.patch
        26 kB
        Eli Collins
      2. mapreduce-2928-2.patch
        29 kB
        Eli Collins
      3. mapreduce-2928-1.patch
        26 kB
        Eli Collins

        Issue Links

          Activity

          Eli Collins added a comment -

          Here's a preliminary patch. Hasn't been tested extensively.

          Owen O'Malley added a comment -

          This mostly looks fine, although removing the function getGoodLocalDirsString and inlining the code into all of the calling sites is a step backwards.

          Can you undo that part of the change, Eli?

          Since this is a bunch of minor cleanups, I'd propose it for 206.

          Ravi Gummadi added a comment -

          This patch removed the boolean variable "diskFailed" and introduced "lastNumDirs".

          (1) In the original patch for MR-2413, in offerService(), I did not compare the current count of good local dirs against the previous count to determine disk health, because a failed disk could become good while a good disk becomes bad between two consecutive checks in offerService(). I mean, if a non-writable disk is made writable (perhaps manually) at some point and, within the same minute (i.e. before the next disk health check in offerService()), another good disk fails, then this changed code will not re-init the TT. That leaves a wrong list of good local dirs in the TT's memory and causes bad disks to be used. Possible race condition?

          if (numDirs < lastNumDirs) {
            return State.STALE;
          }
          

          (2) Also, the new variable lastNumDirs needs to be initialized before offerService() runs for the first time, to avoid re-initializing the TT the first time control reaches the disk health check code in offerService(). Right?
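
          For illustration (a standalone toy example; the dir names and the little driver below are made up, not code from the patch), this is the swap that a pure count comparison misses: one dir recovers and a different dir fails between two checks, so the count is unchanged and the re-init never triggers.

          import java.util.*;

          public class CountCheckMiss {
            public static void main(String[] args) {
              // good dirs at the previous check vs. the current check
              Set<String> lastGood = new HashSet<String>(Arrays.asList("/disk0", "/disk2"));
              Set<String> nowGood  = new HashSet<String>(Arrays.asList("/disk0", "/disk1"));
              int lastNumDirs = lastGood.size();        // 2
              int numDirs = nowGood.size();             // still 2: /disk1 recovered, /disk2 failed
              boolean reinit = numDirs < lastNumDirs;   // false, so no re-init happens
              boolean dirsChanged = !lastGood.equals(nowGood);   // true
              System.out.println("re-init: " + reinit + ", good dirs changed: " + dirsChanged);
            }
          }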

          Eli Collins added a comment -

          Owen, Ravi - thank you for the feedback.

          Will re-introduce getGoodLocalDirsString.

          Wrt #1, are we worried about this case? It doesn't seem like something that would happen in practice, since disk failures aren't typically recoverable, and per MAPREDUCE-3011 I'm thinking that once a local path is identified as bad we should remove it from the config. In the DN we maintain a count of total failures; how about we do that here and use it instead of the number of valid dirs? That value only ever increases and is therefore not susceptible to the bug you described.

          Wrt #2 - good point, will fix this.

          Eli Collins added a comment -

          Patch attached.

          Ravi Gummadi added a comment -

          One minor comment:

          +          int numFailures = lastNumFailures;
          +          localStorage.checkDirs();
                     lastCheckDirsTime = now;
          -          // If any of the good disks failed, re-init the task tracker
          -          if (localStorage.isDiskFailed()) {
          +          lastNumFailures = localStorage.numFailures();
          +          // Re-init the task tracker if there were any new failures
          +          if (numFailures < lastNumFailures) {
          

          This could be cleaner and easier to understand written the following way, so that lastNumFailures holds the previous number of failures, rather than the current number, at the point of the comparison:

          +          localStorage.checkDirs();
                     lastCheckDirsTime = now;
          -          // If any of the good disks failed, re-init the task tracker
          -          if (localStorage.isDiskFailed()) {
          +          int numFailures = localStorage.numFailures();
          +          // Re-init the task tracker if there were any new failures
          +          if (numFailures > lastNumFailures) {
          +            lastNumFailures = numFailures;
          

          Related to this patch (and MR-3011):
          Maybe in a separate JIRA, we could add a LocalStorage.checkBadLocalDirs() call to TT.initialize() that does a disk health check of the bad local dirs and adds them back to the good local dirs list if they have become good. For this, LocalStorage would also need to maintain a list of bad dirs.
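
          A rough sketch of that idea (checkBadLocalDirs(), the goodDirs/badDirs fields, and the helper class below are illustrative assumptions, not code in this patch), using the existing org.apache.hadoop.util.DiskChecker:

          import java.io.File;
          import java.util.ArrayList;
          import java.util.Iterator;
          import java.util.List;
          import org.apache.hadoop.util.DiskChecker;
          import org.apache.hadoop.util.DiskChecker.DiskErrorException;

          class LocalDirRecovery {
            private final List<String> goodDirs = new ArrayList<String>();
            private final List<String> badDirs = new ArrayList<String>();

            // Re-probe previously failed dirs and promote any that are usable again.
            synchronized void checkBadLocalDirs() {
              for (Iterator<String> it = badDirs.iterator(); it.hasNext();) {
                String dir = it.next();
                try {
                  DiskChecker.checkDir(new File(dir));  // throws if the dir is still unusable
                  goodDirs.add(dir);
                  it.remove();
                } catch (DiskErrorException e) {
                  // still bad; keep it on the bad list
                }
              }
            }
          }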

          Matt Foley added a comment -

          Owen, can we close on this? Thanks.

          Owen O'Malley added a comment -

          I think the change from isDiskFailed to numFailures isn't very well motivated, but most of the clean up makes things better.

          Eli Collins added a comment -

          The motivation for tracking numFailures is that the TT re-init code just wants to know whether there have been any failures since the last time it checked the directories, and that can easily be determined by seeing whether the total number of failures has changed. The isDiskFailed method is stateful, e.g. you have to know whether you've called it previously (which is why TT#initialize calls it and ignores the return value); i.e. it tells you whether there have been any failures since the last time isDiskFailed was called, not whether there have been any failures since the disks were checked. Make sense? It might be clearer if checkDirs returned the failure count, but it throws DiskErrors itself, so that doesn't work.
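
          For illustration, a rough sketch of the two styles as described above (the internals are simplified assumptions, not the actual LocalStorage code):

          class LocalStorageSketch {
            private int totalFailures = 0;        // running total; only ever grows
            private int failuresAtLastQuery = 0;  // baseline used by isDiskFailed()

            synchronized void recordFailure() { totalFailures++; }

            // Stateful: true iff new failures occurred since the previous call to
            // this method, so a caller has to "prime" it once and ignore the result.
            synchronized boolean isDiskFailed() {
              boolean failed = totalFailures > failuresAtLastQuery;
              failuresAtLastQuery = totalFailures;
              return failed;
            }

            // Stateless: each caller keeps its own baseline and compares against it.
            synchronized int numFailures() { return totalFailures; }
          }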

          Eli Collins added a comment -

          @Ravi, good suggestion - updated patch attached.

          Maybe in a separate JIRA, we could add a LocalStorage.checkBadLocalDirs() call to TT.initialize() that does a disk health check of the bad local dirs and adds them back to the good local dirs list if they have become good.

          Sounds good. Since transient disk failures may cause a file system to become read-only (which then looks like a permanent failure), sometimes re-mounting is sufficient to recover, in which case it makes sense to re-enable faulty disks without a TT restart.

          Eli Collins added a comment -

          Btw, apologies for the slow response. I'm out of town for the next three weeks; I'll be back Monday, October 17th.

          Matt Foley added a comment -

          Opened MAPREDUCE-3077 to capture Ravi and Eli's suggested enhancement, for future work.

          Kihwal Lee added a comment -

          Sounds good. Since transient disk failures may cause a file system to become read-only (which then looks like a permanent failure), sometimes re-mounting is sufficient to recover, in which case it makes sense to re-enable faulty disks without a TT restart.

          We need to consider the opposite case as well. The file system on a failing disk can be remounted rw and may work for a short time, but then fail again. I'm sure you've seen this too, since you said "sometimes". We need to make sure the potential performance degradation caused by health-checking a bad disk is limited: the checking thread can hang for a long time (e.g. 300 seconds) until the kernel flips the file system back to ro.
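
          One way to bound that hang (a sketch under assumptions; the helper class, method names, and timeout below are illustrative, not part of this patch) is to run each dir check on a worker thread and give up after a timeout:

          import java.io.File;
          import java.util.concurrent.Callable;
          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;
          import java.util.concurrent.Future;
          import java.util.concurrent.TimeUnit;
          import java.util.concurrent.TimeoutException;
          import org.apache.hadoop.util.DiskChecker;

          class BoundedDirCheck {
            private static final ExecutorService POOL = Executors.newSingleThreadExecutor();

            // Returns false if the dir is unusable or the check does not finish in time.
            static boolean isUsable(final String dir, long timeoutSecs) {
              Future<Boolean> f = POOL.submit(new Callable<Boolean>() {
                public Boolean call() {
                  try {
                    DiskChecker.checkDir(new File(dir));
                    return Boolean.TRUE;
                  } catch (Exception e) {
                    return Boolean.FALSE;
                  }
                }
              });
              try {
                return f.get(timeoutSecs, TimeUnit.SECONDS).booleanValue();
              } catch (TimeoutException e) {
                // The caller is unblocked, though a check stuck in uninterruptible
                // disk I/O may keep the worker thread busy until the kernel gives up.
                f.cancel(true);
                return false;
              } catch (Exception e) {
                return false;
              }
            }
          }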

          Hey, I will also be out of town for three weeks and come back on 10/17. GMTA.

          Matt Foley added a comment -

          Copied Kihwal's response to MAPREDUCE-3077. Please continue this thread there, not here. Thanks.

          Matt Foley added a comment -

          The patches mapreduce-2928-2.patch and mapreduce-2928-3.patch were generated from different CMSes and ended up with the files in a different order, which always complicates comparison. However, analysis shows that the only substantive diffs are:

          > Index: src/mapred/org/apache/hadoop/mapred/TaskTracker.java
          > ===================================================================
          @@ -856,13 +825,6 @@ public class TaskTracker implements MRConstants, TaskUmbilicalProtocol,
          [[ +12 lines ]]
          < Class<? extends TaskTrackerInstrumentation> metricsInst =
          < getInstrumentationClass(fConf);

          > try {
          > Class<? extends TaskTrackerInstrumentation> metricsInst =

          @@ -1605,10 +1560,12 @@ public class TaskTracker implements MRConstants, TaskUmbilicalProtocol,
          [[ +4 lines ]]
          < + int numFailures = lastNumFailures;
          [[ +4 lines ]]
          < + lastNumFailures = localStorage.numFailures();

          > + int numFailures = localStorage.numFailures();
          [[ +1 line ]]
          < + if (numFailures < lastNumFailures) {

          > + if (numFailures > lastNumFailures) {
          > + lastNumFailures = numFailures;

          The first chunk differs only in surrounding context, not in patched lines, so it is irrelevant to the code changed by this patch.

          The second chunk is precisely the changes suggested by Ravi.

          So, +1 on code review, per the previous reviewers' comments.

          Matt Foley added a comment -

          Committed to 0.20-security and 0.20.205. Thanks Eli! And thanks to Owen and Ravi for code review.

          Matt Foley added a comment -

          Closed upon release of 0.20.205.0


            People

            • Assignee: Eli Collins
            • Reporter: Eli Collins
            • Votes: 0
            • Watchers: 5
