Details

    • Hadoop Flags:
      Reviewed

      Description

      At present, TaskTracker doesn't handle disk failures properly, either at startup or at runtime.

      (1) Currently TaskTracker doesn't come up if any of the mapred-local-dirs is on a bad disk. TaskTracker should ignore that particular mapred-local-dir, start up, and use only the remaining good mapred-local-dirs.
      (2) If a disk goes bad while TaskTracker is running, currently TaskTracker doesn't do anything special. This results in either
      (a) TaskTracker continues to try to use the bad disk, which results in lots of task failures and possibly job failures (because multiple TTs have bad disks), and eventually these TTs get graylisted for all jobs. This requires a manual restart of the TT with a modified mapred-local-dirs configuration that avoids the bad disk. OR
      (b) The health check script identifies the disk as bad and the TT gets blacklisted. This also requires a manual restart of the TT with a modified mapred-local-dirs configuration that avoids the bad disk.

      This JIRA is to make TaskTracker more fault-tolerant to disk failures, solving (1) and (2): the TT should start as long as at least one of the mapred-local-dirs is on a good disk, and the TT should adjust its in-memory list of mapred-local-dirs to avoid using bad ones.

      1. MR-2413.v0.1.patch
        44 kB
        Ravi Gummadi
      2. MR-2413.v0.2.patch
        44 kB
        Jagane Sundar
      3. MR-2413.v0.3.patch
        44 kB
        Ravi Gummadi
      4. MR-2413.v0.patch
        44 kB
        Ravi Gummadi

        Issue Links

          Activity

          Ravi Gummadi added a comment -

          Attaching patch solving the 2 issues mentioned in the JIRA description.

          The patch does the following:

          (1) TaskTracker maintains good mapred-local-dirs list and bad mapred-local-dirs list.
          (2) When the TT is starting up, all mapred-local-dirs are checked to see whether they are on good disks. This updates the good-dirs and bad-dirs lists.
          (3) TaskTracker periodically checks the health of the good mapred-local-dirs. If any good mapred-local-dir becomes bad, then TaskTracker reinitializes itself. The effect on the TaskTracker side is similar to getting a ReinitTrackerAction from the JobTracker. In the existing code, the JobTracker sends a ReinitTrackerAction when it finds that this TaskTracker was lost some time back and has now come back.
          (4) A new configuration property mapred.disk.healthChecker.interval (whose value is in milliseconds) is added with a default value of 60000. This is the interval between two consecutive health checks of the mapred-local-dirs by TaskTracker.
          (5) TaskTracker's in-memory configuration is also updated every time initialize() happens. The correct configuration value for mapred.local.dir is set in tasks' configurations before launching tasks.
          (6) TaskTracker passes the list of good mapred-local-dirs to the Linux Task Controller binary as a parameter (a comma-separated list). The Linux Task Controller uses only these good mapred-local-dirs. So with this patch, the Linux Task Controller's configuration file taskcontroller.cfg doesn't have to contain mapred.local.dir. Even if taskcontroller.cfg contains mapred.local.dir, it is simply ignored by the Linux Task Controller.
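          Taken together, items (1)-(3) amount to a classify-then-recheck loop over the local dirs. A minimal sketch in Java (class and method names such as LocalDirsMonitor and isUsable are illustrative, not the patch's actual code):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the good/bad dir bookkeeping described above;
// the real patch does this inside TaskTracker and its LocalStorage helper.
public class LocalDirsMonitor {
  private final List<String> goodDirs = new ArrayList<>();
  private final List<String> badDirs = new ArrayList<>();

  // startup check, item (2): classify each configured mapred-local-dir
  public LocalDirsMonitor(String[] configuredDirs) {
    for (String dir : configuredDirs) {
      (isUsable(dir) ? goodDirs : badDirs).add(dir);
    }
  }

  // periodic check, item (3): returns true if any previously-good dir
  // went bad, signalling that the TT should reinitialize itself
  public boolean checkDirs() {
    boolean anyFailed = false;
    for (String dir : new ArrayList<>(goodDirs)) {
      if (!isUsable(dir)) {
        goodDirs.remove(dir);
        badDirs.add(dir);
        anyFailed = true;
      }
    }
    return anyFailed;
  }

  // crude health probe: the dir must exist (or be creatable) and be
  // readable and writable
  private boolean isUsable(String dir) {
    File f = new File(dir);
    return (f.isDirectory() || f.mkdirs()) && f.canRead() && f.canWrite();
  }

  public List<String> getGoodDirs() { return goodDirs; }
  public List<String> getBadDirs() { return badDirs; }
}
```

          A real implementation would also attempt a small test write to catch read-only mounts, which File.canWrite() alone may miss.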
          ------------------------------------------------------------------
          With this patch,

          What happens after a disk fails but before TaskTracker reinits itself?

          Currently running tasks, and newly launched tasks, that try to use the bad disk can fail.

          What happens after TT re-initialization?

          All the mapred-local-dirs are cleaned up during re-initialization, so running tasks can fail because of this cleanup. All finished maps of jobs whose reduces haven't yet fetched those maps' outputs will also fail with a "too many fetch failures" error, because the maps' outputs are cleaned up as well and this TaskTracker can no longer serve them to the reduces.

          Ravi Gummadi added a comment -

          MAPREDUCE-134 is solved as part of the attached patch.

          Owen O'Malley added a comment -

          The comments in configuration.c aren't correct.

          The result of get_value should be released via free.
          The result of extract_values and get_values should be released via free_values.

          initialize_job goes past 80 chars.

          We like to have braces around even single line branches in if statements. Your changes in TaskTracker don't do that.

          Why do you call localStorage.isDiskFailed and then ignore the results?

          Rather than setting the "conf" attribute for the http server, you should set an attribute with the localStorage object. All uses of MAPRED_LOCALDIR_PROPERTY should be removed, other than the original creation of the localStorage. Furthermore, the property should never be set.

          Ravi Gummadi added a comment -

          >> Why do you call localStorage.isDiskFailed and then ignore the results?

          This is done in initialize() because we don't want the flag localStorage.diskFailed to be true when we go to offerService() (this can happen if there are new disk failures just before control reaches initialize()->localStorage.checkLocalDirs()), as that would unnecessarily trigger a TT re-init. So we just want to reset localStorage.diskFailed to false in initialize(), because the failed disks/mapred-local-dirs are already being handled/ignored.

          Ravi Gummadi added a comment -

          >> when we go to offerService()

          I mean when control goes to offerService() first time after initialize-TT/re-initialize-TT.

          Ravi Gummadi added a comment -

          Attaching updated patch incorporating review comments.
          As it leads to a lot of complex code changes, I didn't incorporate the comment "using localStorage only everywhere and not updating TaskTracker.fConf at all with good local directories". Also, the httpserver need not take another attribute localStorage in addition to conf, as conf is already passed in the existing code and is up to date regarding the good mapred local dirs.

          Jagane Sundar added a comment -

          >> Why do you call localStorage.isDiskFailed and then ignore the results?
          Here is some more context as to why we're ignoring the return value from the call to isDiskFailed():
          LocalStorage.isDiskFailed() returns true if a disk has failed since the last time this method was called. When called from initialize(), we're calling it only to reset the state.
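          The check-and-clear semantics described here can be reduced to a few lines; this stripped-down sketch (the class and field names only approximate the patch's code) shows why a caller may invoke the method solely for its reset side effect:

```java
// Minimal sketch of a check-and-clear failure flag, approximating the
// LocalStorage.isDiskFailed() behavior described above (not the patch's
// exact code).
public class FailureFlag {
  private boolean diskFailed = false;

  // called by the disk-health checker when a good dir turns bad
  public synchronized void markDiskFailed() {
    diskFailed = true;
  }

  // returns whether a disk failed since the last call, and clears the
  // flag -- so initialize() can call this purely to reset the state
  public synchronized boolean isDiskFailed() {
    boolean wasFailed = diskFailed;
    diskFailed = false;
    return wasFailed;
  }
}
```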

          Also, Owen, I would like to add to Ravi's comment regarding the following comment of yours:

          >> Rather than setting the "conf" attribute for the http server, you should set an attribute with the localStorage object. All uses of MAPRED_LOCALDIR_PROPERTY should be removed, other than the original creation of the localStorage. Furthermore, the property should never be set.

          This change will result in a lot of changes to existing code. I am not certain that these changes are worth the effort. I acknowledge that the software will be more elegant if written the way that you are proposing, but my concern is that we will end up changing a lot of code that is already inelegant in its use of the MAPRED_LOCALDIR_PROPERTY. Since our desire is to keep changes limited in scope, I am requesting that you accept the patch as Ravi last submitted it, without this change.

          Owen O'Malley added a comment -

          The comment on get_value should be:

           /*
            * Gets a configuration value.
            * The first time it is called, the function populates the
            * configuration details into an array; subsequent calls use the
            * populated array.
            *
            * Memory returned here should be freed using free.
            */
          

          free_values should be commented as:

          // free an entire set of values
          void free_values(char** values) {
            if (values != NULL) {
              if (*values != NULL) {
                // the values were tokenized from the same malloc, so freeing
                // the first frees the entire block
                free(*values);
              }
              free(values);
            }
          }
          
          Jagane Sundar added a comment -

          Owen - I have made the comment change that you suggested, and uploaded MR-2413.v0.2.patch. Please review and accept.

          Ravi Gummadi added a comment -

          Attaching new patch removing an unused field in LocalStorage class.

          Owen O'Malley added a comment -

          Can you run test-patch on the patch?

          Ravi Gummadi added a comment -

          Unit tests and test-patch passed on my local machine.

          1 javadoc warning was reported, but that was because of MR-2429.

          1 findbugs warning is "Inconsistent synchronization of org.apache.hadoop.mapred.TaskTracker.fConf; locked 62% of time", which I think can be ignored because fConf need not be accessed only from synchronized blocks. Right?

          Owen O'Malley added a comment -

          You need to fix the findbugs warning.

          Synchronization of fConf is critical since your code is modifying the fConf, which was previously read-only.

          Owen O'Malley added a comment -

          The synchronization of TaskTracker.fConf is quite complicated and will require a larger refactoring to fix it completely. This patch substantially improves the performance on systems with many disks and does not worsen the locking of TaskTracker.fConf.

          I just committed this to 204.

          Ravi Gummadi added a comment -

          I am working on porting this patch to trunk.

          Eli Collins added a comment -

          @Ravi - trunk's task tracker or as a feature for MR2?

          Ravi Gummadi added a comment -

          Planning to work on porting to trunk for now. Not MR2, because it is a lot different.

          Eli Collins added a comment -

          Heads up, per this thread on mr-dev [1], this may be a wasted effort.

          [1] http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201107.mbox/%3CCAPn_vTsdiiqfCB2G0HfsOr3W_4PKoocPcTf2VB93Y3MZrzRczQ@mail.gmail.com%3E

          Eli Collins added a comment -

          What testing was done with this change before it was committed? The patch doesn't have any tests that cover this functionality and I discovered MR-2920 and MR-2921 from doing some basic sanity checking.

          Also, why was Owen's feedback not addressed before committing this change?

          Eli Collins added a comment -

          Here's review feedback on the patch that was committed:

          • LocalStorage should not be public, adding a method in UtilsForTests will allow it to have package protection
          • This is a larger issue, but LocalStorage doesn't need to be tied to MR (see HADOOP-7551)
          • getBadLocalDirs and the array of bad dirs are dead code, should be removed
          • TT#getLocalStorage is dead code too
          • getGoodLocalDirsString should not reimplement StringUtils#join. A better name would be getDirs, as we know it returns local dirs and it should only return good dirs, i.e. all callers should use it to get a list of local dirs to allocate from rather than having to care whether they're good or bad.
          • The LocalStorage#isDiskFailed method is goofy, this would be cleaner if it just returned the number of valid directories and then the code below would return STALE if the number of good dirs changed since it last checked.
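          The last point — returning a count rather than a check-and-clear boolean — could look roughly like this (the names DirCountChecker and State are hypothetical, standing in for the TT's actual health-state handling):

```java
// Hypothetical sketch of the count-based alternative: report STALE when
// the number of good dirs has changed since the previous check.
public class DirCountChecker {
  public enum State { HEALTHY, STALE }

  private int lastGoodCount;

  public DirCountChecker(int initialGoodCount) {
    this.lastGoodCount = initialGoodCount;
  }

  // compare the current good-dir count against the last observed one
  public State check(int currentGoodCount) {
    State state = (currentGoodCount != lastGoodCount)
        ? State.STALE : State.HEALTHY;
    lastGoodCount = currentGoodCount;
    return state;
  }
}
```

          This avoids the reset-only call in initialize(): the checker's state is updated as a natural part of every check, with no side-effecting boolean to clear.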
          Eli Collins added a comment -

          TT should start even if at least one of the mapred-local-dirs is on a good disk

          Why is this a good policy? Such a TT will perform poorly. I filed MAPREDUCE-2924 to make this configurable.

          Bharath Mundlapudi added a comment -

          Hi Eli,

          >> What testing was done with this change before it was committed?
          A tremendous amount of testing went into these patches. We have tested this feature at many levels.

          Here are the things we tested.

          1. Simulating disk failures.
          2. Randomly making disks read-only via mounting.
          3. Randomly making directories read-only or write-only.
          4. Our QA team has written more functional tests.
          5. There was lots of manual verification of this feature.
          6. We have run Terasort and Gridmixv3 for testing verification with disk failures.

          A huge effort went into this feature, with many man-hours of testing.

          >> TT should start even if at least one of the mapred-local-dirs is on a good disk
          Having a configurable option for this might be a good idea. But the rationale for this decision is that something is better than nothing: if we have one disk to run the TT, why not utilize the compute capacity of this machine, since a certain percentage of our cluster runs CPU-intensive jobs too.

          Let me know if you need any further explanation.

          Eli Collins added a comment -

          Thanks for the update Bharath. Could you share the functional tests that your QA team wrote? How will other developers know whether they broke this feature?

          In your experiments, does a machine with only a single functioning disk warrant staying up? I suspect tasks on this machine will perform poorly. I suspect at Yahoo! you're using some configuration that blacklists a TT after X disk failures. If someone isn't using such a configuration their cluster will perform poorly.

          Did you guys test both the default and Linux task controllers?

          Owen O'Malley added a comment -

          Hadoop 0.20.204.0 was just released.

          Eli Collins added a comment -

          Another testing question - what value was used for dfs.datanode.failed.volumes.tolerated when testing this change? If there are N disks and the DN, say, only tolerates N/2 failures (or some other reasonable number), then you'll get a host where the TT is up and the DN is down, which doesn't make sense, right?

          Eli Collins added a comment -

          Another testing question - was this tested in conjunction with a mapred health checker script?

          Ravi Gummadi added a comment -

          Yes. It was tested with health check script.

          Owen O'Malley added a comment -

          Eli,
          It isn't unreasonable to have a TT without a DN or the other way around. I agree that we should make symmetric config knobs so that if someone has them tuned differently, they did it explicitly. In reality, I think failed.volumes.tolerated is a mistake and we need to move to a list of required partitions, with everything else optional. Even a node with a single good drive can do useful work, and getting it to do something would be good. (Although we should also scale down the number of tasks/containers scheduled on such a node...)

          Bharath Mundlapudi added a comment -

          Hi Eli,

          Please note that a lot of the testing has to be done as root, e.g. cases where we need to mount a disk as 'ro' or inject a failure. These are cases where we can't write unit tests; a lot of manual testing went into this feature.
          Of course, we can add some more unit tests, which is true for any feature. That is the nature of this problem.

          And regarding your question about N disks, I think Owen answered it, and I agree. It's reasonable to run the TT without the DN and vice versa. If you want the old behavior, one can do the following:

          1. Set the threshold in the DN to, say, 'k' disks.
          2. Have the health check script send an 'ERROR' msg after 'k' disks fail, so the TT gets blacklisted as it is today.

          You can have this behavior today with the existing code.
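          Step 2 above can be sketched as a small checker invoked by the health-check mechanism; the class name, directory probing, and threshold below are placeholders, but the ERROR prefix is the real contract (the TT treats any health-check output line starting with "ERROR" as marking the node unhealthy):

```java
import java.io.File;

// Placeholder sketch of a disk-failure health check: count failed local
// dirs and emit an ERROR line once 'k' of them have failed.
public class DiskHealthCheck {
  static int countFailed(String[] dirs) {
    int failed = 0;
    for (String d : dirs) {
      File f = new File(d);
      if (!f.isDirectory() || !f.canRead() || !f.canWrite()) {
        failed++;
      }
    }
    return failed;
  }

  public static void main(String[] args) {
    String[] dirs = args; // e.g. the configured mapred-local-dirs
    int k = 3;            // tolerated failures, matching the DN threshold
    int failed = countFailed(dirs);
    if (failed >= k) {
      // a line starting with ERROR marks the node unhealthy
      System.out.println("ERROR " + failed + " local dirs have failed");
    }
  }
}
```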

          Eli Collins added a comment -

          Why is a TT w/o a DN reasonable in 20x? The scheduler won't throttle down task allocations for such hosts so you'll get tasks on hosts performing lots of local IO to a small # of spindles, and a lot of remote IO.

          Wrt testing, the issue here is that there are no automated tests for this feature. We usually don't permit changes without some test coverage in the automated (unit or system) tests, i.e. manual coverage alone is insufficient, especially when the manual test plan has not been specified or reviewed. Could you upload the test plan that you used? Are you going to execute this test plan for 205?

          Dave Winters added a comment -

          Ok, we tested this after field failures resulted in the TT going down. DN still functioning for data ops.
          With dfs.datanode.failed.volumes.tolerated = 3, the TT stopped accepting work after 2 disks were "failed". We induced this with 'umount -l /dataX'.
          It seems that the TT stops at N-1 failures of the tolerated setting N.


            People

            • Assignee:
              Ravi Gummadi
              Reporter:
              Bharath Mundlapudi
            • Votes:
              0
              Watchers:
              10

              Dates

              • Created:
                Updated:
                Resolved:
