Hadoop Map/Reduce / MAPREDUCE-3851

Allow more aggressive action on detection of the jetty issue

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.2
    • Component/s: tasktracker
    • Labels:
      None
    • Target Version/s:
    • Release Note:
      Added new configuration variables to control when the TaskTracker (TT) aborts after seeing a certain proportion of shuffle exceptions:

          // Fraction of shuffle exceptions (out of the sample size) seen before it is
          // fatal. Acceptable values are from 0 to 1.0; 0 disables the check.
          // E.g. 0.3 means 30% of the last N requests (N = sample size) matched
          // the exception, so abort.
          conf.getFloat(
              "mapreduce.reduce.shuffle.catch.exception.percent.limit.fatal", 0);

          // The number of trailing requests tracked, used for the fatal
          // limit calculation.
          conf.getInt("mapreduce.reduce.shuffle.catch.exception.sample.size", 1000);
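      The release note describes a limit applied over a trailing window of requests. A minimal sketch of that idea follows; it is not the code from the patch. The class name ShuffleExceptionTracker, its methods, the choice to wait for a full window before judging, and the use of >= for the comparison are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sliding-window tracker (not taken from the patch): remembers
 * the outcome of the last sampleSize shuffle requests and reports when the
 * fraction that matched the tracked exception reaches the fatal limit.
 */
class ShuffleExceptionTracker {
    // Mirrors mapreduce.reduce.shuffle.catch.exception.sample.size
    private final int sampleSize;
    // Mirrors ...catch.exception.percent.limit.fatal; 0 disables the check
    private final float fatalLimit;
    private final Deque<Boolean> window = new ArrayDeque<>();
    private int exceptionsInWindow = 0;

    ShuffleExceptionTracker(int sampleSize, float fatalLimit) {
        if (sampleSize <= 0) {
            throw new IllegalArgumentException("sampleSize must be positive");
        }
        if (fatalLimit < 0f || fatalLimit > 1f) {
            throw new IllegalArgumentException("fatalLimit must be in [0, 1]");
        }
        this.sampleSize = sampleSize;
        this.fatalLimit = fatalLimit;
    }

    /** Records one request outcome, evicting the oldest once the window is full. */
    synchronized void record(boolean matchedException) {
        if (window.size() == sampleSize && window.removeFirst()) {
            exceptionsInWindow--;
        }
        window.addLast(matchedException);
        if (matchedException) {
            exceptionsInWindow++;
        }
    }

    /**
     * Returns true when the check is enabled, the window is full (an
     * assumption of this sketch), and the exception fraction has reached
     * the configured limit.
     */
    synchronized boolean isFatal() {
        if (fatalLimit == 0f || window.size() < sampleSize) {
            return false;
        }
        return (float) exceptionsInWindow / sampleSize >= fatalLimit;
    }
}
```

      In a TaskTracker-like process, the shuffle handler would call record(...) once per request and trigger the configured abort action when isFatal() returns true. Waiting for a full window avoids aborting on a handful of early failures, at the cost of a slower reaction right after startup.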

      Description

      MAPREDUCE-2529 added a useful failure detection mechanism. In this JIRA, I propose adding a periodic check inside the TT and a configurable action to self-destruct. Blacklisting helps but is not enough: a hung Jetty still accepts connections, and it takes a very long time for clients to fail out. Short jobs are delayed for hours because of this. This feature will be a nice companion to MAPREDUCE-3184.

        Attachments

        1. MAPREDUCE-3851.patch
          22 kB
          Thomas Graves
        2. MAPREDUCE-3851.patch
          22 kB
          Thomas Graves
        3. MAPREDUCE-3851.patch
          22 kB
          Thomas Graves
        4. MAPREDUCE-3851.patch
          11 kB
          Thomas Graves

            People

            • Assignee: Thomas Graves (tgraves)
            • Reporter: Kihwal Lee (kihwal)
            • Votes: 0
            • Watchers: 8
