Hadoop Map/Reduce / MAPREDUCE-3851

Allow more aggressive action on detection of the jetty issue

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.2
    • Component/s: tasktracker
    • Labels:
      None
    • Target Version/s:
    • Release Note:
      Added new configuration variables that control when the TaskTracker (TT) aborts after seeing a certain proportion of shuffle exceptions:

          // Percent of shuffle exceptions (out of the sample size) seen before it's
          // fatal, given as a fraction: acceptable values are from 0 to 1.0, and 0
          // disables the check. For example, 0.3 means: if 30% of the last X
          // requests (X = sample size) matched the exception, abort.
          conf.getFloat(
              "mapreduce.reduce.shuffle.catch.exception.percent.limit.fatal", 0);

          // The number of trailing requests tracked, used for the fatal
          // limit calculation.
          conf.getInt("mapreduce.reduce.shuffle.catch.exception.sample.size", 1000);
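      The release note describes a sliding-window check: remember the outcome of the last sample-size shuffle requests and abort once the fraction that matched the exception reaches the configured limit. Because the limit defaults to 0, the check is off unless it is configured. The following is a minimal Java sketch of that calculation, not the code from the attached patch; the class ExceptionRateWindow and its methods are hypothetical names, and dividing by the full sample size (so requests not yet seen count as non-exceptions) is an assumption of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch (not the patch's code): tracks whether each of the last
 * sampleSize shuffle requests matched the exception and reports "fatal" once
 * the matched fraction reaches percentLimitFatal.
 */
class ExceptionRateWindow {
    private final float percentLimitFatal; // 0 to 1.0; 0 disables the check
    private final int sampleSize;          // number of trailing requests tracked
    private final Deque<Boolean> window = new ArrayDeque<>();
    private int exceptionCount = 0;        // matches currently in the window

    ExceptionRateWindow(float percentLimitFatal, int sampleSize) {
        if (percentLimitFatal < 0f || percentLimitFatal > 1.0f) {
            throw new IllegalArgumentException("limit must be between 0 and 1.0");
        }
        if (sampleSize <= 0) {
            throw new IllegalArgumentException("sample size must be positive");
        }
        this.percentLimitFatal = percentLimitFatal;
        this.sampleSize = sampleSize;
    }

    /** Records one shuffle request; returns true if the TT should abort. */
    synchronized boolean record(boolean matchedException) {
        window.addLast(matchedException);
        if (matchedException) {
            exceptionCount++;
        }
        // Keep only the trailing sampleSize requests.
        if (window.size() > sampleSize) {
            boolean evicted = window.removeFirst();
            if (evicted) {
                exceptionCount--;
            }
        }
        return isFatal();
    }

    /** True when the matched fraction of the sample reaches the limit. */
    synchronized boolean isFatal() {
        if (percentLimitFatal == 0f) {
            return false; // 0 disables the check
        }
        // Divide by the full sample size, so requests not yet seen count as
        // non-exceptions (an assumption of this sketch).
        return (float) exceptionCount / sampleSize >= percentLimitFatal;
    }
}
```

      A caller (hypothetically, the TT's shuffle-serving code) would call record(...) once per map-output request and abort when it returns true. An ArrayDeque keeps the sketch readable; a fixed-size ring buffer, for example a BitSet with a moving index, would avoid boxing when the sample size is large.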

      Description

      MAPREDUCE-2529 added a useful mechanism for detecting this failure. In this JIRA, I propose adding a periodic check inside the TaskTracker (TT) and a configurable action to self-destruct. Blacklisting helps but is not enough: a hung Jetty still accepts connections, and it takes a very long time for clients to fail out. Short jobs are delayed for hours because of this. This feature will be a nice companion to MAPREDUCE-3184.
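      The proposal (a periodic check inside the TT plus a configurable self-destruct action) can be sketched as a small scheduled monitor. This is a minimal Java sketch under stated assumptions: PeriodicSelfCheck, its health-check supplier, and its failure action are hypothetical, and the issue does not specify how the check or the action would be wired into the TaskTracker.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch of the proposal: a monitor inside the TT that
 * periodically runs a health check and, the first time it fails, runs a
 * configurable action (e.g. shutting the TT down). Names are illustrative.
 */
class PeriodicSelfCheck {
    private final BooleanSupplier isHealthy;
    private final Runnable onFailure;
    private final AtomicBoolean fired = new AtomicBoolean(false);
    private ScheduledExecutorService scheduler;

    PeriodicSelfCheck(BooleanSupplier isHealthy, Runnable onFailure) {
        this.isHealthy = isHealthy;
        this.onFailure = onFailure;
    }

    /** Runs one check; returns true only on the call that triggers the action. */
    boolean checkOnce() {
        if (!isHealthy.getAsBoolean() && fired.compareAndSet(false, true)) {
            onFailure.run();
            return true;
        }
        return false;
    }

    /** Schedules checkOnce() every periodMillis on a daemon thread. */
    synchronized void start(long periodMillis) {
        if (scheduler != null) {
            return; // already started
        }
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "tt-self-check");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(
            this::checkOnce, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops the periodic checks, if running. */
    synchronized void stop() {
        if (scheduler != null) {
            scheduler.shutdownNow();
            scheduler = null;
        }
    }
}
```

      Keeping the action a plain Runnable is what makes it configurable: one deployment could only log an alert, another could exit the process. The AtomicBoolean guard ensures that a TT which stays unhealthy across several periods runs the action only once.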

      Attachments

      1. MAPREDUCE-3851.patch
        11 kB
        Thomas Graves
      2. MAPREDUCE-3851.patch
        22 kB
        Thomas Graves
      3. MAPREDUCE-3851.patch
        22 kB
        Thomas Graves
      4. MAPREDUCE-3851.patch
        22 kB
        Thomas Graves

        Activity

        Kihwal Lee created issue
        Thomas Graves made changes
        Field Original Value New Value
        Assignee Thomas Graves [ tgraves ]
        Thomas Graves made changes
        Attachment MAPREDUCE-3851.patch [ 12514896 ]
        Thomas Graves made changes
        Attachment MAPREDUCE-3851.patch [ 12515044 ]
        Thomas Graves made changes
        Status Open [ 1 ] Patch Available [ 10002 ]
        Release Note (the release note shown under Details)
        Target Version/s 1.0.1 [ 12319503 ]
        Thomas Graves made changes
        Status Patch Available [ 10002 ] Open [ 1 ]
        Thomas Graves made changes
        Attachment MAPREDUCE-3851.patch [ 12515092 ]
        Thomas Graves made changes
        Status Open [ 1 ] Patch Available [ 10002 ]
        Thomas Graves made changes
        Attachment MAPREDUCE-3851.patch [ 12515608 ]
        Robert Joseph Evans made changes
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Fix Version/s 1.0.2 [ 12320047 ]
        Fix Version/s 1.0.1 [ 12319503 ]
        Resolution Fixed [ 1 ]
        Matt Foley made changes
        Fix Version/s 1.1.0 [ 12317960 ]
        Target Version/s 1.0.1 [ 12319503 ] 1.0.2 [ 12320047 ]
        Matt Foley made changes
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Thomas Graves
            Reporter:
            Kihwal Lee
          • Votes:
            0
            Watchers:
            8
