Hadoop Map/Reduce / MAPREDUCE-1943

Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.203.0
    • Component/s: None
    • Labels:
      None

      Description

      We have come across issues in production clusters where users abuse counters, status report messages, and split sizes. In one such case, a user had 100 million counters. This leads to the JobTracker running out of memory and becoming unresponsive. In this JIRA I am proposing to put sane limits on the status report length, the number of counters, and the size of the block locations returned by the input split.

      Attachments

      1. MAPREDUCE-1943-0.20-yahoo.patch
        11 kB
        Mahadev konar
      2. MAPREDUCE-1943-0.20-yahoo.patch
        14 kB
        Mahadev konar
      3. MAPREDUCE-1943-yahoo-hadoop-0.20S.patch
        23 kB
        Mahadev konar
      4. MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch
        4 kB
        Mahadev konar
      5. MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch
        6 kB
        Mahadev konar
      6. MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch
        13 kB
        Mahadev konar
      7. MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch
        20 kB
        Mahadev konar
      8. MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch
        18 kB
        Mahadev konar
      9. MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch
        18 kB
        Mahadev konar

          Activity

          Scott Chen added a comment -

          +1 to the idea. We have seen huge split sizes kill the JT. This will help.

          Mahadev konar added a comment -

          This patch imposes some limits.

          The limits it imposes are:

          1) The number of counters per group is limited to 40. Counters beyond that limit are silently dropped.
          2) The number of counter groups is limited to 40. Again, groups beyond the limit are silently dropped.
          3) The string size of a counter name is restricted to 64 characters.
          4) The string size of a group name is restricted to 128 characters.
          5) The number of block locations returned by a split is restricted to 100; this can be changed with a configuration parameter.
          6) The reporter.setStatus() string size is limited to 512 characters.
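          A minimal Java sketch of how limits 1 to 4 and 6 might behave, assuming a simple in-memory map from group names to counters. The class `LimitedCounters`, its methods, and its constant names are hypothetical; this is not the code from the patch or Hadoop's `Counters` class. Only the numeric limits come from the list above. The comment says over-limit groups and counters are dropped silently; it does not say how over-long strings are handled, so truncation is an assumption here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the limits listed above. Class, method, and constant
// names are invented for illustration; this is not Hadoop's Counters class.
public class LimitedCounters {
  static final int MAX_GROUPS = 40;               // limit 2
  static final int MAX_COUNTERS_PER_GROUP = 40;   // limit 1
  static final int MAX_COUNTER_NAME_LENGTH = 64;  // limit 3
  static final int MAX_GROUP_NAME_LENGTH = 128;   // limit 4
  static final int MAX_STATUS_LENGTH = 512;       // limit 6

  // group name -> (counter name -> value)
  private final Map<String, Map<String, Long>> groups = new LinkedHashMap<>();

  // Cuts a string down to at most max characters (assumed truncation policy).
  static String truncate(String s, int max) {
    return s.length() <= max ? s : s.substring(0, max);
  }

  // Adds amount to a counter. Returns false when the update is silently
  // dropped because the group or counter limit has been reached.
  public boolean increment(String group, String counter, long amount) {
    String g = truncate(group, MAX_GROUP_NAME_LENGTH);
    String c = truncate(counter, MAX_COUNTER_NAME_LENGTH);
    Map<String, Long> counters = groups.get(g);
    if (counters == null) {
      if (groups.size() >= MAX_GROUPS) {
        return false; // too many groups: drop silently
      }
      counters = new LinkedHashMap<>();
      groups.put(g, counters);
    }
    if (!counters.containsKey(c) && counters.size() >= MAX_COUNTERS_PER_GROUP) {
      return false; // too many counters in this group: drop silently
    }
    counters.merge(c, amount, Long::sum);
    return true;
  }

  public int groupCount() {
    return groups.size();
  }

  public int counterCount(String group) {
    Map<String, Long> counters = groups.get(truncate(group, MAX_GROUP_NAME_LENGTH));
    return counters == null ? 0 : counters.size();
  }

  // Clamps a task status message (limit 6) before it is stored or reported.
  static String limitStatus(String status) {
    return truncate(status, MAX_STATUS_LENGTH);
  }

  public static void main(String[] args) {
    LimitedCounters counters = new LimitedCounters();
    for (int i = 0; i < 100; i++) {
      counters.increment("MyGroup", "counter-" + i, 1);
    }
    System.out.println(counters.counterCount("MyGroup")); // 40
  }
}
```

          With truncation, two names that share the same first 64 (or 128) characters collapse into one counter; that is a side effect of the assumed policy, not something the comment specifies.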

          I haven't added tests yet; I will upload them shortly. Also, this patch is for the Yahoo 0.20 branch. I will upload one for trunk shortly.

          Mahadev konar added a comment -

          Attached the wrong file.

          Amareshwari Sriramadasu added a comment -

          Limiting task diagnostic info and status is done in MAPREDUCE-1482.

          Mahadev konar added a comment -

          This patch adds tests for the above features.

          It also changes the limits to 50 counter groups and 70 counters per group.

          Mahadev konar added a comment -

          An updated patch with test cases and a limit of 80 on counters. This patch throws a RuntimeException if the counter limit is exceeded. Also, the number of block locations now has a hard limit of 100.
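          A sketch of the stricter behaviour this comment describes: exceeding the counter limit now fails loudly with an unchecked exception instead of dropping counters, and a split's host list is cut to a fixed 100 entries. It assumes the limit of 80 applies to the total number of counters in a job, which the comment does not spell out. `HardLimits`, `addCounter`, `capLocations`, and `CountersExceededException` are hypothetical names, not identifiers from the patch.

```java
import java.util.Arrays;

// Hypothetical sketch; names are illustrative and do not come from the patch.
public class HardLimits {
  static final int MAX_COUNTERS = 80;          // assumed to be a per-job total
  static final int MAX_BLOCK_LOCATIONS = 100;  // hard cap, not configurable

  // Unchecked exception raised when the counter limit is exceeded.
  static class CountersExceededException extends RuntimeException {
    CountersExceededException(String message) {
      super(message);
    }
  }

  private int numCounters = 0;

  // Registers one new counter, failing loudly instead of dropping it silently.
  void addCounter(String name) {
    if (numCounters >= MAX_COUNTERS) {
      throw new CountersExceededException(
          "Exceeded limit of " + MAX_COUNTERS + " counters; cannot add " + name);
    }
    numCounters++;
  }

  int counterCount() {
    return numCounters;
  }

  // Keeps at most MAX_BLOCK_LOCATIONS hosts from a split's location list.
  static String[] capLocations(String[] hosts) {
    return hosts.length <= MAX_BLOCK_LOCATIONS
        ? hosts
        : Arrays.copyOf(hosts, MAX_BLOCK_LOCATIONS);
  }

  public static void main(String[] args) {
    HardLimits limits = new HardLimits();
    try {
      for (int i = 0; i < 81; i++) {
        limits.addCounter("c" + i);
      }
    } catch (CountersExceededException e) {
      System.out.println(e.getMessage()); // Exceeded limit of 80 counters; cannot add c80
    }
    String[] hosts = new String[150];
    Arrays.fill(hosts, "host");
    System.out.println(capLocations(hosts).length); // 100
  }
}
```

          Throwing instead of dropping makes abuse visible to the job owner, at the cost of failing jobs that previously would have run with incomplete counters.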

          Mahadev konar added a comment -

          This patch is an addendum to the last patch. It fixes a bug wherein counters weren't counted across tasks before the job completed. This patch updates the number of counters for the job on every heartbeat and kills the job if it exceeds the limit.
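          A sketch of why a per-task count is not enough: each task below stays under an assumed limit of 80, but together they create 100 distinct counters. The tracker keeps the latest counter names reported by every task and recomputes the job-wide distinct count on each heartbeat. All names here are hypothetical, and setting a `killed` flag stands in for actually killing the job.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a job-wide counter check run on every heartbeat.
// Names are invented; this is not JobTracker or JobInProgress code.
public class HeartbeatCounterCheck {
  private final int limit;
  // Latest counter names reported by each task, keyed by task id.
  private final Map<String, Set<String>> countersByTask = new HashMap<>();
  private boolean killed = false;

  HeartbeatCounterCheck(int limit) {
    this.limit = limit;
  }

  // Called on every heartbeat with the counter names a task currently holds.
  void onHeartbeat(String taskId, Set<String> counterNames) {
    if (killed) {
      return;
    }
    countersByTask.put(taskId, new HashSet<>(counterNames));
    if (distinctCounters() > limit) {
      killed = true; // stand-in for killing the job
    }
  }

  // Distinct counter names across all tasks of the job, not per task.
  int distinctCounters() {
    Set<String> all = new HashSet<>();
    for (Set<String> names : countersByTask.values()) {
      all.addAll(names);
    }
    return all.size();
  }

  boolean isKilled() {
    return killed;
  }

  // Builds n counter names with the given prefix.
  static Set<String> names(String prefix, int n) {
    Set<String> s = new HashSet<>();
    for (int i = 0; i < n; i++) {
      s.add(prefix + i);
    }
    return s;
  }

  public static void main(String[] args) {
    HeartbeatCounterCheck job = new HeartbeatCounterCheck(80);
    job.onHeartbeat("task_0", names("a", 50)); // 50 distinct: under the limit
    System.out.println(job.isKilled());        // false
    job.onHeartbeat("task_1", names("b", 50)); // 100 distinct across tasks
    System.out.println(job.isKilled());        // true
  }
}
```

          Rebuilding the union on every heartbeat costs time proportional to the job's total counter count, which makes this approach relatively heavy for jobs with many tasks.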

          Mahadev konar added a comment -

          The fix addendum, with a test case.

          I will be uploading a single patch for trunk soon.

          Mahadev konar added a comment -

          The earlier patches computed on every heartbeat whether the counters had exceeded the limit. I made a change in this patch to make it much lighter: here the check is done only after the job has finished running its map and reduce tasks.
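          A sketch of the lighter variant: counters are merged as each task finishes, and the limit is compared only once, when the last map and reduce tasks have completed. The comment does not say how the job is then terminated, so the sketch only records that the limit was exceeded. `CompletionCounterCheck` and its members are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: the limit is checked once, after every map and reduce
// task has finished, instead of on each heartbeat. Names are invented.
public class CompletionCounterCheck {
  private final int limit;
  private final int totalMaps;
  private final int totalReduces;
  private int finishedMaps = 0;
  private int finishedReduces = 0;
  // Union of counter names from every finished task.
  private final Set<String> jobCounters = new HashSet<>();
  private int checksPerformed = 0;
  private boolean limitExceeded = false;

  CompletionCounterCheck(int limit, int totalMaps, int totalReduces) {
    this.limit = limit;
    this.totalMaps = totalMaps;
    this.totalReduces = totalReduces;
  }

  // Called once when a task completes; merging is incremental and cheap.
  void taskFinished(boolean isMap, Set<String> counterNames) {
    jobCounters.addAll(counterNames);
    if (isMap) {
      finishedMaps++;
    } else {
      finishedReduces++;
    }
    // The limit is compared only once all maps and reduces are done.
    if (finishedMaps == totalMaps && finishedReduces == totalReduces) {
      checksPerformed++;
      limitExceeded = jobCounters.size() > limit;
    }
  }

  int checksPerformed() {
    return checksPerformed;
  }

  boolean isLimitExceeded() {
    return limitExceeded;
  }

  // Builds n counter names with the given prefix.
  static Set<String> names(String prefix, int n) {
    Set<String> s = new HashSet<>();
    for (int i = 0; i < n; i++) {
      s.add(prefix + i);
    }
    return s;
  }

  public static void main(String[] args) {
    CompletionCounterCheck job = new CompletionCounterCheck(80, 2, 1);
    job.taskFinished(true, names("m0-", 40));
    job.taskFinished(true, names("m1-", 40));
    System.out.println(job.checksPerformed()); // 0: the reduce is still running
    job.taskFinished(false, names("r-", 40));
    System.out.println(job.checksPerformed() + " " + job.isLimitExceeded()); // 1 true
  }
}
```

          The trade-off is latency: a job that blows past the limit early keeps running until all of its tasks finish, in exchange for doing the comparison once rather than on every heartbeat.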

          Mahadev konar added a comment -

          This patch is updated to close some loopholes we found while testing. It prevents the limit-exceeded exception from interrupting the flow of job expiration.
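          A sketch of the expiration fix, assuming the expiry step reads each finished job's counters and that reading them can throw the limit-exceeded exception. The exception is caught per job and logged, so the remaining jobs are still expired. `ExpiryLoopSketch`, `FinishedJob`, `summarizeCounters`, and `expireJobs` are hypothetical names, and the limit of 80 is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a limit violation while processing one finished job's
// counters must not stop the loop that expires (retires) finished jobs.
public class ExpiryLoopSketch {
  static final int LIMIT = 80; // illustrative counter limit

  static class CountersExceededException extends RuntimeException {
    CountersExceededException(String message) {
      super(message);
    }
  }

  static final class FinishedJob {
    final String id;
    final int numCounters;

    FinishedJob(String id, int numCounters) {
      this.id = id;
      this.numCounters = numCounters;
    }
  }

  // Stand-in for work done at expiry time that reads the job's counters.
  static void summarizeCounters(FinishedJob job) {
    if (job.numCounters > LIMIT) {
      throw new CountersExceededException(job.id + " has " + job.numCounters + " counters");
    }
  }

  // Expires every job; a counter failure is logged and the loop continues.
  static List<String> expireJobs(List<FinishedJob> jobs) {
    List<String> expired = new ArrayList<>();
    for (FinishedJob job : jobs) {
      try {
        summarizeCounters(job);
      } catch (CountersExceededException e) {
        System.err.println("Skipping counters: " + e.getMessage());
      }
      expired.add(job.id); // expiration proceeds regardless
    }
    return expired;
  }

  public static void main(String[] args) {
    List<FinishedJob> jobs = new ArrayList<>();
    jobs.add(new FinishedJob("job_1", 10));
    jobs.add(new FinishedJob("job_2", 500));
    jobs.add(new FinishedJob("job_3", 20));
    System.out.println(expireJobs(jobs)); // [job_1, job_2, job_3]
  }
}
```

          Without the per-job catch, the exception from `job_2` would escape the loop and leave `job_3` (and every later job) unexpired, which is the kind of interruption the comment describes.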

          Mahadev konar added a comment -

          Fixes a minor bug in my earlier patch related to configuration settings and to calling getJobCounters() before the job is initialized.

          Mahadev konar added a comment -

          Sorry, attached the wrong file.

          Mahadev konar added a comment -

          An updated patch that fixes FindBugs warnings and also makes sure we always check the return value of the counters.

          Liyin Liang added a comment -

          Your latest patch is based on your previous patch. Why?

          Tom White added a comment -

          This was fixed in 0.20.203.0 (see Subversion Commits tab, also commit r1077730).

          Owen O'Malley added a comment -

          Closing for 0.20.203.0

            People

            • Assignee: Mahadev konar
            • Reporter: Mahadev konar
            • Votes: 0
            • Watchers: 6
