Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.204.0
    • Fix Version/s: 1.1.0
    • Component/s: None
    • Labels: None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      The per-job counter limits introduced in MAPREDUCE-1943 are hard-coded, except for the total number allowed per job (mapreduce.job.counters.limit), which is configurable. It would be useful to make them all configurable.
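      With such a change, an operator could raise the limits cluster-wide in mapred-site.xml. A minimal sketch, assuming the MAPREDUCE-901 parameter names that the patch adopts (the exact names and defaults should be checked against the committed patch; the values below are illustrative, not recommendations):

      ```xml
      <!-- mapred-site.xml: admin-side counter limits -->
      <property>
        <name>mapreduce.job.counters.max</name>
        <value>240</value>
      </property>
      <property>
        <name>mapreduce.job.counters.groups.max</name>
        <value>100</value>
      </property>
      ```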

      Attachments

      1. MAPREDUCE-2835.patch (3 kB) Tom White
      2. MAPREDUCE-2835.patch (3 kB) Tom White


          Activity

          Tom White added a comment -

          Results of test-patch:

          [exec] +1 overall.  
          [exec] 
          [exec]     +1 @author.  The patch does not contain any @author tags.
          [exec] 
          [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
          [exec] 
          [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
          [exec] 
          [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
          [exec] 
          [exec]     +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.
          Tom White added a comment -

          I've updated the patch to use the same parameter names as MAPREDUCE-901, which will be in 0.23.

          I successfully ran test-patch and unit tests.

          Arun C Murthy added a comment -

          Tom, we purposely made these limits not configurable per job (by the user), since users would start to work around them.

          We saw lots of cases where users took down jobs due to this, hence it's only configurable by admins. OTOH, other limits like the number of groups, counter-name length etc. are generous enough that we never had issues with them, but limiting them was done to be super-safe.

          Brian Bloniarz added a comment -

          If MAPREDUCE-1943 was introduced to avoid memory corruption, 50 groups
          and 120 counters is an awfully low limit. The submitter complained of
          jobs which had millions of counters. I believe the current limits cap
          Counters at around 32 KB of memory, which IMHO is not a reasonable
          tradeoff of footprint vs. inconvenience.

          Regarding inconvenience, note that:
          (1) The counter and group limits are currently hard failures; jobs will stop working.
          (2) It's a change to the current behavior; jobs which worked under 0.20 will now fail.

          If the limits were raised to, for example:
          10000 counters
          1000 groups
          1000-char name limit
          they would still prevent crashes on all but the most comically limited
          clusters (I measure Counters consuming 21 MB with these limits), and trip
          up far fewer programmers. For the same reason, I think Tom's patch should
          be applied so the limits are expandable without requiring a recompilation
          of Hadoop.
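          A back-of-envelope check of the 21 MB figure under those proposed limits; the 2-bytes-per-char and per-object-overhead numbers below are assumptions, not measurements:

          ```java
          /** Rough footprint of 10,000 counters with 1,000-char names. */
          public class CounterFootprint {
              public static void main(String[] args) {
                  int counters = 10_000;             // proposed per-job counter limit
                  int nameChars = 1_000;             // proposed name-length limit
                  long nameBytes = 2L * nameChars;   // UTF-16 chars in a Java String
                  long perCounter = nameBytes + 100; // plus assumed object overhead
                  long totalMb = counters * perCounter / (1024 * 1024);
                  System.out.println("~" + totalMb + " MB"); // prints ~20 MB
              }
          }
          ```

          which is in the same ballpark as the 21 MB measured above.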

          Harsh J added a comment -

          I agree with Brian's reasoning.

          I understand that if you expose a configuration knob and a newbie user or operator encounters it, they'll just raise it without a second thought. We need to address such best practices with docs and warning logs (logs can warn with hard numbers), rather than with magic numbers that won't fit everyone or every case.

          Harsh J added a comment -

          Btw, we've already made it configurable in 0.23 via MAPREDUCE-901.

          Arun C Murthy added a comment -

          The problem with Counters is that they are per-task and, as a result, blow out the JobTracker with large jobs.

          It's fine to make it admin-configurable (not per-job).

          Tom White added a comment -

          > It's fine to make it admin-configurable (not per-job).

          Agreed, and the current patch actually does this. My suggestion is that we commit this patch to 1.x so that operators can raise the cluster limits if needed.

          Alejandro Abdelnur added a comment -

          Initializing a static var with a 'loaded' configuration value at class-loading time seems wrong. If we need to do something like this, I would add a static init() method to do the initialization and call it in the lifecycle of the corresponding services (JT/TT).

          If the code that checks these limits is invoked on the client side when a client queries counters, this may break the client app if the limit is not set in the client's mapred-site.xml. That seems wrong, since this is a server config property.

          Also, would this check impact the job tasks? If so, the counter initialization should be done in the task bootstrap code.

          +1 as this enables custom limits, but we should follow up with a proper fix (MAPREDUCE-3936)
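          A minimal sketch of the explicit-init pattern suggested above; the Limits class name, the getter, and the Map standing in for Hadoop's Configuration are all hypothetical, not Hadoop's actual API:

          ```java
          import java.util.Map;

          /** Limits resolved by an explicit init() called from service startup
           *  (JT/TT), instead of at class-load time. */
          public final class Limits {
              private static final int DEFAULT_COUNTERS_MAX = 120;
              private static volatile int countersMax = DEFAULT_COUNTERS_MAX;

              /** Call once during JobTracker/TaskTracker startup, before any
               *  counter-limit check runs. conf stands in for Configuration. */
              public static synchronized void init(Map<String, String> conf) {
                  countersMax = Integer.parseInt(
                      conf.getOrDefault("mapreduce.job.counters.max",
                                        String.valueOf(DEFAULT_COUNTERS_MAX)));
              }

              /** Falls back to the compiled-in default if init() was never
               *  called, so client-side code without mapred-site.xml works. */
              public static int getCountersMax() {
                  return countersMax;
              }

              private Limits() {}
          }
          ```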

          Tom White added a comment -

          I've just committed this.

          Matt Foley added a comment -

          Closed upon release of Hadoop-1.1.0.


            People

            • Assignee: Tom White
            • Reporter: Tom White
            • Votes: 0
            • Watchers: 7
