  Hadoop Map/Reduce
  MAPREDUCE-6129

Job failed due to counter limit exceeded in MRAppMaster


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.3.0, 2.5.0, 2.4.1, 2.5.1, 3.0.0-alpha1
    • Fix Version/s: None
    • Component/s: applicationmaster
    • Labels:
      None

      Description

      Lots of jobs on our cluster use more than 120 counters; those jobs fail with an exception like the one below:

      2014-10-15 22:55:43,742 WARN [Socket Reader #1 for port 45673] org.apache.hadoop.ipc.Server: Unable to read call parameters for client 10.180.216.12 on connection protocol org.apache.hadoop.mapred.TaskUmbilicalProtocol for rpcKind RPC_WRITABLE
      org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 121 max=120
      	at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:103)
      	at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:110)
      	at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.readFields(AbstractCounterGroup.java:175)
      	at org.apache.hadoop.mapred.Counters$Group.readFields(Counters.java:324)
      	at org.apache.hadoop.mapreduce.counters.AbstractCounters.readFields(AbstractCounters.java:314)
      	at org.apache.hadoop.mapred.TaskStatus.readFields(TaskStatus.java:489)
      	at org.apache.hadoop.mapred.ReduceTaskStatus.readFields(ReduceTaskStatus.java:140)
      	at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
      	at org.apache.hadoop.ipc.WritableRpcEngine$Invocation.readFields(WritableRpcEngine.java:157)
      	at org.apache.hadoop.ipc.Server$Connection.processRpcRequest(Server.java:1802)
      	at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1734)
      	at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1494)
      	at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:732)
      	at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:606)
      	at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:577)
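
      For reference (a hypothetical example, not taken from the report): the limit is easy to hit with dynamically named counters. A minimal mapper sketch that would trip it once more than 120 distinct counter names exist could look like this:

      // Hypothetical mapper, for illustration only: one counter per distinct token
      // exceeds mapreduce.job.counters.max (default 120) as soon as the input
      // contains more than 120 distinct tokens.
      import java.io.IOException;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;

      public class ManyCountersMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          for (String token : value.toString().split("\\s+")) {
            // Each new counter name counts toward the per-job counter limit.
            context.getCounter("tokens", token).increment(1);
          }
        }
      }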
      
      

      The class org.apache.hadoop.mapreduce.counters.Limits loads the mapred-site.xml found on the NodeManager node into a JobConf if it has not been initialized yet.
      If mapred-site.xml does not exist on the NodeManager node, or mapreduce.job.counters.max is not defined in that file, org.apache.hadoop.mapreduce.counters.Limits simply falls back to the default value of 120.
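
      For illustration (an assumed common workaround, not part of the report): the per-job limit is normally raised via mapreduce.job.counters.max in the job's own configuration. Per this issue, though, the AM/NM side still consults its local mapred-site.xml, so the job-side setting alone may not be honored there:

      import java.io.IOException;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.mapreduce.Job;

      public class CounterLimitConfigExample {
        public static void main(String[] args) throws IOException {
          Configuration conf = new Configuration();
          // Raise the per-job counter limit above the default of 120.
          conf.setInt("mapreduce.job.counters.max", 500);
          Job job = Job.getInstance(conf, "many-counters-job");
          // ... configure mapper/reducer and input/output paths, then submit ...
        }
      }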

      Instead, we should read the user job's configuration file rather than the config files on the NodeManager when checking the counter limits.
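
      As a rough sketch of that direction (the attached patch is not shown here, and this assumes the static Limits.init(Configuration) hook is available in the affected versions), the AM could initialize the counter limits from the submitted job's configuration:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.mapreduce.counters.Limits;

      public class AppMasterCounterLimitInit {
        /** Hypothetical helper: call early during MRAppMaster startup with the job's own conf. */
        public static void applyJobCounterLimits(Configuration jobConf) {
          // Limits caches its thresholds on first use; initializing it explicitly from
          // the job configuration lets the job's mapreduce.job.counters.max take effect
          // instead of the NodeManager-local default of 120. (Assumes Limits.init(Configuration)
          // exists in these Hadoop versions.)
          Limits.init(jobConf);
        }
      }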

      I will submit a patch later.

        Attachments

        1. MAPREDUCE-6129.diff
          1 kB
          Min Zhou


              People

              • Assignee:
                Unassigned
              • Reporter:
                Min Zhou (coderplay)
              • Votes:
                0
              • Watchers:
                6
