Hadoop Common: HADOOP-2247

Mappers fail easily due to repeated failures

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.15.0
    • Fix Version/s: 0.16.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      1,400-node Hadoop cluster

      Description

      Related to HADOOP-2220; the problem was introduced in HADOOP-1158.

      At this scale, hardcoding the fetch-failure limit to a static number (in this case 3) is never going to work. The jobs we run do load the systems, but 3 failures can also occur at random within the lifetime of a map. Even fetching the data can generate enough load to cause that many failures.

      We believe the number of tasks and the size of the cluster should be taken into account. To do so, the decision should be based on the ratio of failed fetch attempts to total fetch attempts.

      Based on our experience, a task should be declared "Too many fetch failures" when:

      failures > n  &&  (failures / total attempts) > k%,  where n could be 3 and k could be 30-40

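      The first condition gives the second a head start: the ratio is only acted on once a map has accumulated more than n failures, so one or two early failures (which can make the ratio 100%) cannot trigger it on their own. The second condition then takes the cluster size and the task size into account, since more reduce tasks mean more fetch attempts against each map's output.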
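      The proposed rule can be sketched in Java as follows. This is a minimal illustration, not the code from the attached patch: the class name FetchFailurePolicy, its method, and the example thresholds (n = 3, k = 35%) are hypothetical choices within the ranges suggested above.

```java
/**
 * Hypothetical sketch of the proposed "too many fetch failures" rule:
 *   failures > n && (failures / total attempts) > k
 * Not Hadoop's actual implementation; names and thresholds are illustrative.
 */
public class FetchFailurePolicy {
    private final int minFailures;        // n: absolute head start before the ratio matters
    private final double maxFailureRatio; // k: tolerated fraction of failed fetch attempts

    public FetchFailurePolicy(int minFailures, double maxFailureRatio) {
        this.minFailures = minFailures;
        this.maxFailureRatio = maxFailureRatio;
    }

    /** Returns true when a map's output should be declared "Too many fetch failures". */
    public boolean tooManyFetchFailures(int failures, int totalAttempts) {
        if (totalAttempts <= 0) {
            return false; // no fetch attempts yet, nothing to judge
        }
        double ratio = (double) failures / totalAttempts;
        return failures > minFailures && ratio > maxFailureRatio;
    }

    public static void main(String[] args) {
        FetchFailurePolicy policy = new FetchFailurePolicy(3, 0.35);
        // 4 of 100 attempts failed: more than n, but only 4% -> prints false (keep the map)
        System.out.println(policy.tooManyFetchFailures(4, 100));
        // 4 of 8 attempts failed: more than n and 50% -> prints true (declare it failed)
        System.out.println(policy.tooManyFetchFailures(4, 8));
        // 3 of 3 attempts failed: 100%, but not more than n -> prints false
        System.out.println(policy.tooManyFetchFailures(3, 3));
    }
}
```

      With a static limit of 3, the first case (4 failures among 100 fetches on a busy cluster) would already kill the map; under the ratio rule it survives, while a map whose output genuinely cannot be served still gets declared failed.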

      Additionally, we could take recency into account, e.g. by counting only the failures and attempts from the last hour. The window should not be made too small.
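      Recency could be handled by applying the same "failures > n && failures / attempts > k" test only to fetch attempts inside a sliding time window. The sketch below is hypothetical (the class WindowedFetchStats, the one-hour window, and the thresholds are illustrative assumptions, not Hadoop code) and assumes attempts are recorded in non-decreasing timestamp order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch: evaluate the fetch-failure rule only over attempts
 * seen within a recent time window (e.g. the last hour). Not Hadoop code.
 */
public class WindowedFetchStats {
    private static final class Attempt {
        final long timeMillis;
        final boolean failed;
        Attempt(long timeMillis, boolean failed) {
            this.timeMillis = timeMillis;
            this.failed = failed;
        }
    }

    private final long windowMillis;
    private final int minFailures;        // n
    private final double maxFailureRatio; // k
    private final Deque<Attempt> attempts = new ArrayDeque<>();
    private int failuresInWindow = 0;

    public WindowedFetchStats(long windowMillis, int minFailures, double maxFailureRatio) {
        this.windowMillis = windowMillis;
        this.minFailures = minFailures;
        this.maxFailureRatio = maxFailureRatio;
    }

    /** Records one fetch attempt of this map's output; assumes in-order timestamps. */
    public void record(long nowMillis, boolean failed) {
        evictExpired(nowMillis);
        attempts.addLast(new Attempt(nowMillis, failed));
        if (failed) {
            failuresInWindow++;
        }
    }

    /** Applies the rule to the current window only (evicts expired attempts first). */
    public boolean tooManyFetchFailures(long nowMillis) {
        evictExpired(nowMillis);
        int total = attempts.size();
        if (total == 0) {
            return false;
        }
        return failuresInWindow > minFailures
                && (double) failuresInWindow / total > maxFailureRatio;
    }

    // Drops attempts that are at least windowMillis old, keeping the failure count in sync.
    private void evictExpired(long nowMillis) {
        while (!attempts.isEmpty()
                && attempts.peekFirst().timeMillis <= nowMillis - windowMillis) {
            Attempt old = attempts.removeFirst();
            if (old.failed) {
                failuresInWindow--;
            }
        }
    }

    public static void main(String[] args) {
        long hour = 60L * 60 * 1000;
        WindowedFetchStats stats = new WindowedFetchStats(hour, 3, 0.35);
        for (int i = 0; i < 4; i++) {
            stats.record(0, true); // four failed fetches at t = 0
        }
        System.out.println(stats.tooManyFetchFailures(0));        // 4 of 4 failed: prints true
        System.out.println(stats.tooManyFetchFailures(2 * hour)); // window now empty: prints false
    }
}
```

      A deque keeps each update amortized O(1); the window length trades responsiveness against noise, which is why it should not be made too small.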

      Attachments

      1. HADOOP-2220.patch (14 kB, Amar Kamat)
      2. HADOOP-2220.patch (14 kB, Amar Kamat)
      3. HADOOP-2220.patch (9 kB, Amar Kamat)


          Activity

          Srikanth Kakani created issue
          Christian Kunz made changes:
            Fix Version/s: (none) → 0.15.2
            Priority: Major → Blocker
          Amar Kamat made changes:
            Link: added "This issue relates to HADOOP-2220"
          Sameer Paranjpye made changes:
            Assignee: (none) → Amar Kamat
          Amar Kamat made changes:
            Attachment: added HADOOP-2220.patch [12371591]
          Amar Kamat made changes:
            Status: Open → Patch Available
          Amar Kamat made changes:
            Status: Patch Available → Open
          Amar Kamat made changes:
            Attachment: added HADOOP-2220.patch [12371601]
          Amar Kamat made changes:
            Status: Open → Patch Available
          Devaraj Das made changes:
            Link: added "This issue incorporates HADOOP-2220"
          Arun C Murthy made changes:
            Status: Patch Available → Open
          Amar Kamat made changes:
            Attachment: added HADOOP-2220.patch [12372075]
          Amar Kamat made changes:
            Status: Open → Patch Available
          Arun C Murthy made changes:
            Status: Patch Available → Resolved
            Resolution: (none) → Fixed
          Arun C Murthy made changes:
            Status: Resolved → Reopened
            Resolution: Fixed → (none)
          Christian Kunz made changes:
            Fix Version/s: added 0.16.0, removed 0.15.2
          Arun C Murthy made changes:
            Status: Reopened → Resolved
            Resolution: (none) → Fixed
          Nigel Daley made changes:
            Status: Resolved → Closed

            People

            • Assignee: Amar Kamat
            • Reporter: Srikanth Kakani
            • Votes: 0
            • Watchers: 1
