Uploaded image for project: 'Hadoop Map/Reduce'
  1. Hadoop Map/Reduce
  2. MAPREDUCE-6608 Work Preserving AM Restart for MapReduce
  3. MAPREDUCE-6754

Container Ids for an yarn application should be monotonically increasing in the scope of the application

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Patch Available
    • Major
    • Resolution: Unresolved
    • None
    • None
    • None
    • None

    Description

      Currently across application attempts, container Ids are reused. The container id is stored in AppSchedulingInfo and it is reinitialized with every application attempt. So the containerId scope is limited to the application attempt.

      In the MR Framework, It is important to note that the containerId is being used as part of the JvmId. JvmId has 3 components <jobId, "m/r?", containerId>. The JvmId is used in datastructures in TaskAttemptListener and is passed between the AppMaster and the individual tasks. For an application attempt, no two tasks have the same JvmId.

      This works well currently, since inflight tasks get killed if the AppMaster goes down. However, if we want to enable WorkPreserving nature for the AM, containers (and hence containerIds) live across application attempts. If we recycle containerIds across attempts, then two independent tasks (one inflight from a previous attempt and another new in a succeeding attempt) can have the same JvmId and cause havoc.

      This can be solved in two ways:

      Approach A: Include attempt id as part of the JvmId. This is a viable solution, however, there is a change in the format of the JVMid. Changing something that has existed so long for an optional feature is not persuasive.

      Approach B: Keep the container id to be a monotonically increasing id for the life of an application. So, container ids are not reused across application attempts containers should be able to outlive an application attempt. This is the preferred approach as it is clean, logical and is backwards compatible. Nothing changes for existing applications or the internal workings.
      How this is achieved:
      Currently, we maintain latest containerId only for application attempts and reinitialize for new attempts. With this approach, we retrieve the latest containerId from the just-failed attempt and initialize the new attempt with the latest containerId (instead of 0). I can provide the patch if it helps. It currently exists in MAPREDUCE-6726

      Attachments

        1. MAPREDUCE-6754-001.patch
          31 kB
          D M Murali Krishna Reddy

        Activity

          People

            dmmkr D M Murali Krishna Reddy
            srikanth.sampath Srikanth Sampath
            Votes:
            0 Vote for this issue
            Watchers:
            12 Start watching this issue

            Dates

              Created:
              Updated: