Hadoop Map/Reduce
MAPREDUCE-2266

JvmManager sleeps between SIGTERM and SIGKILL while holding many TT locks

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.22.0
    • Fix Version/s: None
    • Labels:
      None

      Description

      Between sending a task SIGTERM and SIGKILL, the JvmManager sleeps for sleepTimeBeforeSigKill milliseconds. In many call hierarchies, however, this sleep happens while holding important locks such as the TT lock and the JvmManagerForType lock. With the default 5-second sleep, this prevents other tasks from being scheduled and reduces scheduling throughput.
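
      The effect described above can be reproduced in isolation. The following is a minimal sketch, not Hadoop code: `SleepUnderLockDemo`, `trackerLock`, and the method names are hypothetical stand-ins for the TT lock and the kill path, and the signals themselves are omitted. It measures how long a second thread waits for the lock while the first one sleeps inside it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SleepUnderLockDemo {
    // Hypothetical stand-in for the TT / JvmManagerForType lock.
    private final Object trackerLock = new Object();
    // Demo-only signal that the kill path has taken the lock.
    private final CountDownLatch lockTaken = new CountDownLatch(1);

    /** Kill path as described in the issue: the grace-period sleep runs inside the lock. */
    void killWhileHoldingLock(long sleepMillis) throws InterruptedException {
        synchronized (trackerLock) {
            lockTaken.countDown();
            // SIGTERM would be sent here (omitted).
            Thread.sleep(sleepMillis);
            // SIGKILL would be sent here (omitted).
        }
    }

    /** Scheduling path: needs the same lock only briefly; returns how long it waited. */
    long scheduleTaskAndMeasureWaitMillis() throws InterruptedException {
        lockTaken.await(); // start timing only once the kill path holds the lock
        long start = System.nanoTime();
        synchronized (trackerLock) {
            // A task would be launched here (omitted).
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Runs one kill and one scheduling attempt concurrently. */
    static long measureSchedulerWait(long sleepMillis) throws InterruptedException {
        SleepUnderLockDemo demo = new SleepUnderLockDemo();
        Thread killer = new Thread(() -> {
            try {
                demo.killWhileHoldingLock(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        killer.start();
        long waited = demo.scheduleTaskAndMeasureWaitMillis();
        killer.join();
        return waited;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("scheduler waited ~" + measureSchedulerWait(500) + " ms");
    }
}
```

      In this sketch the scheduling thread is blocked for roughly the whole sleep, so with a 5-second grace period every kill would stall scheduling for about 5 seconds.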

        Issue Links

          Activity

          Todd Lipcon created issue -
          Todd Lipcon added a comment -

          I took a peek at YDH, and it looks like the solution used there is to defer the sleep/SIGKILL into a new DelayedProcessKiller thread. But I couldn't find any open source JIRA for this improvement. It would be appreciated if this patch could be contributed; otherwise I'm happy to forward-port it to trunk.
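
          The idea of deferring the sleep and SIGKILL to a separate thread might look roughly like the sketch below. Only the name DelayedProcessKiller comes from the comment above; the `Signaller` interface, `DeferredKillDemo`, `trackerLock`, and the string-based pid are assumptions made for illustration, and this is not the YDH patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DeferredKillDemo {
    /** Hypothetical signalling hook; real code would signal the OS process. */
    interface Signaller {
        void signal(String pid, String signalName);
    }

    /** Sleeps and sends SIGKILL on its own thread, so the caller never sleeps. */
    static class DelayedProcessKiller extends Thread {
        private final String pid;
        private final long delayMillis;
        private final Signaller signaller;

        DelayedProcessKiller(String pid, long delayMillis, Signaller signaller) {
            this.pid = pid;
            this.delayMillis = delayMillis;
            this.signaller = signaller;
            setName("DelayedProcessKiller-" + pid);
            setDaemon(true);
        }

        @Override
        public void run() {
            try {
                Thread.sleep(delayMillis);      // grace period, outside any lock
                signaller.signal(pid, "KILL");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Hypothetical stand-in for the TT / JvmManagerForType lock.
    private final Object trackerLock = new Object();

    /** Sends SIGTERM under the lock and returns at once; the SIGKILL is deferred. */
    Thread kill(String pid, long delayMillis, Signaller signaller) {
        synchronized (trackerLock) {
            signaller.signal(pid, "TERM");
            Thread killer = new DelayedProcessKiller(pid, delayMillis, signaller);
            killer.start();
            return killer;
        }
    }

    /** Runs one kill with a recording signaller and returns the signals in order. */
    static List<String> runOnce(long delayMillis) throws InterruptedException {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        DeferredKillDemo demo = new DeferredKillDemo();
        Thread killer = demo.kill("1234", delayMillis, (pid, sig) -> log.add(sig + " " + pid));
        killer.join(); // demo only: wait so the log is complete
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runOnce(200));
    }
}
```

          Here the lock is held only for the SIGTERM and the thread start. A real implementation would presumably also need to check whether the process has already exited before sending SIGKILL.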

          Arun C Murthy added a comment -

          Todd, I'll try and find the original author of the fix, but please feel free to forward port it. Thanks!

          Todd Lipcon made changes -
          Field Original Value New Value
          Link This issue is blocked by MAPREDUCE-2178 [ MAPREDUCE-2178 ]
          Todd Lipcon added a comment -

          This needs to wait on the forward-port of MAPREDUCE-2178 before it can really be done.

          Konstantin Shvachko added a comment -

          Unblocking as MAPREDUCE-2767 removed LinuxTaskController.

          Konstantin Shvachko made changes -
          Priority Blocker [ 1 ] Major [ 3 ]
          Todd Lipcon added a comment -

          This is a bug in DefaultTaskController - you're going to want to include it, or else MR performance will have a giant regression.

          Konstantin Shvachko added a comment -

          Todd. What do I include? If there was a patch I would be happy to.

          Todd Lipcon added a comment -

          There's no patch, since this was dealt with by MAPREDUCE-2178 where JvmManager was substantially rewritten. I can't contribute time to the 0.22 release, since I'm concentrating on 0.23 and 0.20.20x release lines.

          Konstantin Shvachko made changes -
          Fix Version/s 0.22.1 [ 12319242 ]
          Fix Version/s 0.22.0 [ 12314184 ]
          Allen Wittenauer made changes -
          Fix Version/s 0.22.1 [ 12319242 ]
          Allen Wittenauer made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Won't Fix [ 2 ]
          Transition: Open to Resolved
          Time In Source Status: 1515d 1h 46m
          Execution Times: 1
          Last Executer: Allen Wittenauer
          Last Execution Date: 10/Mar/15 03:12

            People

            • Assignee:
              Unassigned
              Reporter:
              Todd Lipcon
            • Votes:
              1
              Watchers:
              11

              Dates

              • Created:
                Updated:
                Resolved:
