Hadoop Map/Reduce / MAPREDUCE-762

Task's process trees may not be killed if a TT is restarted

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Some work has been done to make sure the tasktrackers (TTs) kill the process trees of tasks when they finish (successfully, with failures, or when they are killed). Related JIRAs are HADOOP-2721, HADOOP-5488 and HADOOP-5420. But when a TT is restarted, we do not handle killing of process trees, even though the tasks themselves will die on re-establishing contact with the TT.

        Activity

        Allen Wittenauer added a comment -

        Closing this as stale.

        Amar Kamat added a comment -

        > I am not sure about this. So, this will be done like in the finally block of the Child by sending a kill -pid to itself?

        Yeah. Just before the child commits suicide.

        > What's the structure under TaskTracker.SUBDIR/pid?

        The structure would be jvm-id.pid, and it would contain the user-name, pid and start-time of the child process. The user-name is required for permissions, while the start-time is required to be sure that the pid still refers to the same process.

        > We should just follow the path of TaskController.killTaskJVM - that will ensure it will work for all task controllers. Setting permissions to 600 for the pid files owned by TT should be fine.

        Yeah. TaskController.killTask() would be invoked, which will do the forceful exit of the process.
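
The pid-file contents described in this comment (user name, pid, start time) could be modelled roughly as below. This is a minimal sketch, not Hadoop code: the class name `PidRecord`, the single-line space-separated encoding, and the use of epoch milliseconds for the start time are all assumptions made for illustration. The start-time check relies on `ProcessHandle`, so it needs Java 9 or later.

```java
import java.time.Instant;

/** Hypothetical contents of one jvm-id.pid file: user name, pid, start time. */
final class PidRecord {
    final String user;
    final long pid;
    final long startTimeMillis;

    PidRecord(String user, long pid, long startTimeMillis) {
        this.user = user;
        this.pid = pid;
        this.startTimeMillis = startTimeMillis;
    }

    /** Encodes the record as "<user> <pid> <startTimeMillis>" (assumed format). */
    String serialize() {
        return user + " " + pid + " " + startTimeMillis;
    }

    /** Parses a line produced by serialize(); rejects anything without three fields. */
    static PidRecord parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 fields, got " + parts.length);
        }
        return new PidRecord(parts[0], Long.parseLong(parts[1]), Long.parseLong(parts[2]));
    }

    /**
     * True only if a live process holds this pid AND reports the recorded start
     * time. A mismatch suggests the pid was recycled and must not be killed.
     */
    boolean stillRefersToSameProcess() {
        return ProcessHandle.of(pid)
                .flatMap(handle -> handle.info().startInstant())
                .map(Instant::toEpochMilli)
                .map(actual -> actual == startTimeMillis)
                .orElse(false);
    }
}
```

Checking the start time before sending any signal is what guards against killing an unrelated process that happens to have inherited the pid after the tasktracker went down.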

        Hemanth Yamijala added a comment -

        > Child jvm, before exiting, should try and cleanup/kill all its sub-processes

        I am not sure about this. So, this will be done like in the finally block of the Child by sending a kill -pid to itself?

        > Once a jvm is spawned, its session id should be persisted to task-tracker's private folder (TaskTracker.SUBDIR/pid with 700 permission?)

        What's the structure under TaskTracker.SUBDIR/pid? I suppose the right solution is to use jvm-id.pid. Another solution could be taskAttemptId.[cleanup].pid. If we use taskAttemptId, we should take care of JVM reuse. If we use jvm-id.pid, when a task is being killed, we can look up the jvm-id for the task and then pick up the right pid file. Would this work? Permissions for each of the files should be 600, owned by the tasktracker.

        > Once the jvm exits, this pid file should be deleted

        +1. Should be done by the TT.

        > Upon restart, the pid files in the private folder should be cleaned up (under appropriate owner permissions)

        This should be done after sending the kill signal to the processes recorded in the files in the folder, because they are all potentially running tasks, which is the reason for this bug.

        > pid files should have sufficient information to reconstruct jvm-context object which is required by LinuxTaskController to kill the process under user permission.

        We should just follow the path of TaskController.killTaskJVM - that will ensure it will work for all task controllers. Setting permissions to 600 for the pid files owned by TT should be fine.
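
The naming and permission scheme discussed in this comment (one `jvm-id.pid` file per JVM, mode 600, inside a mode-700 directory, found via a task-to-JVM lookup) might be sketched as follows. Everything here is illustrative rather than Hadoop's actual code: `PidFileStore` and its methods are hypothetical names, the directory passed in merely stands in for TaskTracker.SUBDIR/pid, and a POSIX filesystem is assumed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper for <jvm-id>.pid files; not part of Hadoop. */
final class PidFileStore {
    private static final Set<PosixFilePermission> DIR_PERMS =
            PosixFilePermissions.fromString("rwx------"); // 700
    private static final Set<PosixFilePermission> FILE_PERMS =
            PosixFilePermissions.fromString("rw-------"); // 600

    private final Path pidDir;
    // task attempt id -> jvm id; with JVM reuse, several attempts share one jvm id
    private final Map<String, String> jvmForTask = new HashMap<>();

    PidFileStore(Path pidDir) throws IOException {
        this.pidDir = Files.createDirectories(pidDir);
        Files.setPosixFilePermissions(this.pidDir, DIR_PERMS);
    }

    Path pathFor(String jvmId) {
        return pidDir.resolve(jvmId + ".pid");
    }

    /** Writes (or overwrites) the pid file for a JVM and restricts it to the owner. */
    Path write(String jvmId, String contents) throws IOException {
        Path file = pathFor(jvmId);
        Files.write(file, contents.getBytes(StandardCharsets.UTF_8));
        Files.setPosixFilePermissions(file, FILE_PERMS);
        return file;
    }

    /** Records which JVM is running a given task attempt. */
    void assign(String taskAttemptId, String jvmId) {
        jvmForTask.put(taskAttemptId, jvmId);
    }

    /** Resolves a task attempt to its JVM's pid file, if the attempt is known. */
    Optional<Path> pidFileForTask(String taskAttemptId) {
        return Optional.ofNullable(jvmForTask.get(taskAttemptId)).map(this::pathFor);
    }

    /** Removes the pid file once the JVM has exited; returns false if it was absent. */
    boolean delete(String jvmId) throws IOException {
        return Files.deleteIfExists(pathFor(jvmId));
    }
}
```

Keying the file by jvm-id rather than by task attempt means a reused JVM keeps a single pid file across all the attempts it runs, which is the JVM-reuse concern raised above.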

        Amar Kamat added a comment -

        Here is a proposal:

        1. Child jvm, before exiting, should try and cleanup/kill all its sub-processes
        2. Once a jvm is spawned, its session id should be persisted to task-tracker's private folder (TaskTracker.SUBDIR/pid with 700 permission?)
        3. Once the jvm exits, this pid file should be deleted
        4. Upon restart, the pid files in the private folder should be cleaned up (under appropriate owner permissions)
        5. pid files should have sufficient information to reconstruct jvm-context object which is required by LinuxTaskController to kill the process under user permission.

        @Hemanth, Ravi, Vinod, Sreekanth, Devaraj: Am I missing something here?
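
Step 4 of the proposal, the restart-time pass over leftover pid files, could look something like the sketch below. It signals each recorded process before deleting its file, since every leftover file may describe a task that is still running. The names `JvmKiller` and `RestartSweep` are hypothetical, `JvmKiller` only stands in for whatever the task controller would do to kill a JVM under the right user, and the `<user> <pid> <startTime>` file format is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for "kill this JVM as this user". */
interface JvmKiller {
    void kill(String user, long pid, long startTime);
}

final class RestartSweep {
    /**
     * For every *.pid file left from the previous run: ask the killer to
     * terminate the recorded process, then delete the file.
     * Returns the jvm-ids whose files were handled.
     */
    static List<String> run(Path pidDir, JvmKiller killer) {
        List<Path> leftovers = new ArrayList<>();
        List<String> handled = new ArrayList<>();
        try {
            // Snapshot the directory first so deletions do not affect iteration.
            try (DirectoryStream<Path> files = Files.newDirectoryStream(pidDir, "*.pid")) {
                for (Path file : files) {
                    leftovers.add(file);
                }
            }
            for (Path file : leftovers) {
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                killIfWellFormed(text, killer);   // kill first...
                Files.delete(file);               // ...then clean up
                String name = file.getFileName().toString();
                handled.add(name.substring(0, name.length() - ".pid".length()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return handled;
    }

    /** Malformed files are skipped (and still deleted by the caller). */
    private static void killIfWellFormed(String text, JvmKiller killer) {
        String[] fields = text.trim().split("\\s+");
        if (fields.length != 3) {
            return;
        }
        try {
            killer.kill(fields[0], Long.parseLong(fields[1]), Long.parseLong(fields[2]));
        } catch (NumberFormatException e) {
            // not a valid pid or start time; nothing safe to kill
        }
    }
}
```

Passing the killer in as an interface keeps the sweep independent of any particular task controller, in the spirit of the suggestion to route all kills through one controller path.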


          People

          • Assignee: Unassigned
          • Reporter: Hemanth Yamijala
          • Votes: 0
          • Watchers: 7
