Hadoop Map/Reduce
MAPREDUCE-4490

JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.20.205.0, 1.0.3, 1.2.1
    • Fix Version/s: 1.2.1
    • Labels: patch
    • Target Version/s: 1.2.1

      Description

      When using LinuxTaskController with JVM reuse enabled (mapred.job.reuse.jvm.num.tasks > 1), a job with more map tasks than the cluster has map slots sees the second task in each JVM fail immediately, after which the JVM exits. We investigated this bug and found the following root cause. With LinuxTaskController, the userlog directory for a task attempt (../userlogs/job/task-attempt) is created only when the JVM is launched, because userlogs directories are created by the task-controller binary, which runs only once per JVM. No directory exists for any later attempt that runs in the same JVM, so creating its log.index is guaranteed to fail with ENOENT, leading to immediate task failure and child JVM exit.

      2012-07-24 14:29:11,914 INFO org.apache.hadoop.mapred.TaskLog: Starting logging for a new task attempt_201207241401_0013_m_000027_0 in the same JVM as that of the first task /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_000006_0
      2012-07-24 14:29:11,915 WARN org.apache.hadoop.mapred.Child: Error running child
      ENOENT: No such file or directory
      at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
      at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
      at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:296)
      at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:369)
      at org.apache.hadoop.mapred.Child.main(Child.java:229)

      The above error occurs in a JVM that runs tasks 6 and 27. Task 6 runs smoothly. Then task 27 starts. The directory /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_000027_0 is never created, so when mapred.Child tries to write the log.index file for task 27, it fails with ENOENT. Therefore, with LinuxTaskController the second task in each JVM fails every time (and then the JVM exits). Note that this problem does not occur with DefaultTaskController, because it creates the userlogs directory for each task (not just once per JVM, as with LinuxTaskController).

      For each task, the TaskRunner calls the TaskController's createLogDir method before attempting to write out an index file.

      • DefaultTaskController#createLogDir: creates log directory for each task
      • LinuxTaskController#createLogDir: does nothing
        • task-controller binary creates log directory [create_attempt_directories] (but only for the first task)
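The difference between the two controllers can be reproduced outside Hadoop. The following is a minimal C sketch, not Hadoop code: create_attempt_log_dir and write_log_index are hypothetical stand-ins for the directory creation done by the task-controller binary and for the log.index write in TaskLog. Opening a file inside a directory that was never created fails with ENOENT, which is the error the second task in a reused JVM hits.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create the per-attempt log directory.
 * Returns 0 on success (or if the directory already exists), else errno. */
int create_attempt_log_dir(const char *attempt_dir) {
  if (mkdir(attempt_dir, 0750) != 0 && errno != EEXIST) {
    return errno;
  }
  return 0;
}

/* Hypothetical stand-in for the log.index write: open (and create) the
 * file inside attempt_dir. Returns 0 on success, else errno. */
int write_log_index(const char *attempt_dir) {
  char path[4096];
  int n = snprintf(path, sizeof(path), "%s/log.index", attempt_dir);
  if (n < 0 || (size_t)n >= sizeof(path)) {
    return ENAMETOOLONG;
  }
  int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0640);
  if (fd < 0) {
    return errno; /* ENOENT when attempt_dir does not exist */
  }
  close(fd);
  return 0;
}
```

Calling write_log_index for an attempt whose directory was never created returns ENOENT; calling create_attempt_log_dir once per attempt (rather than once per JVM) before the write makes it succeed.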

      Possible Solution: add a new "initialize task" command to task-controller that creates the attempt directories. Call that command, with ShellCommandExecutor, from the LinuxTaskController#createLogDir method.

      1. MAPREDUCE-4490.patch
        5 kB
        sam liu
      2. MAPREDUCE-4490.patch
        5 kB
        sam liu
      3. MAPREDUCE-4490.patch
        5 kB
        sam liu

        Activity

        Transition                   | Time In Source Status | Execution Times | Last Executer | Last Execution Date
        Patch Available -> Open      | 125d 4h 7m            | 1               | sam liu       | 14/Feb/14 06:34
        Open -> Patch Available      | 536d 1h 31m           | 2               | sam liu       | 19/May/14 08:01
        Patch Available -> Resolved  | 25d 14h 36m           | 1               | Eric Yang     | 13/Jun/14 22:38
        Eric Yang made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Eric Yang added a comment -

        I just committed this to Branch-1 and Branch-1.2. This patch is not applicable to 2.x nor trunk.

        Eric Yang added a comment -

        +1 looks good.

        sam liu made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        sam liu made changes -
        Attachment MAPREDUCE-4490.patch [ 12628963 ]
        sam liu added a comment -

        New patch based on the latest origin/branch-1.2.

        sam liu made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        sam liu added a comment -

        Will upload a new patch for the latest code base of origin/branch-1.2.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12611858/MAPREDUCE-4490.patch
        against trunk revision .

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4170//console

        This message is automatically generated.

        sam liu made changes -
        Attachment MAPREDUCE-4490.patch [ 12611858 ]
        sam liu added a comment -

        Updated the patch to remove the create_attempt_directories() invocation from task-controller.c#run_task_as_user(). That invocation is unnecessary because task-controller.c#initialize_task() always does the same work.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12607976/MAPREDUCE-4490.patch
        against trunk revision .

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4113//console

        This message is automatically generated.

        sam liu made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Affects Version/s 1.2.1 [ 12324149 ]
        Target Version/s 1.2.1 [ 12324149 ]
        Labels patch
        Fix Version/s 1.2.1 [ 12324149 ]
        sam liu added a comment -

        As described above, the root cause of this issue is that, when using LinuxTaskController, userlogs directories are created by the task-controller binary, which runs only once per JVM. The main purpose of the patch is therefore to add a new "initialize task" command to task-controller that creates the attempt directories, and to invoke it, with ShellCommandExecutor, in the LinuxTaskController#createLogDir method. The main modifications are:
        1. src/c++/task-controller/impl/task-controller.h:
        Add a declaration for the new function initialize_task().
        2. src/c++/task-controller/impl/task-controller.c:
        Implement the new function initialize_task(), which invokes the existing function create_attempt_directories().
        3. src/c++/task-controller/impl/main.c:
        Allow the new function initialize_task() to be invoked from ShellCommandExecutor.
        4. src/mapred/org/apache/hadoop/mapred/LinuxTaskController.java:
        In createLogDir(), invoke initialize_task() through ShellCommandExecutor to create the attempt directory before launching each task.
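The wiring described in items 1 to 3 could look roughly like the following C sketch. It is an illustration under assumptions, not the attached patch: the command code CMD_INITIALIZE_TASK and its value, the dispatch() helper, and the argument order are hypothetical, and create_attempt_directories() is stubbed out so the example is self-contained.

```c
#include <stdio.h>

/* Hypothetical command codes; the real values in main.c may differ. */
enum command {
  CMD_LAUNCH_TASK_JVM = 1,
  CMD_INITIALIZE_TASK = 7
};

/* Counts stub invocations so the effect of a dispatch can be observed. */
int attempt_dirs_created = 0;

/* Stub standing in for the existing create_attempt_directories(); the real
 * function creates the per-attempt directories on disk. */
int create_attempt_directories(const char *user, const char *good_local_dirs,
                               const char *job_id, const char *task_id) {
  if (user == NULL || good_local_dirs == NULL ||
      job_id == NULL || task_id == NULL) {
    return -1;
  }
  attempt_dirs_created++;
  return 0;
}

/* New entry point: prepare the attempt directories for one task. */
int initialize_task(const char *user, const char *good_local_dirs,
                    const char *job_id, const char *task_id) {
  return create_attempt_directories(user, good_local_dirs, job_id, task_id);
}

/* Hypothetical dispatcher: maps a command code plus its arguments
 * (user, good_local_dirs, job_id, task_id) to the matching function. */
int dispatch(int command, int argc, const char *const argv[]) {
  switch (command) {
    case CMD_INITIALIZE_TASK:
      if (argc != 4) {
        fprintf(stderr, "initialize task expects 4 arguments, got %d\n", argc);
        return -1;
      }
      return initialize_task(argv[0], argv[1], argv[2], argv[3]);
    default:
      fprintf(stderr, "unsupported command %d\n", command);
      return -1;
  }
}
```

Because the command runs once per task rather than once per JVM launch, every attempt gets its directories before its log.index is written.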

        sam liu made changes -
        Priority Major [ 3 ] Critical [ 2 ]
        sam liu made changes -
        Attachment MAPREDUCE-4490.patch [ 12607976 ]
        sam liu added a comment -

        The attached patch works well in my local environment and should resolve this issue. Any feedback is welcome!

        sam liu made changes -
        Assignee sam liu [ sam liu ]
        sam liu added a comment -

        Hi,

        Following the description, I am trying to provide a patch, as we encountered the same issue in our Hadoop cluster.

        First, I added a function in task-controller.c:

        int initialize_task(const char *user, const char *good_local_dirs,
                            const char *job_id, const char *task_id) {
          // Prepare the attempt directories for the task JVM.
          int result = create_attempt_directories(user, good_local_dirs, job_id, task_id);
          return result;
        }

        Of course, I also modified task-controller.h/task-controller.c/main.c. After that, I tried to call this function through ShellCommandExecutor in LinuxTaskController#createLogDir. However, I found that LinuxTaskController#createLogDir takes only two parameters (TaskAttemptID taskID, boolean isCleanup), which do not supply the arguments that initialize_task(const char *user, const char *good_local_dirs, const char *job_id, const char *task_id) requires: we cannot get user, dir, jobid, taskid from LinuxTaskController#createLogDir.
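Of those four arguments, the job ID at least looks recoverable from the attempt ID, going by the names in the log excerpt in the description (job_201207241401_0013 and attempt_201207241401_0013_m_000027_0). A minimal C sketch under that naming assumption follows; job_id_from_attempt_id is a hypothetical helper, not part of task-controller, and the user and local dirs would still have to come from elsewhere.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: derive "job_<cluster>_<seq>" from
 * "attempt_<cluster>_<seq>_<m|r>_<task>_<n>".
 * Returns 0 on success, -1 on malformed input or a too-small buffer. */
int job_id_from_attempt_id(const char *attempt_id, char *out, size_t out_len) {
  static const char prefix[] = "attempt_";
  const size_t prefix_len = sizeof(prefix) - 1;

  if (attempt_id == NULL || out == NULL) {
    return -1;
  }
  if (strncmp(attempt_id, prefix, prefix_len) != 0) {
    return -1;
  }

  /* rest points at "<cluster>_<seq>_<m|r>_<task>_<n>". */
  const char *rest = attempt_id + prefix_len;
  const char *first = strchr(rest, '_');
  if (first == NULL) {
    return -1;
  }
  const char *second = strchr(first + 1, '_');
  if (second == NULL) {
    return -1;
  }

  /* Copy only "<cluster>_<seq>" after the "job_" prefix. */
  int n = snprintf(out, out_len, "job_%.*s", (int)(second - rest), rest);
  if (n < 0 || (size_t)n >= out_len) {
    return -1;
  }
  return 0;
}
```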

        Any suggestions on the issue which is blocking my progress?

        Thanks a lot!

        Evert Lammerts made changes -
        Field Original Value New Value
        Affects Version/s 0.20.205.0 [ 12316391 ]
        Evert Lammerts added a comment -

        We ran into this same issue on 0.20.205. I'll add it as an affected version.

        George Datskos created issue -

          People

          • Assignee: sam liu
          • Reporter: George Datskos
          • Votes: 4
          • Watchers: 22
