Hadoop Map/Reduce / MAPREDUCE-4843

When using DefaultTaskController, JobLocalizer is not thread safe

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.1.1
    • Fix Version/s: 1.2.0
    • Component/s: tasktracker
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      In our cluster, jobs sometimes fail with the following exception:
      2012-12-03 23:11:54,811 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing attempt_201212031626_1115_r_000023_0:
      org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/$username/jobcache/job_201212031626_1115/job.xml in any of the configured local directories
      at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:424)
      at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
      at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1175)
      at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1058)
      at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2213)

      The root cause is that JobLocalizer is not thread safe. DefaultTaskController.initializeJob creates the localizer with:
      JobLocalizer localizer = new JobLocalizer((JobConf)getConf(), user, jobid);
      but JobLocalizer simply keeps a reference to the conf it is passed instead of making its own copy.
      When the two TaskLauncher threads (mapLauncher and reduceLauncher) call initializeJob at the same time, there are two JobLocalizer instances but only one conf instance.
      As a result, the ttConf.setStrings(JOB_LOCAL_CTXT, localDirs) call made for one job can overwrite the value already set for the previous job.
      The previous job's job.xml is then stored under another user's directory, which is consistent with the DiskErrorException above: job.xml cannot be found under the expected taskTracker/$username/jobcache path.
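      To make the aliasing concrete, here is a minimal, self-contained Java sketch. It does not use Hadoop: SimpleConf is a hypothetical stand-in for JobConf, Localizer is a hypothetical and heavily simplified stand-in for JobLocalizer, and the key string stands in for JOB_LOCAL_CTXT. The two constructions run sequentially so the outcome is deterministic; in the TaskTracker the same overwrite happens when the mapLauncher and reduceLauncher threads interleave.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for JobConf: a mutable key/value map (not Hadoop's API).
class SimpleConf {
    private final Map<String, String> props = new HashMap<>();

    SimpleConf() {}

    // Copy constructor, similar in spirit to new JobConf(conf).
    SimpleConf(SimpleConf other) {
        props.putAll(other.props);
    }

    void set(String key, String value) { props.put(key, value); }

    String get(String key) { return props.get(key); }
}

// Hypothetical, heavily simplified stand-in for JobLocalizer.
class Localizer {
    // Hypothetical key standing in for JOB_LOCAL_CTXT.
    static final String LOCAL_CTXT = "job.local.ctxt";
    private final SimpleConf conf;

    Localizer(SimpleConf ttConf, String user, boolean copyConf) {
        // copyConf == false mirrors the bug: every localizer aliases the shared conf.
        // copyConf == true gives each localizer its own private copy.
        this.conf = copyConf ? new SimpleConf(ttConf) : ttConf;
        this.conf.set(LOCAL_CTXT, "taskTracker/" + user + "/jobcache");
    }

    String localDir() { return conf.get(LOCAL_CTXT); }
}

class AliasingDemo {
    public static void main(String[] args) {
        SimpleConf shared = new SimpleConf(); // the single TaskTracker conf

        Localizer a = new Localizer(shared, "alice", false);
        Localizer b = new Localizer(shared, "bob", false);
        // Both localizers read the same object, so alice's localizer now sees bob's dir.
        System.out.println("shared: a=" + a.localDir() + " b=" + b.localDir());

        SimpleConf shared2 = new SimpleConf();
        Localizer c = new Localizer(shared2, "alice", true);
        Localizer d = new Localizer(shared2, "bob", true);
        // Each localizer wrote into its own copy, so the values stay separate.
        System.out.println("copied: c=" + c.localDir() + " d=" + d.localDir());
    }
}
```

      Running AliasingDemo prints "shared: a=taskTracker/bob/jobcache b=taskTracker/bob/jobcache" followed by "copied: c=taskTracker/alice/jobcache d=taskTracker/bob/jobcache".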

      Attachments
      1. MAPREDUCE-4843-branch-1.1.patch (0.9 kB, zhaoyunjiong)
      2. mr-4843.patch (3 kB, Karthik Kambatla)


          Activity

          zhaoyunjiong added a comment -

          The fix is very simple:

          diff --git a/src/mapred/org/apache/hadoop/mapred/JobLocalizer.java b/src/mapred/org/apache/hadoop/mapred/JobLocalizer.java
          index 0802b03..625face 100644
          --- a/src/mapred/org/apache/hadoop/mapred/JobLocalizer.java
          +++ b/src/mapred/org/apache/hadoop/mapred/JobLocalizer.java
          @@ -108,7 +108,7 @@ public class JobLocalizer
                 throw new IOException("Cannot initialize for null jobid");
               }
               this.jobid = jobid;
          -    this.ttConf = ttConf;
          +    this.ttConf = new JobConf(ttConf);
               lfs = FileSystem.getLocal(ttConf).getRaw();
               this.localDirs = createPaths(user, localDirs);
               ttConf.setStrings(JOB_LOCAL_CTXT, localDirs);
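          This one-line patch was later reported as not working. One detail visible in the diff itself, offered as an observation rather than a confirmed diagnosis: after the change, ttConf.setStrings(JOB_LOCAL_CTXT, localDirs) still names the constructor parameter ttConf, which shadows the field, so it still mutates the shared conf while the new private copy in this.ttConf never receives JOB_LOCAL_CTXT. The Java sketch below reproduces only that shadowing pattern; PlainConf, ShadowedCopy, FieldCopy, and the key string are hypothetical and are not Hadoop classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mutable configuration standing in for JobConf (not Hadoop's API).
class PlainConf {
    final Map<String, String> props = new HashMap<>();

    PlainConf() {}

    PlainConf(PlainConf other) { props.putAll(other.props); }
}

// Same shape as the patch: the field receives a copy, but the later write
// uses the bare name ttConf, which resolves to the constructor parameter.
class ShadowedCopy {
    static final String KEY = "job.local.ctxt"; // hypothetical key
    final PlainConf ttConf;

    ShadowedCopy(PlainConf ttConf, String dirs) {
        this.ttConf = new PlainConf(ttConf); // private copy stored in the field
        ttConf.props.put(KEY, dirs);         // writes to the SHARED conf, not the copy
    }
}

// Variant in which the write goes through the field (this.ttConf).
class FieldCopy {
    static final String KEY = "job.local.ctxt"; // hypothetical key
    final PlainConf ttConf;

    FieldCopy(PlainConf ttConf, String dirs) {
        this.ttConf = new PlainConf(ttConf);
        this.ttConf.props.put(KEY, dirs);    // only the private copy changes
    }
}

class ShadowingDemo {
    public static void main(String[] args) {
        PlainConf shared = new PlainConf();
        ShadowedCopy s = new ShadowedCopy(shared, "taskTracker/alice/jobcache");
        System.out.println("shadowed: copy=" + s.ttConf.props.get(ShadowedCopy.KEY)
                + " shared=" + shared.props.get(ShadowedCopy.KEY));

        PlainConf shared2 = new PlainConf();
        FieldCopy f = new FieldCopy(shared2, "taskTracker/alice/jobcache");
        System.out.println("field: copy=" + f.ttConf.props.get(FieldCopy.KEY)
                + " shared=" + shared2.props.get(FieldCopy.KEY));
    }
}
```

          Running ShadowingDemo prints "shadowed: copy=null shared=taskTracker/alice/jobcache" and then "field: copy=taskTracker/alice/jobcache shared=null".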
          zhaoyunjiong added a comment -

          The above patch does not work. I'm working on a new patch.

          zhaoyunjiong added a comment -

          Updated patch.

          zhaoyunjiong added a comment -

          Testing patch

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12556081/MAPREDUCE-4843-branch-1.1.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3095//console

          This message is automatically generated.

          Karthik Kambatla (Inactive) added a comment -

          zhaoyunjiong The patch looks good. Can you post a patch against trunk so that QA can apply it? Also, would it be possible to add a test?

          zhaoyunjiong added a comment -

          No patch is needed for trunk; the problem doesn't exist in Hadoop 2.0.
          A thread-safety problem is very difficult to test: even when the code is not thread safe, a test will pass in most runs.
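          The difficulty can be illustrated with a small probabilistic stress test. The sketch below is hypothetical (plain Java maps, not JobLocalizer or JobConf): it releases two threads together with a CountDownLatch, has each write a per-user directory into either a shared map or a private copy, reads the value back, and counts reads that see the other user's directory. The copying variant can never produce a mismatch, so it is safe to assert on; the sharing variant may well report zero mismatches on a given run, which is why a passing test does not prove thread safety.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class RaceProbe {
    static final String KEY = "job.local.ctxt"; // hypothetical key

    static String dirFor(String user) {
        return "taskTracker/" + user + "/jobcache";
    }

    // One hypothetical "localize" step: optionally copy, write this user's dir, read it back.
    static String localize(Map<String, String> ttConf, String user, boolean copy) {
        Map<String, String> conf = copy ? new ConcurrentHashMap<String, String>(ttConf) : ttConf;
        conf.put(KEY, dirFor(user));
        Thread.yield(); // widen the window between the write and the read
        return conf.get(KEY);
    }

    // Runs two threads concurrently `rounds` times; returns how many reads saw the wrong user.
    static int countMismatches(int rounds, boolean copy) {
        AtomicInteger mismatches = new AtomicInteger();
        String[] users = {"alice", "bob"};
        for (int i = 0; i < rounds; i++) {
            Map<String, String> shared = new ConcurrentHashMap<>();
            CountDownLatch start = new CountDownLatch(1);
            Thread[] threads = new Thread[users.length];
            for (int t = 0; t < users.length; t++) {
                String user = users[t];
                threads[t] = new Thread(() -> {
                    try {
                        start.await(); // release both threads at roughly the same moment
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    if (!dirFor(user).equals(localize(shared, user, copy))) {
                        mismatches.incrementAndGet();
                    }
                });
                threads[t].start();
            }
            start.countDown();
            for (Thread th : threads) {
                try {
                    th.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
        }
        return mismatches.get();
    }

    public static void main(String[] args) {
        System.out.println("shared: " + countMismatches(1000, false) + " mismatches");
        System.out.println("copied: " + countMismatches(1000, true) + " mismatches");
    }
}
```

          Only the "copied" count is guaranteed to be 0; the "shared" count depends on thread scheduling and can differ from run to run.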

          Karthik Kambatla (Inactive) added a comment -

          My bad, I misread the branch name. I applied the patch locally and verified that the tests that directly use DefaultTaskController pass: TestTaskTrackerLocalization, TestJvmManager, and TestTaskEnvironment.

          +1

          Karthik Kambatla (Inactive) added a comment -

          Uploading the patch from MAPREDUCE-4964, as it solves this issue in a simpler, cleaner way. The discussion on that JIRA has all the details.

          I applied the patch to the latest branch-1 and it applies cleanly. I also verified that TestJobLocalizer passes.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12567889/mr-4843.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3298//console

          This message is automatically generated.

          Alejandro Abdelnur added a comment -

          +1. As per the discussion in MAPREDUCE-4964, the latest patch seems a better way of doing it.

          Alejandro Abdelnur added a comment -

          Thanks Karthik. Committed to branch-1. Arun, thanks for double checking on this one.

          Matt Foley added a comment -

          Closed upon release of Hadoop 1.2.0.


            People

            • Assignee:
              Karthik Kambatla
              Reporter:
              zhaoyunjiong
            • Votes:
              1
              Watchers:
              12
