Hadoop Map/Reduce
MAPREDUCE-4843

When using DefaultTaskController, JobLocalizer not thread safe

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.1.1
    • Fix Version/s: 1.2.0
    • Component/s: tasktracker
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In our cluster, jobs sometimes fail with the following exception:
      2012-12-03 23:11:54,811 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing attempt_201212031626_1115_r_000023_0:
      org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/$username/jobcache/job_201212031626_1115/job.xml in any of the configured local directories
      at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:424)
      at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
      at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1175)
      at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1058)
      at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2213)
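      The failing call is a read-side lookup. LocalDirAllocator.getLocalPathToRead is asked for a path relative to the TaskTracker's local directories, and the message indicates that it throws when no configured directory contains that path. The following is a simplified, hypothetical model of that behavior, not Hadoop's LocalDirAllocator. The class LocalPathLookup, the users alice and bob, and job_1 are all illustrative. It shows that a job.xml written under another user's jobcache cannot be found by a lookup for the correct user's path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical, simplified model of "find this relative path in any configured
// local directory". It is NOT Hadoop's LocalDirAllocator.
public class LocalPathLookup {

    // Probe each local root in order and return the first existing match.
    static Path getLocalPathToRead(String relPath, List<Path> localDirs) throws IOException {
        for (Path dir : localDirs) {
            Path candidate = dir.resolve(relPath);
            if (Files.exists(candidate)) {
                return candidate;
            }
        }
        throw new IOException("Could not find " + relPath
                + " in any of the configured local directories");
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("mapred-local");
        // Simulate the race's outcome: alice's job.xml landed under bob's jobcache.
        Path written = root.resolve("taskTracker/bob/jobcache/job_1/job.xml");
        Files.createDirectories(written.getParent());
        Files.createFile(written);

        try {
            getLocalPathToRead("taskTracker/alice/jobcache/job_1/job.xml", List.of(root));
            System.out.println("found");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      Running main prints "Could not find taskTracker/alice/jobcache/job_1/job.xml in any of the configured local directories". The lookup only probes for the exact user-specific relative path, so retrying cannot help once the file sits under the wrong user's directory.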

      The root cause is that JobLocalizer is not thread safe.
      In the DefaultTaskController.initializeJob method, a new localizer is created on every call:
           JobLocalizer localizer = new JobLocalizer((JobConf)getConf(), user, jobid);
      but JobLocalizer simply keeps a reference to that conf instead of copying it.
      When the two TaskLauncher threads (mapLauncher and reduceLauncher) call initializeJob at the same time, there are two JobLocalizer instances but only one conf instance.
      So ttConf.setStrings(JOB_LOCAL_CTXT, localDirs) in one thread can occasionally overwrite the value the other thread just set for the previous job.
      As a result, the previous job's job.xml is stored in another user's directory, and the later lookup under the correct user's jobcache fails with the exception above.
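      The interleaving described above can be reproduced deterministically with a small model. Conf and Localizer are hypothetical stand-ins for JobConf and JobLocalizer, not the Hadoop classes, and the JOB_LOCAL_CTXT value is made up. Two CountDownLatch objects force the bad ordering. First the map launcher records its user's directory. Then the reduce launcher records another user's directory. Only after that does the map launcher read its value back. Passing copyConf = true gives each localizer a private copy of the configuration. That is one way to remove the shared mutable state. It is an illustration only, not necessarily what the committed mr-4843.patch does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the MAPREDUCE-4843 race; Conf and Localizer are
// simplified stand-ins for JobConf and JobLocalizer, not the Hadoop classes.
public class SharedConfRace {
    // Hypothetical key; the real JOB_LOCAL_CTXT constant may have another value.
    static final String JOB_LOCAL_CTXT = "job.local.ctxt";

    // Minimal mutable configuration object standing in for JobConf.
    static class Conf {
        private final Map<String, String[]> props = new HashMap<>();
        Conf() {}
        // Copy constructor: snapshot another Conf's properties under its lock.
        Conf(Conf other) {
            synchronized (other) { props.putAll(other.props); }
        }
        synchronized void setStrings(String key, String... values) { props.put(key, values.clone()); }
        synchronized String[] getStrings(String key) { return props.get(key); }
    }

    // Stand-in for JobLocalizer: writes the per-user local dir into its conf.
    static class Localizer {
        private final Conf conf;
        private final String user;
        Localizer(Conf conf, String user, boolean copyConf) {
            // Sharing the reference reproduces the bug; copying is the illustrated fix.
            this.conf = copyConf ? new Conf(conf) : conf;
            this.user = user;
        }
        void createJobDirs() { conf.setStrings(JOB_LOCAL_CTXT, "taskTracker/" + user + "/jobcache"); }
        String jobXmlDir() { return conf.getStrings(JOB_LOCAL_CTXT)[0]; }
    }

    // Forces the ordering "A sets, B sets, A reads" and returns the directory
    // the first launcher would use for its job.xml.
    static String runInterleaving(boolean copyConf) throws InterruptedException {
        Conf ttConf = new Conf(); // the single TaskTracker-wide conf
        CountDownLatch firstSet = new CountDownLatch(1);
        CountDownLatch secondSet = new CountDownLatch(1);
        String[] result = new String[1];

        Thread mapLauncher = new Thread(() -> {
            Localizer a = new Localizer(ttConf, "alice", copyConf);
            a.createJobDirs();
            firstSet.countDown(); // let the other launcher run
            try { secondSet.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); return; }
            result[0] = a.jobXmlDir(); // where alice's job.xml would be written
        });
        Thread reduceLauncher = new Thread(() -> {
            try { firstSet.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); return; }
            Localizer b = new Localizer(ttConf, "bob", copyConf);
            b.createJobDirs();
            secondSet.countDown();
        });

        mapLauncher.start();
        reduceLauncher.start();
        mapLauncher.join(); // join() makes result[0] visible to this thread
        reduceLauncher.join();
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shared conf: " + runInterleaving(false));
        System.out.println("copied conf: " + runInterleaving(true));
    }
}
```

      Running main prints "shared conf: taskTracker/bob/jobcache" followed by "copied conf: taskTracker/alice/jobcache". Copying costs one small allocation per job. In exchange, the launcher threads never write to the same object, so no lock has to be held across the whole localization step.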

      1. mr-4843.patch
        3 kB
        Karthik Kambatla
      2. MAPREDUCE-4843-branch-1.1.patch
        0.9 kB
        zhaoyunjiong

        Issue Links

          Activity

          zhaoyunjiong created issue -
          zhaoyunjiong made changes -
          Field: Description
          Original Value:
          In our cluster, some times job will failed due to below exception:
          Error initializing attempt_201210181806_18566_r_000376_0: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/$username/jobcache/job_201210181806_18566/job.xml in any of the configured local directories
          at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:424)

          The root cause is JobLocalizer is not thread safe.
          In DefaultTaskController.initializeJob method:
               JobLocalizer localizer = new JobLocalizer((JobConf)getConf(), user, jobid);
          but in JobLocalizer, it just simply keep the reference of the conf.
          When two TaskLauncher threads(mapLauncher and reduceLauncher) try to initializeJob at same time, it will have two JobLocalizer, but one conf instance.
          So some times ttConf.setStrings(JOB_LOCAL_CTXT, localDirs) will reset previous job's conf.
          It will cause the previous job's job.xml stored at another user's dir.
          New Value: the current Description above (same root-cause analysis, with the full stack trace from attempt_201212031626_1115_r_000023_0).
          zhaoyunjiong made changes -
          Attachment: MAPREDUCE-4843-branch-1.1.patch [ 12556081 ]
          zhaoyunjiong made changes -
          Status: Open [ 1 ] → Patch Available [ 10002 ]
          Karthik Kambatla (Inactive) made changes -
          Link: This issue is related to MAPREDUCE-4964 [ MAPREDUCE-4964 ]
          Karthik Kambatla (Inactive) made changes -
          Assignee: Karthik Kambatla [ kkambatl ]
          Karthik Kambatla (Inactive) made changes -
          Attachment: mr-4843.patch [ 12567889 ]
          Alejandro Abdelnur made changes -
          Status: Patch Available [ 10002 ] → Resolved [ 5 ]
          Hadoop Flags: Reviewed [ 10343 ]
          Fix Version/s: 1.2.0 [ 12321661 ]
          Resolution: Fixed [ 1 ]
          Matt Foley made changes -
          Status: Resolved [ 5 ] → Closed [ 6 ]
          Gavin made changes -
          Assignee: Karthik Kambatla [ kkambatl ] → Karthik Kambatla [ kasha ]

            People

            • Assignee:
              Karthik Kambatla
            • Reporter:
              zhaoyunjiong
            • Votes:
              1
            • Watchers:
              12

              Dates

              • Created:
                Updated:
                Resolved:
