  1. Hadoop Map/Reduce
  2. MAPREDUCE-2635

Jobs hang indefinitely on failure.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.20.1, 0.20.2
    • None
    • None
    • Environment: SUSE Linux cluster with 2 nodes. One runs a JobTracker, NameNode, DataNode, and TaskTracker; the other runs a TaskTracker and DataNode.

    Description

      Running the following example hangs the child job indefinitely.

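      // HaltCluster.java
      import java.io.IOException;

      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.mapred.JobClient;
      import org.apache.hadoop.mapred.JobConf;
      import org.apache.hadoop.mapred.lib.MultipleInputs;
      import org.apache.hadoop.mapred.lib.NullOutputFormat;

      public class HaltCluster
      {
          public static void main(String[] args) throws IOException
          {
              JobConf jobConf = new JobConf();
              prepareConf(jobConf);
              // With an argument, this run is the parent job whose mapper launches the child job.
              if (args != null && args.length > 0)
              {
                  jobConf.set("callonceagain", args[0]);
                  jobConf.setMaxMapAttempts(1);
                  jobConf.setJobName("ParentJob");
              }

              // Submits the job and blocks until it completes.
              JobClient.runJob(jobConf);
          }

          public static void prepareConf(JobConf jobConf)
          {
              jobConf.setJarByClass(HaltCluster.class);
              jobConf.set("mapred.job.tracker", "<<jobtracker>>");
              jobConf.set("fs.default.name", "<<hdfs>>");
              // MyInputFormat is not included in this report.
              MultipleInputs.addInputPath(jobConf, new Path("/ignore" + System.currentTimeMillis()), MyInputFormat.class);
              jobConf.setJobName("ChildJob");
              jobConf.setMapperClass(MyMapper.class);
              jobConf.setOutputFormat(NullOutputFormat.class);
              jobConf.setNumReduceTasks(0);
          }
      }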

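      // MyMapper.java
      import java.io.IOException;

      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.NullWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapred.JobConf;
      import org.apache.hadoop.mapred.Mapper;
      import org.apache.hadoop.mapred.OutputCollector;
      import org.apache.hadoop.mapred.Reporter;

      public class MyMapper implements Mapper<IntWritable, Text, NullWritable, NullWritable>
      {
          JobConf myConf = null;

          @Override
          public void map(IntWritable arg0, Text arg1, OutputCollector<NullWritable, NullWritable> arg2, Reporter arg3) throws IOException
          {
              // "callonceagain" is set only when HaltCluster is started with an argument (the parent job).
              if (myConf != null && "true".equals(myConf.get("callonceagain")))
              {
                  startBackGroundReporting(arg3);
                  // Launches the child job from inside the map task and waits for it to finish.
                  HaltCluster.main(new String[] {});
              }

              // Every map attempt fails, in both the parent and the child job.
              throw new RuntimeException("Throwing exception");
          }

          // Keeps updating the task status from a daemon thread while the child job runs.
          private void startBackGroundReporting(final Reporter arg3)
          {
              Thread t = new Thread()
              {
                  @Override
                  public void run()
                  {
                      while (true)
                      {
                          arg3.setStatus("Reporting to be alive at " + System.currentTimeMillis());
                      }
                  }
              };
              t.setDaemon(true);
              t.start();
          }

          @Override
          public void configure(JobConf arg0)
          {
              myConf = arg0;
          }

          @Override
          public void close() throws IOException
          {
              // Nothing to clean up.
          }
      }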

      Run it with the following command:

      java -cp <<classpath>> HaltCluster true

      If only a single job is triggered, with java -cp <<classpath>> HaltCluster,
      it fails after the maximum number of attempts and exits as expected.

      Also, when the jobs hang, running the child job once more breaks the deadlock, and all three jobs then complete.
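
      This report does not identify a root cause (it was resolved as Cannot Reproduce). One pattern that can produce a hang of this shape is slot starvation: a task holds a slot while it blocks on work that needs a slot from the same bounded pool. The sketch below is only an analogy under that assumption, not Hadoop code. The class SlotStarvationDemo and its completes method are hypothetical names, and threads in a fixed-size pool stand in for map slots.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model, not Hadoop code: pool threads stand in for map slots.
class SlotStarvationDemo {

    /**
     * Runs `parents` parent tasks on a pool of `slots` threads. Each parent
     * submits one child task to the same pool and blocks until it finishes.
     * Returns true if every parent finishes within timeoutMs, false otherwise.
     */
    static boolean completes(int slots, int parents, long timeoutMs) {
        final ExecutorService pool = Executors.newFixedThreadPool(slots);
        // Parents wait on this latch so that all of them are submitted
        // before any child is, which makes the outcome deterministic.
        final CountDownLatch allSubmitted = new CountDownLatch(1);
        try {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < parents; i++) {
                final int id = i;
                results.add(pool.submit(() -> {
                    allSubmitted.await();
                    Future<String> child = pool.submit(() -> "child-" + id);
                    return child.get(); // blocks while still holding a slot
                }));
            }
            allSubmitted.countDown();

            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            for (Future<String> f : results) {
                long remaining = Math.max(deadline - System.nanoTime(), 0L);
                f.get(remaining, TimeUnit.NANOSECONDS);
            }
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow(); // interrupts any parents still blocked
        }
    }

    public static void main(String[] args) {
        // Every slot is held by a waiting parent, so no child can start.
        System.out.println("2 slots, 2 parents: " + completes(2, 2, 500));   // false
        // Spare slots let the children run, so the parents finish.
        System.out.println("4 slots, 2 parents: " + completes(4, 2, 2000));  // true
    }
}
```

      In this model the wait ends only because of the explicit timeout. In the reproduction above, the background thread in MyMapper keeps reporting status while the mapper waits on the child job, so a parent task would presumably not be stopped for inactivity. Whether slot starvation is what actually happened on the reporter's cluster is not established by this report.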

          People

            Assignee: Unassigned
            Reporter: Sudharsan Sampath (sudhan65)
            Votes: 0
            Watchers: 3
