Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Cannot Reproduce
Affects Version/s: 0.20.1, 0.20.2
None
None
Environment: SUSE Linux cluster with 2 nodes. One runs a JobTracker, NameNode, DataNode, and TaskTracker. The other runs a TaskTracker and DataNode.
Description
Running the following example hangs the child job indefinitely.
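// HaltCluster.java
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.MultipleInputs;
import org.apache.hadoop.mapred.lib.NullOutputFormat;

public class HaltCluster
{
    public static void main(String[] args) throws IOException
    {
        JobConf jobConf = new JobConf();
        prepareConf(jobConf);
        if (args != null && args.length > 0)
        {
            // This block is missing from the report; presumably it set the flag that MyMapper.map() checks.
            jobConf.set("callonceagain", "true");
        }
        JobClient.runJob(jobConf);
    }

    public static void prepareConf(JobConf jobConf)
    {
        jobConf.setJarByClass(HaltCluster.class);
        jobConf.set("mapred.job.tracker", "<<jobtracker>>");
        jobConf.set("fs.default.name", "<<hdfs>>");
        // MyInputFormat is not included in the report.
        MultipleInputs.addInputPath(jobConf, new Path("/ignore" + System.currentTimeMillis()), MyInputFormat.class);
        jobConf.setJobName("ChildJob");
        jobConf.setMapperClass(MyMapper.class);
        jobConf.setOutputFormat(NullOutputFormat.class);
        jobConf.setNumReduceTasks(0);
    }
}

// MyMapper.java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MyMapper implements Mapper<IntWritable, Text, NullWritable, NullWritable>
{
    JobConf myConf = null;

    @Override
    public void map(IntWritable arg0, Text arg1, OutputCollector<NullWritable, NullWritable> arg2, Reporter arg3) throws IOException
    {
        // Only the parent job (started with an argument) launches the child job.
        if (myConf != null && "true".equals(myConf.get("callonceagain")))
        {
            startBackGroundReporting(arg3);
            HaltCluster.main(new String[] {});
        }
        // Every map attempt fails, in both the parent and the child job.
        throw new RuntimeException("Throwing exception");
    }

    private void startBackGroundReporting(final Reporter arg3)
    {
        Thread t = new Thread()
        {
            @Override
            public void run()
            {
                while (true)
                {
                    // Loop body is missing from the report; presumably it reported progress to keep the task alive.
                    arg3.progress();
                }
            }
        };
        t.setDaemon(true);
        t.start();
    }

    @Override
    public void configure(JobConf arg0)
    {
        // Body is missing from the report; map() reads myConf, so it was presumably stored here.
        myConf = arg0;
    }

    @Override
    public void close() throws IOException
    {
    }
}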
Run it with the following command:
java -cp <<classpath>> HaltCluster true
If only one job is triggered, as in
java -cp <<classpath>> HaltCluster
it fails after the maximum number of attempts and quits as expected.
Also, when the jobs hang, running the child job once more breaks the deadlock, and all three jobs complete.
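The report does not establish a cause, and the issue was closed as Cannot Reproduce. One possible reading, offered only as an assumption, is slot starvation: a map task that blocks in JobClient.runJob() keeps its task slot while the child job waits for a free slot. The sketch below is a JDK-only analogy, not Hadoop code. The class name SlotStarvationSketch, the method parentCompletes, and the use of a fixed thread pool to stand in for task slots are all hypothetical. It does not account for every observation in the report, such as the hang clearing when the child job is run again.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SlotStarvationSketch
{
    // Returns true if a "parent" task that submits a "child" task to the same
    // fixed-size pool, and then waits for it, finishes within timeoutMs.
    static boolean parentCompletes(int slots, long timeoutMs) throws Exception
    {
        final ExecutorService pool = Executors.newFixedThreadPool(slots);
        try
        {
            Future<String> parent = pool.submit(() -> {
                // The parent keeps its slot while it waits for the child.
                Future<String> child = pool.submit(() -> "child done");
                return child.get();
            });
            try
            {
                parent.get(timeoutMs, TimeUnit.MILLISECONDS);
                return true;
            }
            catch (TimeoutException e)
            {
                // The child never obtained a slot, so the parent never returned.
                return false;
            }
        }
        finally
        {
            // Interrupt any blocked parent so the pool threads can exit.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception
    {
        System.out.println("1 slot:  " + parentCompletes(1, 500));
        System.out.println("2 slots: " + parentCompletes(2, 500));
    }
}
```

With a single slot the parent times out because the child stays queued behind it; with two slots the child runs and the parent returns. If the same effect were at work on the cluster, the number of free map slots across the two TaskTrackers would be the thing to check.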