Details
Type: Improvement
Status: Patch Available
Priority: Major
Resolution: Unresolved
Description
We recently encountered an interesting problem: in one case, the Hive Driver was unable to retrieve the status of a MapReduce job. The following log output and stack trace were printed:
[main] INFO org.apache.hadoop.hive.ql.exec.Task - 2018-01-15 00:18:09,324 Stage-2 map = 0%, reduce = 0%, Cumulative CPU 1679.31 sec
[main] ERROR org.apache.hadoop.hive.ql.exec.Task - Ended Job = job_1511036412170_1322169 with exception 'java.io.IOException(Could not find status of job:job_1511036412170_1322169)'
java.io.IOException: Could not find status of job:job_1511036412170_1322169
    at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:295)
    at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:435)
    at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1782)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1539)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1318)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1127)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1115)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:220)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:383)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:318)
    at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:416)
    at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:432)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:726)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:693)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:628)
    at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:325)
    at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:302)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:49)
We examined the logs from the JHS (JobHistory Server) and the AM (ApplicationMaster), but did not see anything suspicious. For some reason a null was returned for the job, but it's not obvious why. The MR job was running at this point.
Some ideas:
1. We already have logging in place for JobClient->AM and JobClient->JHS communication, but it is at TRACE level, which may be too low to be useful in practice. It might make more sense to raise the level to DEBUG.
2. We need new LOG.debug() calls at some crucial points.
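To make the second idea concrete, here is a minimal sketch of what debug logging around a status lookup might look like. It is not the actual JobClient code: the class `StatusLookupLogging`, the method `lookup`, and the `source` label are all hypothetical, and `java.util.logging` (where `Level.FINE` is roughly the DEBUG level) stands in for the logging framework Hadoop uses, so that the example is self-contained.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: wraps a job-status lookup and logs its outcome at debug level. */
final class StatusLookupLogging {
    private static final Logger LOG =
            Logger.getLogger(StatusLookupLogging.class.getName());

    private StatusLookupLogging() {}

    /**
     * Invokes lookupFn (a stand-in for a JobClient->AM or JobClient->JHS request)
     * and logs at FINE both the request and the case where null comes back.
     *
     * @param jobId    the job whose status is requested
     * @param source   label for the queried service, e.g. "AM" or "JHS" (assumption)
     * @param lookupFn the actual lookup; may return null
     */
    static <T> Optional<T> lookup(String jobId, String source,
                                  Function<String, T> lookupFn) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Requesting status of " + jobId + " from " + source);
        }
        T status = lookupFn.apply(jobId);
        if (status == null) {
            // The crucial point: record which source returned null and for which job.
            LOG.fine("Status lookup for " + jobId + " via " + source + " returned null");
        } else if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Status of " + jobId + " from " + source + ": " + status);
        }
        return Optional.ofNullable(status);
    }
}
```

A caller could then distinguish "AM returned null" from "JHS returned null" in the logs, for example via `StatusLookupLogging.lookup(jobId, "JHS", id -> null)`, which yields an empty `Optional` and a debug-level message naming the source.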