Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: client
    • Labels: None

      Description

      We recently hit a case in which the Hive Driver was unable to retrieve the status of a MapReduce job. The following log output and stack trace were printed:

      [main] INFO  org.apache.hadoop.hive.ql.exec.Task  - 2018-01-15 00:18:09,324 Stage-2 map = 0%,  reduce = 0%, Cumulative CPU 1679.31 sec
       [main] ERROR org.apache.hadoop.hive.ql.exec.Task  - Ended Job = job_1511036412170_1322169 with exception 'java.io.IOException(Could not find status of job:job_1511036412170_1322169)'
      java.io.IOException: Could not find status of job:job_1511036412170_1322169
      	at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:295)
      	at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
      	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:435)
      	at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
      	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
      	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
      	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1782)
      	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1539)
      	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1318)
      	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1127)
      	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1115)
      	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:220)
      	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172)
      	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:383)
      	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:318)
      	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:416)
      	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:432)
      	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:726)
      	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:693)
      	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:628)
      	at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:325)
      	at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:302)
      	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:49)
      

      We examined the JHS and AM logs but did not see anything suspicious. For some reason a null was returned for the job status, and it is not obvious why. The MR job was still running at that point.

      Some ideas:
      1. We already have logging for the JobClient->AM and JobClient->JHS communication, but it is at TRACE level, which could be too low. It might make more sense to raise it to DEBUG.

      2. We need new LOG.debug() calls at some crucial points.
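
      To illustrate the second idea, here is a minimal sketch of what such debug logging around a status lookup could look like. It is not taken from the Hadoop or Hive code base: the class JobStatusLookup, the interface StatusSource and the method findStatus are hypothetical names, and java.util.logging with Level.FINE is used as a stand-in for LOG.debug() only so that the example is self-contained.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative sketch only: none of these names come from Hadoop or Hive.
final class JobStatusLookup {
    private static final Logger LOG = Logger.getLogger(JobStatusLookup.class.getName());

    /** Hypothetical stand-in for a remote status source such as the AM or the JHS. */
    @FunctionalInterface
    interface StatusSource {
        /** Returns the job status, or null if the source does not know the job. */
        String getStatus(String jobId) throws IOException;
    }

    /**
     * Asks one source for the status of a job and logs the request and its
     * outcome. Level.FINE plays the role of DEBUG in this sketch.
     */
    static String findStatus(String jobId, String sourceName, StatusSource source)
            throws IOException {
        LOG.log(Level.FINE, "Requesting status of {0} from {1}",
                new Object[] {jobId, sourceName});
        String status = source.getStatus(jobId);
        if (status == null) {
            // The interesting case: record which source returned null before failing.
            LOG.log(Level.FINE, "{0} returned null for {1}",
                    new Object[] {sourceName, jobId});
            throw new IOException("Could not find status of job:" + jobId);
        }
        LOG.log(Level.FINE, "{0} reported {1} for {2}",
                new Object[] {sourceName, status, jobId});
        return status;
    }

    public static void main(String[] args) throws IOException {
        // A source that knows the job; a source returning null would raise IOException.
        System.out.println(findStatus("job_0001", "AM", id -> "RUNNING"));
    }
}
```

      With messages like these enabled, a null result would at least show which side (AM or JHS) was asked and what it returned, which is the information we were missing in the case above.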

            People

            • Assignee: pbacsko Peter Bacsko
            • Reporter: pbacsko Peter Bacsko
            • Votes: 0
            • Watchers: 2
