Hadoop Map/Reduce / MAPREDUCE-7046

Enhance logging related to retrieving Job

Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: client
    • Labels: None

    Description

      We recently encountered an interesting problem. In one case, the Hive Driver was unable to retrieve the status of a MapReduce job. The following log output and stack trace were printed:

      [main] INFO  org.apache.hadoop.hive.ql.exec.Task  - 2018-01-15 00:18:09,324 Stage-2 map = 0%,  reduce = 0%, Cumulative CPU 1679.31 sec
       [main] ERROR org.apache.hadoop.hive.ql.exec.Task  - Ended Job = job_1511036412170_1322169 with exception 'java.io.IOException(Could not find status of job:job_1511036412170_1322169)'
      java.io.IOException: Could not find status of job:job_1511036412170_1322169
      	at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:295)
      	at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
      	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:435)
      	at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
      	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
      	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
      	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1782)
      	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1539)
      	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1318)
      	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1127)
      	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1115)
      	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:220)
      	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172)
      	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:383)
      	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:318)
      	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:416)
      	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:432)
      	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:726)
      	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:693)
      	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:628)
      	at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:325)
      	at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:302)
      	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:49)
      

      We examined the logs from the JHS and the AM, but did not see anything suspicious. For some reason the job lookup returned null, and it is not obvious why. The MR job was still running at this point.
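
      To illustrate why the trace alone is hard to diagnose, here is a minimal sketch of a caller-side null check that produces the same message. The class name, method name, and the lookup function are hypothetical stand-ins, not the actual Hive or Hadoop code:

```java
import java.io.IOException;
import java.util.function.Function;

// Hypothetical sketch; not the actual HadoopJobExecHelper implementation.
class NullStatusSketch {
    /**
     * Looks up a job status through the supplied function (a stand-in for
     * the real job client) and fails if nothing comes back.
     */
    static String requireStatus(String jobId, Function<String, String> lookup)
            throws IOException {
        String status = lookup.apply(jobId);
        if (status == null) {
            // The message says nothing about which backend was asked
            // or why it had no answer.
            throw new IOException("Could not find status of job:" + jobId);
        }
        return status;
    }
}
```

      By the time the exception is raised, the information about where the null came from is already gone, so it has to be logged closer to the lookup itself.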

      Some ideas:
      1. Logging already exists for the JobClient->AM and JobClient->JHS communication, but only at TRACE level, which may be too low to be useful. It might make more sense to raise those messages to DEBUG.

      2. We need new LOG.debug() calls at some crucial points.
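
      The second idea could look roughly like the sketch below. It is only an illustration: the class, the method, the in-memory debug() helper (standing in for LOG.debug()), and the "AM first, then JHS" lookup order are all assumptions made for the example, not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch; names and lookup order are assumptions.
class JobLookupSketch {
    /** Collected debug messages; stands in for a real logger. */
    static final List<String> DEBUG_LOG = new ArrayList<>();

    /** Stand-in for LOG.debug(). */
    static void debug(String msg) {
        DEBUG_LOG.add(msg);
    }

    /**
     * Asks the AM for the job status, falls back to the JHS, and records
     * at each step which source was asked and whether it answered.
     */
    static String getJobStatus(String jobId,
                               Function<String, String> amLookup,
                               Function<String, String> jhsLookup) {
        debug("Looking up " + jobId + " via AM");
        String status = amLookup.apply(jobId);
        if (status != null) {
            debug("AM returned status " + status + " for " + jobId);
            return status;
        }
        debug("AM returned null for " + jobId + ", falling back to JHS");
        status = jhsLookup.apply(jobId);
        if (status == null) {
            debug("JHS also returned null for " + jobId);
        } else {
            debug("JHS returned status " + status + " for " + jobId);
        }
        return status;
    }

    public static void main(String[] args) {
        // Both sources fail, as in the reported case.
        getJobStatus("job_1511036412170_1322169", id -> null, id -> null);
        DEBUG_LOG.forEach(System.out::println);
    }
}
```

      With messages like these enabled at DEBUG, a null result can be traced back to the source that produced it without turning on TRACE for the whole client.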

      Attachments

        1. MAPREDUCE-7046-001.patch
          7 kB
          Peter Bacsko

          People

            Assignee: Peter Bacsko (pbacsko)
            Reporter: Peter Bacsko (pbacsko)
            Votes: 0
            Watchers: 2
