Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-1400

JobClient rpc times out getting job status

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Blocker
    • Resolution: Fixed
    • 0.12.3
    • 0.14.0
    • None
    • None

    Description

      JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:

      07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
      07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
      07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
      07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
      07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
      07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
      07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
      07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
      07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
      07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
      07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
      07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
      07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
      07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
      07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
      07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
      07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
      07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
      07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
      07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
      07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
      07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
      07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
      07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
      07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
      07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
      07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
      07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
      07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
      07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
      07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
      07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
      07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
      07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
      07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
      07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
      07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
      07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
      07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
      07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
      07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
      07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
      07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
      07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
      07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
      07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
      07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
      07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
      07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
      at org.apache.hadoop.ipc.Client.call(Client.java:473)
      at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
      at $Proxy1.getJobStatus(Unknown Source)
      at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
      at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
      at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
      at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
      at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
      at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:155)

      07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
      07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
      07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
      07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
      07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

      Attachments

        1. 1400.patch
          18 kB
          Owen O'Malley

        Issue Links

          Activity

            People

              omalley Owen O'Malley
              nidaley Nigel Daley
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: