Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-1400

JobClient rpc times out getting job status

VotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Blocker
    • Resolution: Fixed
    • 0.12.3
    • 0.14.0
    • None
    • None

    Description

      JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log:

      07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029
      07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0%
      07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0%
      07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0%
      07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0%
      07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0%
      07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0%
      07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0%
      07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0%
      07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0%
      07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0%
      07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0%
      07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0%
      07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0%
      07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0%
      07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1%
      07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2%
      07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3%
      07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4%
      07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5%
      07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6%
      07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7%
      07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8%
      07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9%
      07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10%
      07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11%
      07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12%
      07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13%
      07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14%
      07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15%
      07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16%
      07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17%
      07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18%
      07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19%
      07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20%
      07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21%
      07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22%
      07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23%
      07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24%
      07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25%
      07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26%
      07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27%
      07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28%
      07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29%
      07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30%
      07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31%
      07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32%
      07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35%
      07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
      at org.apache.hadoop.ipc.Client.call(Client.java:473)
      at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
      at $Proxy1.getJobStatus(Unknown Source)
      at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491)
      at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556)
      at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188)
      at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69)
      at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140)
      at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:155)

      07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97%
      07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98%
      07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99%
      07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100%
      07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029

      Attachments

        1. 1400.patch
          18 kB
          Owen O'Malley

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            omalley Owen O'Malley
            nidaley Nigel Daley
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment