Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-15859

HoS: Write RPC messages in event loop

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.1
    • Fix Version/s: 2.3.0
    • Component/s: Hive, Spark
    • Labels:
      None
    • Environment:

      hadoop2.7.1
      spark1.6.2
      hive2.2

      Description

      Hive on Spark, failed with error:

      2017-02-08 09:50:59,331 Stage-2_0: 1039(+2)/1041 Stage-3_0: 796(+456)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
      2017-02-08 09:51:00,335 Stage-2_0: 1040(+1)/1041 Stage-3_0: 914(+398)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
      2017-02-08 09:51:01,338 Stage-2_0: 1041/1041 Finished Stage-3_0: 961(+383)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
      Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
      FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
      

      application log shows the driver commanded a shutdown with some unknown reason, but hive's log shows Driver could not get RPC header( Expected RPC header, got org.apache.hive.spark.client.rpc.Rpc$NullMessage instead).

      17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
      17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1169.0 in stage 3.0 (TID 2519)
      17/02/08 09:51:04 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
      17/02/08 09:51:04 INFO storage.MemoryStore: MemoryStore cleared
      17/02/08 09:51:04 INFO storage.BlockManager: BlockManager stopped
      17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
      17/02/08 09:51:04 WARN executor.CoarseGrainedExecutorBackend: An unknown (hsx-node1:42777) driver disconnected.
      17/02/08 09:51:04 ERROR executor.CoarseGrainedExecutorBackend: Driver 192.168.1.1:42777 disassociated! Shutting down.
      17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1105.0 in stage 3.0 (TID 2511)
      17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
      17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
      17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk6/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-71da1dfc-99bd-4687-bc2f-33452db8de3d
      17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk2/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-7f134d81-e77e-4b92-bd99-0a51d0962c14
      17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk5/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-77a90d63-fb05-4bc6-8d5e-1562cc502e6c
      17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
      17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk4/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-91f8b91a-114d-4340-8560-d3cd085c1cd4
      17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk1/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-a3c24f9e-8609-48f0-9d37-0de7ae06682a
      17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
      17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk7/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-f6120a43-2158-4780-927c-c5786b78f53e
      17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk3/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-e17931ad-9e8a-45da-86f8-9a0fdca0fad1
      17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk8/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-4de34175-f871-4c28-8ec0-d2fc0020c5c3
      17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1137.0 in stage 3.0 (TID 2515)
      17/02/08 09:51:04 INFO executor.Executor: Executor killed task 897.0 in stage 3.0 (TID 2417)
      17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1225.0 in stage 3.0 (TID 2526)
      17/02/08 09:51:04 INFO executor.Executor: Executor killed task 905.0 in stage 3.0 (TID 2423)
      

      in hive's log,

      2017-02-08T09:51:04,327 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO scheduler.TaskSetManager: Finished task 971.0 in stage 3.0 (TID 2218) in 5948 ms on hsx-node8 (1338/1520)
      2017-02-08T09:51:04,346 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO rpc.RpcDispatcher: [DriverProtocol] Closing channel due to exception in pipeline (org.apache.hive.spark.client.RemoteDriver$DriverProtocol.handle(io.netty.channel.ChannelHandlerContext, org.apache.hive.spark.client.rpc.Rpc$MessageHeader)).
      2017-02-08T09:51:04,346 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 WARN rpc.RpcDispatcher: [DriverProtocol] Expected RPC header, got org.apache.hive.spark.client.rpc.Rpc$NullMessage instead.
      2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO rpc.RpcDispatcher: [DriverProtocol] Closing channel due to exception in pipeline (null).
      2017-02-08T09:51:04,347 INFO [RPC-Handler-3] rpc.RpcDispatcher: [ClientProtocol] Closing channel due to exception in pipeline (Connection reset by peer).
      2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener threw an exception
      2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: java.lang.IllegalStateException: RPC channel is closed.
      2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
      2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276)
      2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:259)
      2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$DriverProtocol.sendMetrics(RemoteDriver.java:270)
      2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$ClientListener.onTaskEnd(RemoteDriver.java:490)
      2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
      2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
      2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
      2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
      2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
      2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
      2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
      2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
      2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
      2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
      2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181)
      2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
      2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 WARN rpc.Rpc: Failed to send RPC, closing connection.
      2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: java.nio.channels.ClosedChannelException
      2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 WARN client.RemoteDriver: Shutting down driver because RPC channel was closed.
      2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO client.RemoteDriver: Shutting down remote driver.
      2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener threw an exception
      2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: java.lang.IllegalStateException: RPC channel is closed.
      2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
      2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276)
      2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:259)
      2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$DriverProtocol.sendMetrics(RemoteDriver.java:270)
      2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$ClientListener.onTaskEnd(RemoteDriver.java:490)
      2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
      2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
      2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
      2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
      2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
      2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
      2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
      2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
      2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
      2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
      2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181)
      2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
      2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO scheduler.DAGScheduler: Asked to cancel job 2
      2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener threw an exception
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: java.lang.InterruptedException
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.lang.Object.wait(Native Method)
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.lang.Object.wait(Object.java:502)
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73)
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.SimpleFutureAction.org$apache$spark$SimpleFutureAction$$awaitResult(FutureAction.scala:165)
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:120)
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:108)
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at scala.concurrent.Await$.ready(package.scala:86)
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:303)
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:316)
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:362)
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.lang.Thread.run(Thread.java:745)
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener threw an exception
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: java.lang.IllegalStateException: RPC channel is closed.
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276)
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:259)
      2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$DriverProtocol.sendMetrics(RemoteDriver.java:270)
      2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$ClientListener.onTaskEnd(RemoteDriver.java:490)
      2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
      2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
      2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
      2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
      2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
      2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
      2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
      2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
      2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
      2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
      2017-02-08T09:51:04,654 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-65f40590-d87f-4701-b374-6b3b2a11538c
      2017-02-08T09:52:04,346 WARN [b723c85d-2a7b-469e-bab1-9c165b25e656 main] impl.RemoteSparkJobStatus: Error getting stage info
      java.util.concurrent.TimeoutException
      at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
      at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getSparkStageInfo(RemoteSparkJobStatus.java:161) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getSparkStageProgress(RemoteSparkJobStatus.java:96) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.hadoop.hive.ql.exec.spark.status.RemoteSparkJobMonitor.startMonitor(RemoteSparkJobMonitor.java:82) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobRef.monitorJob(RemoteSparkJobRef.java:60) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:101) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1997) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1688) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1419) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1143) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:430) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:446) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:749) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:715) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:642) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_60]
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_60]
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_60]
      at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_60]
      at org.apache.hadoop.util.RunJar.run(RunJar.java:221) ~[spark-assembly-1.6.2-hadoop2.6.0.jar:1.6.2]
      at org.apache.hadoop.util.RunJar.main(RunJar.java:136) ~[spark-assembly-1.6.2-hadoop2.6.0.jar:1.6.2]
      2017-02-08T09:52:04,346 ERROR [b723c85d-2a7b-469e-bab1-9c165b25e656 main] status.SparkJobMonitor: Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
      java.lang.IllegalStateException: RPC channel is closed.
      at com.google.common.base.Preconditions.checkState(Preconditions.java:149) ~[guava-14.0.1.jar:?]
      at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      

      also in container's log, I find Driver still request for executors:

      17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 77 executor(s).
      17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 2 executor containers
      17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 76 executor(s).
      17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
      17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 75 executor(s).
      17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
      17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 74 executor(s).
      17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
      17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 73 executor(s).
      17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
      17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 71 executor(s).
      17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 2 executor containers
      17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 70 executor(s).
      17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
      17/02/08 09:51:04 INFO yarn.YarnAllocator: Driver requested a total number of 50 executor(s).
      17/02/08 09:51:04 INFO yarn.YarnAllocator: Canceling requests for 0 executor containers
      17/02/08 09:51:04 WARN yarn.YarnAllocator: Expected to find pending requests, but found none.
      17/02/08 09:51:04 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
      17/02/08 09:51:04 INFO yarn.YarnAllocator: Canceling requests for 0 executor containers
      17/02/08 09:51:04 WARN yarn.YarnAllocator: Expected to find pending requests, but found none.
      17/02/08 09:51:04 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. 192.168.1.1:42777
      17/02/08 09:51:04 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. hsx-node1:42777
      17/02/08 09:51:04 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
      17/02/08 09:51:04 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
      17/02/08 09:51:04 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
      17/02/08 09:51:04 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1486453422616_0150
      17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
      

      found only one ERROR in yarn application log:

      17/02/08 09:51:00 INFO executor.Executor: Finished task 1492.0 in stage 3.0 (TID 2168). 3294 bytes result sent to driver
      17/02/08 09:51:00 INFO executor.Executor: Finished task 556.0 in stage 3.0 (TID 1587). 3312 bytes result sent to driver
      17/02/08 09:51:00 INFO executor.Executor: Finished task 1412.0 in stage 3.0 (TID 2136). 3294 bytes result sent to driver
      17/02/08 09:51:00 INFO executor.Executor: Finished task 1236.0 in stage 3.0 (TID 2007). 3294 bytes result sent to driver
      17/02/08 09:51:04 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
      17/02/08 09:51:04 INFO storage.MemoryStore: MemoryStore cleared
      17/02/08 09:51:04 INFO storage.BlockManager: BlockManager stopped
      17/02/08 09:51:04 WARN executor.CoarseGrainedExecutorBackend: An unknown (hsx-node1:42777) driver disconnected.
      17/02/08 09:51:04 ERROR executor.CoarseGrainedExecutorBackend: Driver 192.168.1.1:42777 disassociated! Shutting down.
      17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
      17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk8/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-a8167f0b-f3c3-458f-ad51-8a0f4bcda4f3
      17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
      17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk1/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-26cba445-66d2-4b78-a428-17881c92f0f6
      17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
      17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
      

      this error occurs when several queries run at the same time with large data scale, in fact it would not occur when running the query separately, but it can frequently occur when running together again.

        Attachments

        1. HIVE-15859.1.patch
          3 kB
          Rui Li
        2. HIVE-15859.2.patch
          3 kB
          Rui Li
        3. HIVE-15859.3.patch
          4 kB
          Rui Li

          Activity

            People

            • Assignee:
              lirui Rui Li
              Reporter:
              KaiXu KaiXu
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: