Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.7.2
Fix Version/s: None
Component/s: None
Description
Consider a case where the node hosting NN1 (NameNode) and the Hive Metastore is down, but HA is configured for both services. Running a Livy script creates a new session, which waits for ipc.client.connect.timeout (20s) on each jar upload into HDFS
17/07/31 13:59:29 INFO ContextLauncher: 17/07/31 13:59:39 INFO Client: Source and destination file systems are the same. Not copying hdfs://prabhu/hdp/apps/2.6.1.0-129/spark/spark-hdp-assembly.jar
17/07/31 13:59:49 INFO ContextLauncher: 17/07/31 13:59:49 INFO Client: Uploading resource file:/usr/hdp/current/livy-server/rsc-jars/livy-rsc-0.3.0.2.6.1.0-129.jar -> hdfs://prabhu/user/diasmi/.sparkStaging/application_1501501991083_0001/livy-rsc-0.3.0.2.6.1.0-129.jar
and then for 5 seconds (hive.metastore.client.socket.timeout) while trying to reach the Hive Metastore instance that is down
17/07/26 09:09:46 INFO ContextLauncher: 17/07/26 09:09:46 INFO metastore: Trying to connect to metastore with URI thrift://prabhu01:9083
17/07/26 09:09:51 INFO ContextLauncher: 17/07/26 09:09:51 WARN metastore: Failed to connect to the MetaStore Server...
17/07/26 09:09:51 INFO ContextLauncher: 17/07/26 09:09:51 INFO metastore: Trying to connect to metastore with URI thrift://prabhu02:9083
17/07/26 09:09:51 INFO ContextLauncher: 17/07/26 09:09:51 INFO metastore: Connected to metastore.
and finally fails when the Livy server connect timeout expires. The 90-second default is too low for this case. Zeppelin needs a way to override this timeout configuration.
RPC_CLIENT_HANDSHAKE_TIMEOUT("server.connect.timeout", "90s")
17/07/31 14:00:51 ERROR RSCClient: Failed to connect to context.
java.util.concurrent.TimeoutException: Timed out waiting for context to start.
    at com.cloudera.livy.rsc.ContextLauncher.connectTimeout(ContextLauncher.java:133)
    at com.cloudera.livy.rsc.ContextLauncher.access$200(ContextLauncher.java:62)
    at com.cloudera.livy.rsc.ContextLauncher$2.run(ContextLauncher.java:121)
    at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)
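To make the timing argument concrete, here is a rough back-of-the-envelope sketch of how the failover waits could add up against the 90-second limit. The function names and the number of blocked HDFS calls are hypothetical and chosen only for illustration. The 20s, 5s, and 90s values are the ones quoted in this report. Normal startup time is ignored, so the real delay would be larger.

```python
# Values quoted in the report above.
IPC_CONNECT_TIMEOUT_S = 20        # ipc.client.connect.timeout
METASTORE_SOCKET_TIMEOUT_S = 5    # hive.metastore.client.socket.timeout
LIVY_CONNECT_TIMEOUT_S = 90       # server.connect.timeout default


def estimated_failover_delay(blocked_hdfs_calls):
    """Seconds spent waiting on the dead NameNode and Metastore.

    blocked_hdfs_calls is a hypothetical count of HDFS operations
    (for example jar uploads) that each hit the IPC connect timeout.
    """
    return blocked_hdfs_calls * IPC_CONNECT_TIMEOUT_S + METASTORE_SOCKET_TIMEOUT_S


def exceeds_livy_timeout(blocked_hdfs_calls):
    """True if the failover waits alone already exceed the Livy limit."""
    return estimated_failover_delay(blocked_hdfs_calls) > LIVY_CONNECT_TIMEOUT_S


if __name__ == "__main__":
    for calls in (3, 4, 5):
        print(calls, estimated_failover_delay(calls), exceeds_livy_timeout(calls))
```

Under these assumptions, five blocked HDFS calls are enough to use up the whole 90-second budget before the context has had any chance to start.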
prabhujoseph This seems to be a Livy-internal configuration. Zeppelin's Livy interpreter uses Livy's REST API to create sessions and execute code, so if Livy doesn't expose this configuration through its REST API, Zeppelin can do nothing. I believe you need to create a Livy ticket (Livy is in the process of being donated to the Apache Incubator, so its JIRA system may not be ready, but you can ask on its dev mailing list first).
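As a rough illustration of the point in this comment, the sketch below builds the kind of JSON body a client such as Zeppelin could send to Livy's POST /sessions endpoint. The helper name build_session_request and the example Spark property are hypothetical. Only the "kind" and "conf" request fields are taken from Livy's REST API. Nothing in this ticket confirms that the server.connect.timeout setting can be changed through this request.

```python
import json


def build_session_request(kind="spark", spark_conf=None):
    """Return the JSON body for a Livy POST /sessions call.

    kind       -- session kind, e.g. "spark" or "pyspark"
    spark_conf -- optional dict of Spark properties for the "conf" field
    """
    payload = {"kind": kind}
    if spark_conf:
        # Copy so the caller's dict is not shared with the payload.
        payload["conf"] = dict(spark_conf)
    return json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical example property; any Spark setting could appear here.
    print(build_session_request(spark_conf={"spark.executor.memory": "2g"}))
```

The client only controls what goes into this request body, which is why a timeout enforced inside the Livy server has to be made configurable on the Livy side first.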