Type: New Feature
Affects Version/s: 0.6.0
Fix Version/s: None
This is a proposal for Livy cluster support. It is designed as a lightweight solution for a Livy cluster that stays compatible with non-clustered Livy and supports different HA levels.
Server ID: an integer configured by livy.server.id, with default value 0, which means standalone mode. The largest server ID is 213, for the reason described below.
Session ID: a session ID is generated as a 3-digit server ID followed by a 7-digit auto-increment integer. Since the largest 32-bit integer is 2,147,483,647, the largest server ID is 213 and each server can have 9,999,999 sessions. The limitation is that each cluster can have at most 213 instances, which I think is enough. Standalone mode works the same way, since its server ID is 0.
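The encoding above can be sketched in a few lines. This is a minimal illustration of the proposed scheme, not code from Livy; the function names are hypothetical.

```python
# Pack a 3-digit server ID and a 7-digit per-server counter into one
# 32-bit-safe integer, per the proposal. 213_9999999 = 2,139,999,999,
# which is the largest value that still fits below 2,147,483,647.
MAX_SERVER_ID = 213
MAX_LOCAL_ID = 9_999_999  # 7 decimal digits per server

def make_session_id(server_id: int, local_id: int) -> int:
    """Combine a server ID and its auto-increment counter into a session ID."""
    if not 0 <= server_id <= MAX_SERVER_ID:
        raise ValueError(f"server id must be in [0, {MAX_SERVER_ID}]")
    if not 0 <= local_id <= MAX_LOCAL_ID:
        raise ValueError(f"local id must be in [0, {MAX_LOCAL_ID}]")
    return server_id * 10_000_000 + local_id

def split_session_id(session_id: int) -> tuple[int, int]:
    """Recover (server_id, local_id) from a session ID."""
    return session_id // 10_000_000, session_id % 10_000_000

print(make_session_id(101, 1))       # 1010000001
print(split_session_id(1010000001))  # (101, 1)
print(make_session_id(0, 7))         # 7 -- standalone mode keeps plain IDs
```

Note that standalone IDs come out unchanged (server ID 0 contributes nothing), which is why the scheme is backward compatible with non-clustered Livy.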
ZooKeeper: a ZooKeeper quorum is required, configured by livy.server.zookeeper.quorum.
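For illustration, a livy.conf fragment combining the two settings might look like the following (livy.server.id is from this proposal; the quorum value follows the usual ZooKeeper connect-string format, and the hostnames are made up):

```
# livy.conf (sketch)
livy.server.id = 1
livy.server.zookeeper.quorum = zk1:2181,zk2:2181,zk3:2181
```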
Server Registration: each server registers itself to ZooKeeper path /livy/server/
Leader: one of the Livy servers is elected as the cluster leader, on ZooKeeper path /livy/leader/
Coordination between servers: servers don't talk to each other directly; a server just detects which Livy server a session lives on and sends an HTTP 307 redirect to the correct server. For example, if server A receives a request for http://serverA:8999/sessions/1010000001 and knows the session is on server B, it sends a 307 redirect to http://serverB:8999/sessions/1010000001.
Session HA: in the server failure case, users should be able to decide whether to keep the sessions that lived on the failed server. This leads to two different modes:
Non session HA mode:
In this mode, sessions are lost when a server fails (though it can still work with the session store, recovering sessions when the server comes back).
request redirect: a server determines a session's correct server simply by taking the first 3 digits of the session ID.
Session HA mode:
In this mode, session information is persisted to the ZK store (reusing the current session-store code) and recovered on another server when a server fails.
session registration: all sessions are registered to zk path /livy/session/type/
request redirect: each server determines a session's correct server by reading ZK and then sends a 307 redirect; servers may cache the session-server mapping for a while.
server failure: the cluster leader should detect server failures and reassign the sessions on the failed server to other servers; those servers then recover the sessions by reading the session information in ZK.
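One way the leader's reassignment step could work is a simple round-robin over the surviving servers. This is only a sketch of one possible policy, not anything from the Livy codebase; the proposal does not fix a specific assignment strategy.

```python
from itertools import cycle

def reassign(failed_sessions: list[int], live_servers: list[int]) -> dict[int, int]:
    """Map each orphaned session ID to a surviving server ID, round-robin.
    Note the session IDs keep their original prefix, which is why HA mode
    must route via ZK rather than via the ID prefix."""
    targets = cycle(sorted(live_servers))
    return {sid: next(targets) for sid in failed_sessions}

# Server 101 failed; its sessions are spread over servers 102 and 103,
# which recover state from the session znodes in ZK.
plan = reassign([1010000001, 1010000002, 1010000003], live_servers=[102, 103])
print(plan)  # {1010000001: 102, 1010000002: 103, 1010000003: 102}
```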
server notification: servers need to send messages to other servers; for example, the leader sends a command asking a server to recover a session. All such messages are sent through ZK, under path /livy/notification/
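Taken together, the proposal uses four znode families. The helper below just summarizes that layout in one place; the path shapes come from the text above, while the function names and the exact child-node structure (e.g. keying notifications by target server) are my assumptions.

```python
LIVY_ROOT = "/livy"

def server_znode(server_id: int) -> str:
    # Each server registers itself under /livy/server/
    return f"{LIVY_ROOT}/server/{server_id}"

def leader_znode() -> str:
    # The elected cluster leader lives under /livy/leader/
    return f"{LIVY_ROOT}/leader"

def session_znode(session_type: str, session_id: int) -> str:
    # In session-HA mode, sessions are registered under /livy/session/<type>/
    return f"{LIVY_ROOT}/session/{session_type}/{session_id}"

def notification_znode(target_server_id: int, seq: int) -> str:
    # Server-to-server messages go under /livy/notification/; assuming
    # here they are keyed by target server and a sequence number.
    return f"{LIVY_ROOT}/notification/{target_server_id}/{seq}"

print(server_znode(101))                          # /livy/server/101
print(session_znode("interactive", 1010000001))   # /livy/session/interactive/1010000001
```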
Thrift Server: a thrift session has a thrift session ID, and a thrift session can be restored with the same session ID; however, current hive-jdbc doesn't support restoring sessions. So for the thrift server, just register the server in ZK to implement the same server discovery that Hive has. Later, when hive-jdbc supports restoring sessions, the thrift server can use the existing Livy cluster solution to adapt hive-jdbc.