Description
In this JIRA we want to discuss how to implement multi-active high availability in Livy.
Currently, Livy supports only single-node recovery, which is not sufficient in some production environments. In our scenario, the Livy server serves many notebook and JDBC services, so we want to make the Livy service more fault-tolerant and scalable.
There are already some high availability proposals in the community, but they are either incomplete or cover only active-standby high availability. We therefore propose a multi-active high availability design to achieve the following goals:
- One or more servers will serve the client requests at the same time.
- Sessions are allocated among different servers.
- When one node crashes, the affected sessions will be moved to other active servers.
Here is our design document. Please review and comment:
https://docs.google.com/document/d/1bD3qYZpw14_NuCcSGUOfqQ0pqvSbCQsOLFuZp26Ohjc/edit?usp=sharing
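To make the allocation and failover goals above concrete, here is a minimal sketch of consistent-hashing session allocation, one of the approaches named in the sub-tasks. It is not taken from the design document or from Livy's code base: the `HashRing` class, the server names, the `session-<id>` key format, and the virtual-node count are all hypothetical and chosen only for illustration.

```python
import bisect
import hashlib


def _ring_position(key: str) -> int:
    """Map a string to a point on the ring (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring that maps session IDs to server names."""

    def __init__(self, servers=(), vnodes=64):
        self._vnodes = vnodes  # virtual nodes per server, to even out the load
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> server name
        for server in servers:
            self.add_server(server)

    def add_server(self, server: str) -> None:
        for i in range(self._vnodes):
            point = _ring_position(f"{server}#{i}")
            if point in self._owner:
                continue  # keep the existing owner on the (very unlikely) collision
            bisect.insort(self._points, point)
            self._owner[point] = server

    def remove_server(self, server: str) -> None:
        # Drop every virtual node of the crashed server; other points are untouched.
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def server_for(self, session_id: int) -> str:
        if not self._points:
            raise RuntimeError("no active servers")
        # The owner is the first virtual node clockwise from the session's position.
        idx = bisect.bisect_right(self._points, _ring_position(f"session-{session_id}"))
        return self._owner[self._points[idx % len(self._points)]]


if __name__ == "__main__":
    ring = HashRing(["livy-1", "livy-2", "livy-3"])
    before = {sid: ring.server_for(sid) for sid in range(100)}
    ring.remove_server("livy-2")  # simulate a crash of one node
    after = {sid: ring.server_for(sid) for sid in range(100)}
    moved = [sid for sid in before if before[sid] != after[sid]]
    print(f"{len(moved)} of {len(before)} sessions were reassigned")
```

In this sketch, removing a server reassigns only the sessions that server owned, and every other session stays where it was. In a real deployment the list of active servers would presumably come from a coordination service rather than being hard-coded.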
1. A Common Zookeeper Wrapper Utility | Resolved | Jie Wang
2. Distributed Session ID Generation | Open | Unassigned
3. Session Allocation with Consistent Hashing | Open | Unassigned
4. Session Allocation with server-session Mapping | Open | Unassigned
5. Server Registration | Open | Unassigned
6. Support Session Lazy Recover | Open | Unassigned
7. Support Route Request | Open | Unassigned
8. Support Service Discovery | Open | Unassigned
9. Support getAllSessions in Cluster | Open | Unassigned
10. Fix RPC Channel Closed When Multi Clients Connect to One Driver | Resolved | Jie Wang
11. Livy Job API support multi-active HA | Open | Unassigned