[ZEPPELIN-3612] Cluster High availability module design - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.9.0
Fix Version/s: 0.9.0
Component/s: zeppelin-server
Labels: None
Flags: Important
Docs Text: https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit#heading=h.wkegoo48k5hd

Description

The goal is that if some Zeppelin-Server processes fail or a server goes down, the service keeps running, and each Zeppelin-Server can detect the availability of every Interpreter process in the server cluster.

1. Raft protocol

With the Raft protocol, the service is unaffected as long as a majority of the N servers in the cluster (N/2+1 servers, using integer division) remain in a normal state.
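The majority arithmetic can be illustrated with a small Java sketch. The class and method names (`RaftQuorum`, `quorumSize`, `toleratedFailures`, `isAvailable`) are hypothetical and not part of Zeppelin; the sketch only computes how many healthy servers an N-server cluster needs and how many failures it can absorb.

```java
// Sketch: majority-quorum arithmetic behind Raft's availability guarantee.
// Names are illustrative only; this is not Zeppelin's (or any Raft library's) API.
final class RaftQuorum {
    private RaftQuorum() {}

    /** Smallest majority of an n-server cluster: floor(n / 2) + 1. */
    static int quorumSize(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("cluster size must be >= 1");
        }
        return n / 2 + 1; // integer division rounds down for positive n
    }

    /** How many servers may fail while a majority can still be formed. */
    static int toleratedFailures(int n) {
        return n - quorumSize(n);
    }

    /** True if the healthy servers still form a majority. */
    static boolean isAvailable(int n, int healthy) {
        return healthy >= quorumSize(n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 3, 4, 5}) {
            System.out.printf("n=%d quorum=%d tolerated=%d%n",
                    n, quorumSize(n), toleratedFailures(n));
        }
    }
}
```

For example, a 5-server cluster needs 3 healthy servers and tolerates 2 failures, while a 4-server cluster needs 3 and tolerates only 1, the same as a 3-server cluster. This is why odd cluster sizes are the usual choice.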

2. Interpreter process monitoring

Each Interpreter process starts a heartbeat thread through the ClusterMonitor class. The thread periodically sends the process's heartbeat, together with the IP address and port of its Thrift interface, to the cluster.

When an Interpreter process shuts down, it sends a request to the cluster to delete its process metadata.
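A minimal sketch of this Interpreter-side behavior, assuming the cluster metadata can be modeled as a shared, thread-safe `Map` from process ID to metadata. `ProcessMeta`, `InterpreterHeartbeat`, and the map itself are hypothetical stand-ins, not the ClusterMonitor API; the description above only says the data is sent to the cluster. The process registers its Thrift endpoint immediately, refreshes its heartbeat on a fixed schedule, and deletes its entry when closed.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical metadata entry: Thrift endpoint plus last heartbeat time.
final class ProcessMeta {
    final String thriftHost;
    final int thriftPort;
    final long lastHeartbeatMillis;

    ProcessMeta(String thriftHost, int thriftPort, long lastHeartbeatMillis) {
        this.thriftHost = thriftHost;
        this.thriftPort = thriftPort;
        this.lastHeartbeatMillis = lastHeartbeatMillis;
    }
}

// Sketch of an Interpreter-side heartbeat thread; not Zeppelin's ClusterMonitor API.
final class InterpreterHeartbeat implements AutoCloseable {
    private final Map<String, ProcessMeta> clusterMeta; // assumed shared, thread-safe store
    private final String processId;
    private final String thriftHost;
    private final int thriftPort;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "interpreter-heartbeat");
                t.setDaemon(true); // does not keep the JVM alive on its own
                return t;
            });

    InterpreterHeartbeat(Map<String, ProcessMeta> clusterMeta,
                         String processId, String thriftHost, int thriftPort) {
        this.clusterMeta = clusterMeta;
        this.processId = processId;
        this.thriftHost = thriftHost;
        this.thriftPort = thriftPort;
    }

    /** Registers immediately, then refreshes the heartbeat every intervalMillis. */
    void start(long intervalMillis) {
        beat();
        scheduler.scheduleAtFixedRate(this::beat, intervalMillis, intervalMillis,
                TimeUnit.MILLISECONDS);
    }

    // Publishes the current time together with the Thrift IP and port.
    private void beat() {
        clusterMeta.put(processId,
                new ProcessMeta(thriftHost, thriftPort, System.currentTimeMillis()));
    }

    /** On shutdown: stop heartbeating, then delete this process's metadata. */
    @Override
    public void close() {
        scheduler.shutdownNow();
        try {
            scheduler.awaitTermination(1, TimeUnit.SECONDS); // let an in-flight beat finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        clusterMeta.remove(processId);
    }
}
```

Waiting (up to one second) for any in-flight heartbeat to finish before removing the entry keeps a late heartbeat from re-inserting the metadata after it has been deleted.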

Zeppelin-Server creates a process health-check thread through the ClusterMonitor class. The thread periodically checks the heartbeat of every Interpreter process recorded in the cluster metadata. If a process's heartbeat has timed out, its metadata is deleted, so that an abnormal Interpreter process is no longer treated as available.
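The server-side check can be sketched as a periodic sweep over last-heartbeat timestamps. `HeartbeatHealthCheck` is a hypothetical name; the store is assumed to be a `ConcurrentMap` from process ID to last heartbeat time in milliseconds, and the clock is injected so the timeout logic can be exercised without waiting. This is an illustration of the timeout rule, not Zeppelin's implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Sketch of a Zeppelin-Server-side health check; all names are hypothetical.
final class HeartbeatHealthCheck {
    private final ConcurrentMap<String, Long> lastHeartbeat; // processId -> last heartbeat (ms)
    private final long timeoutMillis;
    private final LongSupplier clock; // injected so tests can control "now"

    HeartbeatHealthCheck(ConcurrentMap<String, Long> lastHeartbeat,
                         long timeoutMillis, LongSupplier clock) {
        this.lastHeartbeat = lastHeartbeat;
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
    }

    /** One pass: deletes metadata of every process whose heartbeat timed out. */
    int sweep() {
        long now = clock.getAsLong();
        int removed = 0;
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            boolean expired = now - e.getValue() > timeoutMillis;
            // Conditional remove: skips entries refreshed since they were read.
            if (expired && lastHeartbeat.remove(e.getKey(), e.getValue())) {
                removed++;
            }
        }
        return removed;
    }

    /** Runs sweep() periodically; the caller owns (and must shut down) the executor. */
    ScheduledExecutorService startPeriodic(long periodMillis) {
        ScheduledExecutorService s = Executors.newSingleThreadScheduledExecutor();
        s.scheduleWithFixedDelay(this::sweep, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return s;
    }
}
```

The conditional `remove(key, value)` deletes an entry only if its timestamp is unchanged since the sweep read it, so a heartbeat that arrives mid-sweep is not discarded.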

3. Interpreter process rebuild

When an Interpreter process is about to be created, Zeppelin-Server first looks up the session information for that Interpreter process and checks whether the recorded process is still valid. If it is not, the corresponding session is cleared and the Interpreter process is re-created. This prevents an abnormal Interpreter process, or a failure of the server hosting it, from leaving the Interpreter unavailable.
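The rebuild step amounts to a check-then-recreate lookup. In this sketch, `InterpreterSessionManager`, `InterpreterHandle`, and the injected liveness check and launcher are hypothetical; how Zeppelin actually decides that a process is valid is not specified here, so it is passed in as a `Predicate`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical handle to a running Interpreter process for one session.
final class InterpreterHandle {
    final String sessionId;

    InterpreterHandle(String sessionId) {
        this.sessionId = sessionId;
    }
}

// Sketch of "check session, clear if invalid, re-create"; not Zeppelin's API.
final class InterpreterSessionManager {
    private final Map<String, InterpreterHandle> sessions = new HashMap<>();
    private final Predicate<InterpreterHandle> isAlive;         // assumed validity check
    private final Function<String, InterpreterHandle> launcher; // assumed process launcher

    InterpreterSessionManager(Predicate<InterpreterHandle> isAlive,
                              Function<String, InterpreterHandle> launcher) {
        this.isAlive = isAlive;
        this.launcher = launcher;
    }

    /** Returns a valid process for the session, re-creating it if needed. */
    synchronized InterpreterHandle getOrCreate(String sessionId) {
        InterpreterHandle existing = sessions.get(sessionId);
        if (existing != null) {
            if (isAlive.test(existing)) {
                return existing;        // process still valid: reuse it
            }
            sessions.remove(sessionId); // invalid: clear the stale session first
        }
        InterpreterHandle created = launcher.apply(sessionId); // (re)create the process
        sessions.put(sessionId, created);
        return created;
    }
}
```

A single `synchronized` method keeps the sketch simple and prevents two callers from launching duplicate processes for the same session; a production version would likely avoid holding a lock while a slow process launch runs.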

People

Assignee: Xun Liu

Reporter: Xun Liu

Votes: 0

Watchers: 2

Dates

Created: 11/Jul/18 11:13

Updated: 24/Dec/20 03:16

Resolved: 29/Jun/19 02:52