[KNOX-843] Add support for load balancing multiple clients across backend service instances - ASF JIRA

XML

Word

Printable

JSON

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.5.0
Component/s: Server
Labels:
None

Description

Summary
Knox should support load balancing multiple clients across multiple backend service instances.

Problem Statement
The HA mechanism in Knox today only allows failover. What this means is that for every client that connects to Knox, the same backend is chosen even if there are multiple. This single backend will be used until it fails and then Knox will chose another backend sending all clients to this new backend.

The above design is simple, but has a few limitations:

Even if the number of backends scale - there is no way for Knox to utilize this
If any client causes a backend error - it causes Knox to failover all clients to a new backend

Background Information
There are 2 types of backend services - stateful and stateless. Stateful backend services (like HiveServer2) require that once a client initiates a session that client is always sent to that same backend. Stateless backend services (like HBase REST) don't require clients to always be sent to the same backend. When stateful backend services are being used, load balancing can only be done across multiple clients.

There are two options to do stateful load balancing:

Maintain the state at the server level and match clients to state
Maintain the state at the client level and use that on the server side

Maintaining the state on the client side is more beneficial since then it can be used across multiple Knox instances even in the case of Knox failover.

Solution
One option to maintaining client state, that is described in the original design doc here: KNOX-843.pdf, is to store state in a cookie that gets sent back on the response to the client. The general flow is as follows:

clientA makes a request no KNOX_BACKEND cookie
Knox checks if HA is enabled and dispatches request to a random backend
On response to clientA - sets KNOX_BACKEND cookie to hash of server who serviced the response
On second request clientA sends back the KNOX_BACKEND cookie
Knox looks at request and dispatches request to backend that matches hash of KNOX_BACKEND cookie
if backend is down, Knox updates KNOX_BACKEND cookie with server that handled request on response

This approach requires that clients support cookies and many do since this is how many load balancers typically work today [1].

Some solution benefits include:

this approach is used by load balancers like haproxy [1] and Traefik [2]
Multiple clients can effectively use all the available backend services.
A single backend going down won't affect all clients using Knox. Only affects clients using that single backend.
Client side cookie could be used against different Knox servers. This allows for Knox restarts where clients bounce to another Knox server and maintain the backend session state.

Some solution limitations include:

clients must support cookies - this should be the case today since many LB require cookies for sticky sessions. So if a user is concerned with HA, there is most likely a sticky session LB in front of Knox today.

Implementation
The HAProvider provides a per topology per service configuration to opt into this new feature. Since this is a change in behavior we would want this to be an opt in optimization. The HAProvider would provide a config that the dispatch can use to decide to insert and use cookies to influence the dispatched request.

The cookie being sent back to the client needs to hide backend service information so internal host details are not leaked. One option is to hash the backend before placing in the cookie.

This shouldn't affect failover since failover should be internal to Knox. The KNOX_BACKEND cookie should be updated on the response to the client with the server that handled the request in the case of a failover. The state changes should be on the client side response from Knox. There are some edge cases here were failover will need to be decided potentially per client instead of for the entire backend, but that shouldn't cause a big issue here.

[1] https://www.haproxy.com/blog/load-balancing-affinity-persistence-sticky-sessions-what-you-need-to-know/
[2] https://docs.traefik.io/routing/services/#sticky-sessions

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

KNOX-843.002.patch
07/Jun/17 19:22
29 kB
Jeffrey E Rodriguez
KNOX-843.001.patch
22/May/17 22:44
29 kB
Jeffrey E Rodriguez
KNOX-843.pdf
22/May/17 22:44
335 kB
Jeffrey E Rodriguez
KNOX-843.patch
27/Feb/17 04:54
29 kB
Jeffrey E Rodriguez

Issue Links

causes

KNOX-2768 HA Loadbalancing logic should not affect failover urls

Open

KNOX-2545 The new configuration enableStickySession should not loadbalance requests.

Resolved

is a parent of

KNOX-2510 Need UI support for HA sticky session and loadbalancing properties

Open

is duplicated by

KNOX-2054 Accomodate APIs with multiple endpoints

Closed

is related to

KNOX-2511 Consolidate HA Dispatches

Resolved

KNOX-2346 Remove unused maxRetryAttempts and retrySleep

Resolved

links to

GitHub Pull Request #315

GitHub Pull Request #380

mentioned in: Page Loading...

(1 is related to, 2 links to, 1 mentioned in)

Activity

People

Assignee:: Sandeep More

Reporter:: Sam Hjelmfelt

Votes:: 3 Vote for this issue

Watchers:: 16 Start watching this issue

Dates

Created:: 05/Jan/17 00:44

Updated:: 23/Jun/22 14:08

Resolved:: 28/Oct/20 20:33

Time Tracking

Estimated:

Not Specified

Remaining:

Logged: