Uploaded image for project: 'Apache Knox'
  1. Apache Knox
  2. KNOX-843

Add support for load balancing multiple clients across backend service instances

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.5.0
    • Server
    • None

    Description

      Summary
      Knox should support load balancing multiple clients across multiple backend service instances.

      Problem Statement
      The HA mechanism in Knox today only allows failover. What this means is that for every client that connects to Knox, the same backend is chosen even if there are multiple. This single backend will be used until it fails and then Knox will chose another backend sending all clients to this new backend.

      The above design is simple, but has a few limitations:

      • Even if the number of backends scale - there is no way for Knox to utilize this
      • If any client causes a backend error - it causes Knox to failover all clients to a new backend

      Background Information
      There are 2 types of backend services - stateful and stateless. Stateful backend services (like HiveServer2) require that once a client initiates a session that client is always sent to that same backend. Stateless backend services (like HBase REST) don't require clients to always be sent to the same backend. When stateful backend services are being used, load balancing can only be done across multiple clients.

      There are two options to do stateful load balancing:

      • Maintain the state at the server level and match clients to state
      • Maintain the state at the client level and use that on the server side

      Maintaining the state on the client side is more beneficial since then it can be used across multiple Knox instances even in the case of Knox failover.

      Solution
      One option to maintaining client state, that is described in the original design doc here: KNOX-843.pdf, is to store state in a cookie that gets sent back on the response to the client. The general flow is as follows:

      • clientA makes a request no KNOX_BACKEND cookie
      • Knox checks if HA is enabled and dispatches request to a random backend
      • On response to clientA - sets KNOX_BACKEND cookie to hash of server who serviced the response
      • On second request clientA sends back the KNOX_BACKEND cookie
      • Knox looks at request and dispatches request to backend that matches hash of KNOX_BACKEND cookie
      • if backend is down, Knox updates KNOX_BACKEND cookie with server that handled request on response

      This approach requires that clients support cookies and many do since this is how many load balancers typically work today [1].

      Some solution benefits include:

      • this approach is used by load balancers like haproxy [1] and Traefik [2]
      • Multiple clients can effectively use all the available backend services.
      • A single backend going down won't affect all clients using Knox. Only affects clients using that single backend.
      • Client side cookie could be used against different Knox servers. This allows for Knox restarts where clients bounce to another Knox server and maintain the backend session state.

      Some solution limitations include:

      • clients must support cookies - this should be the case today since many LB require cookies for sticky sessions. So if a user is concerned with HA, there is most likely a sticky session LB in front of Knox today.

      Implementation
      The HAProvider provides a per topology per service configuration to opt into this new feature. Since this is a change in behavior we would want this to be an opt in optimization. The HAProvider would provide a config that the dispatch can use to decide to insert and use cookies to influence the dispatched request.

      The cookie being sent back to the client needs to hide backend service information so internal host details are not leaked. One option is to hash the backend before placing in the cookie.

      This shouldn't affect failover since failover should be internal to Knox. The KNOX_BACKEND cookie should be updated on the response to the client with the server that handled the request in the case of a failover. The state changes should be on the client side response from Knox. There are some edge cases here were failover will need to be decided potentially per client instead of for the entire backend, but that shouldn't cause a big issue here.

      [1] https://www.haproxy.com/blog/load-balancing-affinity-persistence-sticky-sessions-what-you-need-to-know/
      [2] https://docs.traefik.io/routing/services/#sticky-sessions

      Attachments

        1. KNOX-843.pdf
          335 kB
          Jeffrey E Rodriguez
        2. KNOX-843.patch
          29 kB
          Jeffrey E Rodriguez
        3. KNOX-843.002.patch
          29 kB
          Jeffrey E Rodriguez
        4. KNOX-843.001.patch
          29 kB
          Jeffrey E Rodriguez

        Issue Links

          Activity

            People

              smore Sandeep More
              samhjelmfelt Sam Hjelmfelt
              Votes:
              3 Vote for this issue
              Watchers:
              16 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 1h
                  1h