[SOLR-14210] Add replica state option for HealthCheckHandler - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 8.5
Fix Version/s: 8.6
Component/s: None
Labels:
None

Description

Background

As was brought up in SOLR-13055, running Solr in a more cloud-native way requires some additional features around node-level healthchecks.

In Kubernetes, for example, we need the 'liveness' and 'readiness' probes explained in https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/ to determine whether a node is live and ready to serve live traffic.

However, there are issues around Kubernetes managing its own rolling restarts. With the current healthcheck setup, it is easy to envision a scenario in which Solr reports itself as "healthy" while all of its replicas are actually recovering. Kubernetes, seeing a healthy pod, would then go on to restart the next Solr node. This can repeat until every replica is "recovering" and none is healthy (the replicas on the last node restarted may be "down", but either way there are no "active" replicas).
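The failure mode above can be made concrete with a toy simulation. This is a sketch under stated assumptions, not Solr or Kubernetes code: every node hosts one replica of the same shard, a restarted replica stays "recovering" for a fixed number of ticks (the hypothetical `RECOVERY_TICKS`), and the orchestrator restarts the next node as soon as the current node's check passes. `process_up_check` is a hypothetical stand-in for a check that only confirms the node answers; `all_replicas_active_check` is a hypothetical stand-in for the replica-aware check proposed in this issue.

```python
from dataclasses import dataclass, field
from typing import Callable

RECOVERY_TICKS = 3  # hypothetical: ticks a restarted replica spends "recovering"


@dataclass
class Node:
    name: str
    # One entry per hosted replica: ticks left in recovery (0 means "active").
    recovery_left: list[int] = field(default_factory=lambda: [0])

    def restart(self) -> None:
        # After a restart, every hosted replica starts recovering.
        self.recovery_left = [RECOVERY_TICKS for _ in self.recovery_left]

    def tick(self) -> None:
        # Advance recovery by one time step.
        self.recovery_left = [max(0, t - 1) for t in self.recovery_left]

    def states(self) -> list[str]:
        return ["active" if t == 0 else "recovering" for t in self.recovery_left]


def process_up_check(node: Node) -> bool:
    """Hypothetical naive check: the process answers, so the node is 'healthy'."""
    return True


def all_replicas_active_check(node: Node) -> bool:
    """Hypothetical replica-aware check: healthy only if every replica is active."""
    return all(state == "active" for state in node.states())


def rolling_restart(nodes: list[Node], check: Callable[[Node], bool]) -> int:
    """Restart nodes one at a time, moving on only once `check` passes.

    Returns the minimum number of active replicas seen during the rollout.
    """
    def active_count() -> int:
        return sum(state == "active" for n in nodes for state in n.states())

    min_active = active_count()
    for node in nodes:
        node.restart()
        min_active = min(min_active, active_count())
        # Wait (advance time) until the restarted node passes the check.
        while not check(node):
            for n in nodes:
                n.tick()
            min_active = min(min_active, active_count())
    return min_active


if __name__ == "__main__":
    naive = rolling_restart([Node(f"solr-{i}") for i in range(3)], process_up_check)
    aware = rolling_restart([Node(f"solr-{i}") for i in range(3)], all_replicas_active_check)
    print(naive, aware)  # prints: 0 2
```

With the naive check, all three nodes are restarted before any replica recovers, so the shard drops to zero active replicas. With the replica-aware check, at most one replica is recovering at a time. The model ignores details such as leader election and recovery from a leader.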

Proposal

I propose we add a healthcheck that returns whether all replicas hosted by that Solr node are healthy, i.e. "active". That way we will be able to use the default Kubernetes rolling restart logic with Solr.

To add on to Jan's point here, this handler should be friendlier to other Content-Types and should use better HTTP response status codes.
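A minimal sketch of the response side, assuming the node can list its hosted replicas with their state strings. The function names `health_response` and `probe_passes`, the replica names, and the JSON body layout are hypothetical, not Solr's actual API. The status code carries the signal: Kubernetes treats an HTTP probe as successful for any status from 200 up to (but not including) 400, so an unhealthy node has to answer with something like 503 rather than 200 with a failure message in the body. The body can then be rendered in whichever Content-Type the client asked for.

```python
import json


def probe_passes(status: int) -> bool:
    """Kubernetes httpGet probe rule: 200 <= status < 400 counts as success."""
    return 200 <= status < 400


def health_response(
    replica_states: dict[str, str], accept: str = "application/json"
) -> tuple[int, str, str]:
    """Map the states of the replicas hosted on one node to (status, content_type, body).

    `replica_states` maps a (hypothetical) replica name to its state string,
    e.g. {"coll1_shard1_replica_n1": "active"}.
    """
    # Healthy only if every hosted replica is "active";
    # "recovering", "down", etc. all count as not ready.
    unhealthy = sorted(name for name, state in replica_states.items() if state != "active")
    healthy = not unhealthy
    # 503 Service Unavailable makes an HTTP probe fail without parsing the body.
    status = 200 if healthy else 503
    if "application/json" in accept:
        body = json.dumps(
            {"status": "OK" if healthy else "FAILURE", "unhealthy_replicas": unhealthy}
        )
        return status, "application/json", body
    # Plain-text fallback for clients that did not ask for JSON.
    body = "OK" if healthy else "FAILURE: " + ", ".join(unhealthy)
    return status, "text/plain", body
```

For example, `health_response({"c1_shard1_replica_n1": "active", "c1_shard2_replica_n2": "recovering"})` yields status 503, so `probe_passes` is false and a rolling restart would wait. In this sketch a node hosting no replicas reports healthy, which is one possible design choice among several.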

Attachments

docs.patch
07/Apr/20 11:03
3 kB
Jan Høydahl

Issue Links

is part of

SOLR-13055 Introduce check to determine "liveliness" of a Solr node

Resolved

links to

GitHub Pull Request #1387

People

Assignee: Jan Høydahl

Reporter: Houston Putman

Votes: 1

Watchers: 9

Dates

Created: 23/Jan/20 16:25

Updated: 15/Jul/20 15:09

Resolved: 14/Apr/20 15:12

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 4.5h