Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
There seem to be at least two possible thread race conditions that can cause /health?requireHealthyCores=true to return a false positive (reporting the node as healthy when it is not) while CoreContainer is in the process of starting up.
- If the request comes in after CoreContainer has initialized healthCheckHandler but before initializing & running the coreLoadExecutor
- A more complex situation where the request comes in while coreLoadExecutor is loading cores, and all of the cores that have finished initialization are "active" in SolrCloud, but other SolrCores remain to be initialized (and may need recovery)
In both cases, the root of the issue is that requireHealthyCores=true works by checking...
Collection<CloudDescriptor> coreDescriptors =
    coreContainer.getCores().stream()
        .map(c -> c.getCoreDescriptor().getCloudDescriptor())
        .collect(Collectors.toList());
long unhealthyCores = findUnhealthyCores(coreDescriptors, clusterState);
...but that means the only CloudDescriptors checked are the ones that come from already loaded cores (which is what coreContainer.getCores() returns). Any currentlyLoadingCores (registered by CoreContainer calling solrCores.markCoreAsLoading(cd) before starting the coreLoadExecutor) are not considered.
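
To make the two windows concrete, here is a minimal, self-contained sketch that models them without any Solr dependency. Everything in it is hypothetical and for illustration only: the class HealthCheckRaceSketch, its fields, the startupComplete flag, and the strictHealthy() check are assumptions, not Solr's actual code and not necessarily how the issue was fixed. The naive check mirrors the logic described above (only loaded cores are inspected); the strict check additionally refuses to report healthy while startup is incomplete or any core is still marked as loading.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the race; none of these names are Solr APIs. */
public class HealthCheckRaceSketch {
    // Loaded cores: core name -> replica state as seen in the cluster state.
    private final Map<String, String> loadedCoreStates = new LinkedHashMap<>();
    // Cores registered as "loading" but not yet finished.
    private final Set<String> loadingCores = new LinkedHashSet<>();
    // Assumed flag: set once the container has submitted and finished all core loads.
    private boolean startupComplete = false;

    public synchronized void markCoreAsLoading(String name) {
        loadingCores.add(name);
    }

    public synchronized void finishLoading(String name, String state) {
        loadingCores.remove(name);
        loadedCoreStates.put(name, state);
    }

    public synchronized void markStartupComplete() {
        startupComplete = true;
    }

    private long countUnhealthyLoadedCores() {
        return loadedCoreStates.values().stream()
                .filter(state -> !"active".equals(state))
                .count();
    }

    /** Mirrors the described behavior: only already loaded cores are checked. */
    public synchronized boolean naiveHealthy() {
        return countUnhealthyLoadedCores() == 0;
    }

    /** One possible stricter check: also require that nothing is pending. */
    public synchronized boolean strictHealthy() {
        return startupComplete
                && loadingCores.isEmpty()
                && countUnhealthyLoadedCores() == 0;
    }

    public static void main(String[] args) {
        HealthCheckRaceSketch container = new HealthCheckRaceSketch();

        // Window 1: handler is reachable, but no core load has started yet.
        System.out.println("before loading: naive=" + container.naiveHealthy()
                + " strict=" + container.strictHealthy());

        // Window 2: one core finished and is active, another is still loading.
        container.markCoreAsLoading("coreA");
        container.markCoreAsLoading("coreB");
        container.finishLoading("coreA", "active");
        System.out.println("mid loading:    naive=" + container.naiveHealthy()
                + " strict=" + container.strictHealthy());

        // The second core finishes but needs recovery: both checks now agree.
        container.finishLoading("coreB", "recovering");
        System.out.println("after loading:  naive=" + container.naiveHealthy()
                + " strict=" + container.strictHealthy());
    }
}
```

In the first two windows the naive check reports healthy even though cores are missing or still loading, which is the false positive described in this issue. The stricter check stays negative until every registered core has finished loading and is active.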