[SOLR-14192] Race condition between SchemaManager and ZkIndexSchemaReader - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 8.4
Fix Version/s: 8.5
Component/s: None
Labels: None

Description

Spin-off from SOLR-14128 and SOLR-13368.

In SolrCloud, when a SolrCore is created and uses a managed schema, its ManagedIndexSchemaFactory automatically upgrades the initial schema.xml to managed-schema. This includes removing the original schema.xml file.

SOLR-13368 added locking to ensure that the changed resource name (i.e. managed-schema) becomes visible only when this process is complete, and that in-flight requests to /admin/schema block until then, to avoid returning inconsistent data. This locking mechanism uses simple Object monitors, which only coordinate threads within a single JVM.

However, if there is more than one node in the cluster, the subsequent request to retrieve the schema may execute on a core that has not yet reloaded its schema (ZkIndexSchemaReader uses a ZK watcher, which may take some time to trigger). The resource name in that stale schema still points to schema.xml, which by this time no longer exists because ManagedIndexSchemaFactory on the first core has removed it.
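
The failure sequence above can be reproduced in a minimal single-threaded sketch. This is not Solr code: the class `StaleSchemaRaceSketch`, its maps, and its method names are hypothetical stand-ins for the shared config set in ZooKeeper and for the resource name each core has loaded. The ZK watcher delay is modelled by an explicit `watcherFires` call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the race; none of these names are Solr APIs.
class StaleSchemaRaceSketch {
    // Stand-in for the shared config set in ZooKeeper: resource name -> content.
    final Map<String, String> configSet = new HashMap<>();
    // Resource name each core currently believes holds its schema.
    final Map<String, String> resourceNameByCore = new HashMap<>();

    StaleSchemaRaceSketch(String... cores) {
        configSet.put("schema.xml", "<schema/>");
        for (String core : cores) {
            resourceNameByCore.put(core, "schema.xml");
        }
    }

    // Upgrade as described in this issue: write managed-schema, switch the
    // upgrading core, and delete schema.xml immediately.
    void upgradeOn(String core) {
        configSet.put("managed-schema", configSet.get("schema.xml"));
        resourceNameByCore.put(core, "managed-schema");
        configSet.remove("schema.xml"); // too early: other cores may still point here
    }

    // Models the ZK watcher eventually firing on another core.
    void watcherFires(String core) {
        resourceNameByCore.put(core, "managed-schema");
    }

    // Models a schema read served by the given core; empty means the
    // resource the core points to no longer exists.
    Optional<String> readSchema(String core) {
        return Optional.ofNullable(configSet.get(resourceNameByCore.get(core)));
    }

    public static void main(String[] args) {
        StaleSchemaRaceSketch cluster = new StaleSchemaRaceSketch("core1", "core2");
        cluster.upgradeOn("core1");
        System.out.println("core1 read ok: " + cluster.readSchema("core1").isPresent());
        System.out.println("core2 read ok before watcher: " + cluster.readSchema("core2").isPresent());
        cluster.watcherFires("core2");
        System.out.println("core2 read ok after watcher: " + cluster.readSchema("core2").isPresent());
    }
}
```

In this model, the window between `upgradeOn` on one core and `watcherFires` on the other is where a schema read fails.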

As I see it there are two bugs here:

1. There is no distributed locking when this upgrade is performed, so multiple cores naturally race against each other to perform it.
2. The upgrade process removes schema.xml too early. It triggers all other cores by creating the managed-schema file, and those cores then reload from the new managed schema, but it should wait until this reload is complete on all cores, because only then is it safe to delete the non-managed resource, as it is no longer in use by any core.

Issue 1 can be solved by adding an ephemeral znode lock so that only one core can perform the upgrade. Issue 2 can be solved by calling ManagedIndexSchema.waitForSchemaZkVersionAgreement after the upgrade, and deleting schema.xml only once it returns.
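
The ordering proposed here can be sketched with an in-memory model. This is an illustration under stated assumptions, not the attached patch: the class `SchemaUpgradeSketch`, the lock node name, and all method names are hypothetical. An atomic `putIfAbsent` stands in for creating an ephemeral znode (session expiry is not modelled), and an explicit agreement check stands in for ManagedIndexSchema.waitForSchemaZkVersionAgreement.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the proposed fix; none of these names are Solr APIs.
class SchemaUpgradeSketch {
    // Hypothetical lock node name, chosen for illustration only.
    static final String LOCK_PATH = "schema.xml.upgrade.lock";

    // Stand-in for the shared config set in ZooKeeper: node name -> content.
    final Map<String, String> configSet = new ConcurrentHashMap<>();
    // Resource name each core currently has loaded.
    final Map<String, String> resourceNameByCore = new ConcurrentHashMap<>();

    SchemaUpgradeSketch(String... cores) {
        configSet.put("schema.xml", "<schema/>");
        for (String core : cores) {
            resourceNameByCore.put(core, "schema.xml");
        }
    }

    // Issue 1: atomic create, so only one core wins the right to upgrade.
    boolean tryLock(String core) {
        return configSet.putIfAbsent(LOCK_PATH, core) == null;
    }

    // Removes the lock only if the given core still owns it.
    void unlock(String core) {
        configSet.remove(LOCK_PATH, core);
    }

    // Creates managed-schema from schema.xml but leaves schema.xml in place.
    void publishManagedSchema() {
        String legacy = configSet.get("schema.xml");
        if (legacy != null) {
            configSet.putIfAbsent("managed-schema", legacy);
        }
    }

    // Models a core reloading after its watcher fires.
    void reload(String core) {
        if (configSet.containsKey("managed-schema")) {
            resourceNameByCore.put(core, "managed-schema");
        }
    }

    // Issue 2: delete schema.xml only once every core has switched over.
    boolean deleteLegacyIfAgreed() {
        boolean agreed = resourceNameByCore.values().stream()
                .allMatch("managed-schema"::equals);
        if (agreed) {
            configSet.remove("schema.xml");
        }
        return agreed;
    }

    public static void main(String[] args) {
        SchemaUpgradeSketch cluster = new SchemaUpgradeSketch("core1", "core2");
        if (cluster.tryLock("core1")) {
            try {
                cluster.publishManagedSchema();
                cluster.reload("core1");
                System.out.println("deleted early: " + cluster.deleteLegacyIfAgreed());
                cluster.reload("core2");
                System.out.println("deleted after agreement: " + cluster.deleteLegacyIfAgreed());
            } finally {
                cluster.unlock("core1");
            }
        }
    }
}
```

A real implementation would block with a timeout rather than poll once, and would need to decide what to do if agreement is never reached; the sketch only shows that schema.xml survives until the last core has reloaded.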

Attachments

- SOLR-14192.patch (6 kB), 16/Jan/20 20:36, Andrzej Bialecki

People

Assignee: Andrzej Bialecki

Reporter: Andrzej Bialecki

Votes: 0

Watchers: 6

Dates

Created: 16/Jan/20 17:40

Updated: 24/Mar/20 12:46

Resolved: 21/Jan/20 11:29