Currently, async (Lucene) indexes are marked as corrupt if they cannot be updated for some time. I think the default timeout of 30 minutes is far too low. In the past, we have seen the following cases where async indexing is delayed:
- Topology problems (no leader)
- High load prevents updating them
Cases where the index update runs but fails include:
- Restart of Oak with inconsistent data in the /repository/index directory
For these cases, it's better to stop Oak, clean the index directory, and restart. This may already be happening regularly anyway (e.g. daily). Marking the index as corrupt, which forces a reindex, does not seem necessary or helpful.
This setting is already configurable in org.apache.jackrabbit.oak.plugins.index.AsyncIndexerService: failingIndexTimeoutSeconds. But instead of changing the configuration everywhere, it's probably better to change the default value of failingIndexTimeoutSeconds in Oak itself to one week: 604800L = 60L * 60 * 24 * 7.
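To make the arithmetic and its effect concrete, here is a minimal Java sketch. Only failingIndexTimeoutSeconds and AsyncIndexerService come from Oak; the class FailingIndexTimeoutSketch and the helper wouldMarkCorrupt are hypothetical. The check assumes an index is marked corrupt once it has been failing for longer than the timeout, which may not match Oak's exact logic.

```java
import java.time.Duration;

/**
 * Hypothetical sketch: names are illustrative and this is not
 * Oak's actual AsyncIndexerService code.
 */
public class FailingIndexTimeoutSketch {

    // Current default discussed in the issue: 30 minutes, in seconds.
    static final long CURRENT_DEFAULT_SECONDS = 30L * 60;

    // Proposed default: one week, in seconds (60 s * 60 min * 24 h * 7 d).
    static final long PROPOSED_DEFAULT_SECONDS = 60L * 60 * 24 * 7;

    /**
     * Hypothetical helper: returns true if an index that has been failing
     * since failingSinceMillis has been failing for longer than
     * timeoutSeconds at nowMillis (assumed corruption rule).
     */
    static boolean wouldMarkCorrupt(long failingSinceMillis, long nowMillis, long timeoutSeconds) {
        long failingForMillis = nowMillis - failingSinceMillis;
        return failingForMillis > Duration.ofSeconds(timeoutSeconds).toMillis();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // An index whose updates have been failing for two hours, e.g. under high load.
        long failingSince = now - Duration.ofHours(2).toMillis();

        System.out.println("proposed default = " + PROPOSED_DEFAULT_SECONDS + " s");
        System.out.println("corrupt with 30 min timeout: "
                + wouldMarkCorrupt(failingSince, now, CURRENT_DEFAULT_SECONDS));
        System.out.println("corrupt with 1 week timeout: "
                + wouldMarkCorrupt(failingSince, now, PROPOSED_DEFAULT_SECONDS));
    }
}
```

Under this assumed rule, two hours of failed updates already cross the 30-minute threshold, while a one-week default would leave the index alone long enough for an operator to clean the index directory and restart.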