Affects Version/s: master (9.0)
Fix Version/s: None
Presently, Solr has a variety of timeouts for various connections or operations. Throughout the history of the project, these timeouts have been added, tweaked, refined, and in some cases made configurable in an ad-hoc manner by the contributors of individual features. This is all well and good until one experiences a timeout during an otherwise valid use case and needs to adjust it.
This has also made managing timeouts in unit tests "interesting" as noted in SOLR-13389.
Probably nobody has the spare time to do a tour de force through the code and coordinate every single timeout, so in this ticket I'd like to establish a framework for categorizing timeouts, a standard for how we make each category configurable, and then add sub-tickets to address individual timeouts.
The intention is that eventually, there will be no "magic number" timeout values in code, and one can predict where to find the configuration for a timeout by determining its category.
Initial strawman categories (feel free to knock down or suggest alternatives):
- Feature-Instance Timeout: Timeouts that relate to a particular instantiation of a feature, for example a database connection timeout for a connection to a particular database by DIH. These should be set in the configuration of that instance.
- Optional Feature Timeout: A timeout that only has meaning in the context of a particular feature that is not required for Solr to function, i.e. something that can be turned on or off. Perhaps a timeout for communication with an external LDAP server for authentication purposes. These should be configured in the same configuration that enables the feature.
- Global System Timeout: A timeout that will always be an active part of Solr, for example the Jetty thread idle timeout or the default timeout for HTTP calls between nodes. These should be configured in a new <timeouts> section of solr.xml.
- Node Specific Timeout: A timeout which may differ on different nodes. I don't know of any of these, but I'll grant the possibility. These (and only these) should be set via system properties. If we don't have any of these, that's just fine.
- Client Timeout: These are timeouts in SolrJ code that are active in code running outside the server. They should be configurable via the Java API, and via a config file of some sort from a single location defined in a sysprop or sourced from the classpath (in that order). When run on the server, the SolrJ code should look for a Global System Timeout setting before consulting sysprops or the classpath.
Note that in no case is a hard-coded value the correct solution.
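To make the lookup order proposed for the Client Timeout category concrete, here is a minimal sketch. Everything in it is hypothetical: ClientTimeoutResolver, the property names, and the choice to let a value set through the Java API win over everything else are assumptions for discussion, not existing SolrJ API. Loading the config files is left out; the sketch takes already-loaded Properties objects.

```java
import java.time.Duration;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

// Hypothetical sketch: none of these names exist in SolrJ today.
final class ClientTimeoutResolver {
    // Global System Timeouts (e.g. from solr.xml); empty when running outside the server.
    private final Map<String, Long> globalSystemTimeouts;
    // Config file whose location was given by a system property; empty if none was given.
    private final Properties sysPropLocatedConfig;
    // Config file found on the classpath; empty if none was found.
    private final Properties classpathConfig;

    ClientTimeoutResolver(Map<String, Long> globalSystemTimeouts,
                          Properties sysPropLocatedConfig,
                          Properties classpathConfig) {
        this.globalSystemTimeouts = globalSystemTimeouts;
        this.sysPropLocatedConfig = sysPropLocatedConfig;
        this.classpathConfig = classpathConfig;
    }

    // Returns the configured timeout, or empty if nothing configures it.
    // There is deliberately no hard-coded fallback value.
    Optional<Duration> resolve(String name, Duration explicitFromJavaApi) {
        // Assumption: a value set through the Java API takes precedence.
        if (explicitFromJavaApi != null) {
            return Optional.of(explicitFromJavaApi);
        }
        // On the server, the Global System Timeout is consulted first.
        Long global = globalSystemTimeouts.get(name);
        if (global != null) {
            return Optional.of(Duration.ofMillis(global));
        }
        // Then the sysprop-located file, then the classpath file.
        String raw = sysPropLocatedConfig.getProperty(name);
        if (raw == null) {
            raw = classpathConfig.getProperty(name);
        }
        if (raw == null) {
            return Optional.empty();
        }
        return Optional.of(Duration.ofMillis(Long.parseLong(raw.trim())));
    }
}
```

Returning an empty Optional rather than a default keeps the "no hard-coded value" rule visible at the call site; whether a missing setting should instead be a startup error is an open question for this ticket.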
If we get a consensus on categories and their locations, then the next step is to begin adding sub-tickets to bring specific timeouts into compliance. Every such ticket should include an update to the section of the ref guide documenting the configuration to which the timeout has been added (e.g. docs for solr.xml for Global System Timeouts), describing exactly what is affected by the timeout, the maximum allowed value, and how zero and negative numbers are handled.
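One way to keep the documented contract and the code in step would be to validate every timeout against a small per-timeout specification. The sketch below is only an illustration: TimeoutSpec and ZeroPolicy are hypothetical names, and the rule that negative values are always rejected is an assumption, not something this ticket has decided.

```java
// Hypothetical sketch: TimeoutSpec is not an existing Solr class.
final class TimeoutSpec {
    // How a value of zero is treated for this particular timeout.
    enum ZeroPolicy { MEANS_NO_TIMEOUT, REJECTED }

    private final String name;
    private final long maxMillis;
    private final ZeroPolicy zeroPolicy;

    TimeoutSpec(String name, long maxMillis, ZeroPolicy zeroPolicy) {
        this.name = name;
        this.maxMillis = maxMillis;
        this.zeroPolicy = zeroPolicy;
    }

    // Returns the value unchanged if it satisfies the documented contract,
    // otherwise throws with a message naming the offending setting.
    long validate(long millis) {
        // Assumption: negative values are never meaningful.
        if (millis < 0) {
            throw new IllegalArgumentException(name + " must not be negative: " + millis);
        }
        if (millis == 0 && zeroPolicy == ZeroPolicy.REJECTED) {
            throw new IllegalArgumentException(name + " must be greater than zero");
        }
        if (millis > maxMillis) {
            throw new IllegalArgumentException(
                name + " exceeds the maximum of " + maxMillis + " ms: " + millis);
        }
        return millis;
    }
}
```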
It is of course true that some of these values will have the potential to destroy system performance or integrity, and that should be mentioned in the update to documentation.
|Make Jetty timeouts configurable system wide||Open||Unassigned|
|Streaming Expressions experience a hard coded timeout||Open||Unassigned|