This component is meant to help with troublesome queries: it runs a query on a subset of replicas (which can be isolated from user traffic) to verify that executing the query is safe.
The component is optional. When enabled, it runs before any other component starts processing.
This is ensured by the fact that the whole execution of this component happens in STAGE_PARSE_QUERY (SOLR-10609 proposes renaming this stage to something more meaningful).
CanaryComponent reports the status of the analysis as a Boolean via the CanaryComponent.CANARY_SUCCESS response parameter and through a field in ResponseBuilder.
The same return convention applies to both:
- null/non-existing when the CanaryComponent did not execute the query
- true if the CanaryComponent processed the query and did not find any problem
- false if the query execution didn't terminate normally.
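The three-valued convention above can be made explicit on the caller side. The following is a minimal sketch, not part of the component: the class, the enum and the plain Map standing in for the response are all hypothetical, and the key is passed in by the caller because the actual value of CanaryComponent.CANARY_SUCCESS is not shown here.

```java
import java.util.Map;

/** Hypothetical helper that turns the nullable Boolean into an explicit outcome. */
final class CanaryOutcomeSketch {

    enum Outcome { NOT_RUN, PASSED, FAILED }

    /** null means the canary did not execute the query; true and false are as listed above. */
    static Outcome interpret(Boolean canarySuccess) {
        if (canarySuccess == null) {
            return Outcome.NOT_RUN;
        }
        return canarySuccess ? Outcome.PASSED : Outcome.FAILED;
    }

    /** Reads the flag from a plain map standing in for the response (an assumption of this sketch). */
    static Outcome fromResponse(Map<String, Object> response, String key) {
        Object value = response.get(key);
        // A missing key and a non-Boolean value are both treated as "did not run".
        return interpret(value instanceof Boolean ? (Boolean) value : null);
    }
}
```

Treating a missing key the same as null keeps callers from confusing "the canary was not used" with "the canary failed".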
The CanaryComponent needs to be set up properly before it is used.
1) Tag one or more replicas as "canary" replicas (depends on SOLR-10880 and SOLR-10881):
Using replica properties, set a property to a value (the canary type). This property can be independent from any other shard filtering property, but it does not have to be.
A collection can have many canaries of many types, for example:
- shard1replica3, shard2replica1 have the property canaryColour=yellow
- shard4replica2 has the property canaryColour=red
And so on.
There can be multiple canary replicas per shard.
CanaryComponentTest.java shows an example of such tagging.
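As an illustration of the tagging step, the sketch below builds a request URL for Solr's Collections API ADDREPLICAPROP action, which sets a property on a replica. It assumes that this stock action is an acceptable way to tag a canary; CanaryComponentTest.java may do it differently. The class name, base URL, collection name and replica name are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds an ADDREPLICAPROP request URL for tagging a canary replica. */
final class CanaryTaggingSketch {

    static String addReplicaPropUrl(String solrBaseUrl, String collection, String shard,
                                    String replica, String property, String value) {
        // Parameter names follow the Collections API; all values are URL-encoded.
        return solrBaseUrl + "/admin/collections?action=ADDREPLICAPROP"
                + "&collection=" + encode(collection)
                + "&shard=" + encode(shard)
                + "&replica=" + encode(replica)
                + "&property=" + encode(property)
                + "&property.value=" + encode(value);
    }

    private static String encode(String raw) {
        try {
            return URLEncoder.encode(raw, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            // Cannot happen: every JVM is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, addReplicaPropUrl("http://localhost:8983/solr", "books", "shard1", "core_node3", "canaryColour", "yellow") yields a URL that tags the (hypothetical) replica core_node3 as a yellow canary.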
2) The CanaryComponent needs to be added to the /select RequestHandler (Example included in the cloud-canary test config files).
Optional but encouraged: set the canary.timeout flag to a sensible Long value (time in milliseconds). This ensures that every request has a timeout specified.
Note: the timeout can be specified on a per-request basis.
This concludes the initial set-up.
For each request that needs to be run through the CanaryComponent the following parameters have to be added (depends on SOLR-10880):
A timeout must be specified, either on the request itself or, for convenience, through the default described in point 2.
Running a query on the canary without a timeout is not permitted and causes an exception to be thrown.
This means that the request needs the replica filtering framework enabled (see SOLR-10880), that the canary requests will be routed to the replicas whose birdColour property is set to yellow, and that the request should time out after 5 seconds.
Example requests can be seen in CanaryComponentTest.java.
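The timeout rule can be sketched as a small resolution step. This is an illustration only: the class and method are hypothetical, the per-request value taking precedence over the configured canary.timeout default is an assumption, and a plain IllegalStateException stands in for whatever exception the component actually throws.

```java
/** Hypothetical helper that picks the timeout for a canary request. */
final class CanaryTimeoutSketch {

    /**
     * @param perRequestMillis  timeout given on the request, or null if absent
     * @param configuredMillis  default from the handler configuration, or null if absent
     * @return the timeout to apply, in milliseconds
     */
    static long resolveTimeoutMillis(Long perRequestMillis, Long configuredMillis) {
        // Assumption: the request-level value overrides the configured default.
        Long chosen = perRequestMillis != null ? perRequestMillis : configuredMillis;
        if (chosen == null) {
            // Running on the canary without any timeout is not permitted.
            throw new IllegalStateException("canary request has no timeout");
        }
        return chosen;
    }
}
```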
The request will only run on one canary replica. If multiple replicas match CANARY_TYPE_PROPERTY:CANARY_TYPE, a random one is picked among them. Should it be unreachable, another random one is chosen, and so on.
Given the following list of replicas matching CANARY_TYPE_PROPERTY:CANARY_TYPE looking like
The component will rearrange them to look like this list
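The random pick with fallback described above can be approximated by shuffling the matching replicas and trying them in order. This is a sketch under assumptions, not the component's code: the class name, the use of replica names as plain strings and the reachability predicate are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.function.Predicate;

/** Hypothetical helper that picks one reachable canary replica at random. */
final class CanaryPickSketch {

    static Optional<String> pick(List<String> matchingReplicas,
                                 Predicate<String> isReachable,
                                 Random random) {
        // Shuffle a copy so the caller's list is left untouched.
        List<String> shuffled = new ArrayList<>(matchingReplicas);
        Collections.shuffle(shuffled, random);
        // Walking the shuffled list is equivalent to picking a random replica
        // and falling back to another random one whenever it is unreachable.
        for (String replica : shuffled) {
            if (isReachable.test(replica)) {
                return Optional.of(replica);
            }
        }
        return Optional.empty(); // no canary replica could be reached
    }
}
```

Passing the Random in makes the choice reproducible in tests by using a fixed seed.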
The request is executed exactly as the QueryComponent would execute it (depends on SOLR-11343), so that the analysis is as realistic as possible. However, it runs in a separate thread.
This allows any exception thrown by the query to be caught and its execution time to be monitored at a finer level. The execution of the query is halted as soon as an exception is detected or the timeout expires.
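The pattern of running work in a separate thread, catching its exceptions and enforcing a timeout can be sketched with the standard java.util.concurrent classes. This is a generic illustration rather than the component's implementation: the class name and the Callable standing in for the query are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper that runs a task in its own thread under a timeout. */
final class CanaryRunSketch {

    /** Returns true if the task finished normally in time, false on exception or timeout. */
    static <T> boolean runWithTimeout(Callable<T> query, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(query);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so the query is halted
            return false;
        } catch (ExecutionException e) {
            return false; // the query itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Note that cancellation relies on thread interruption, so a task that ignores interrupts keeps running in the background even though the caller has already received false.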
CanaryComponent will clean its query results so that other components will not see partial results.
- Running a query with the parameter canary set, but without ShardParams.FILTER_BY_REPLICA_PROPERTY will cause the CanaryComponent to throw an exception.
- Running a query with the parameter canary set, but without any replica matching the property and tag specified will cause the CanaryComponent to throw an exception.
- Running a query with the parameter canary set to a value not matching the format CANARY_TYPE_PROPERTY:CANARY_TYPE will cause the CanaryComponent to throw an exception.
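The format check in the last point can be sketched as a small parser. The names are hypothetical, IllegalArgumentException stands in for whatever exception the component actually throws, and rejecting values with more than one colon is an assumption of this sketch.

```java
/** Hypothetical parser for a canary value of the form CANARY_TYPE_PROPERTY:CANARY_TYPE. */
final class CanaryParamSketch {

    /** Returns {property, type}, or throws if the value does not match the expected format. */
    static String[] parse(String canaryValue) {
        if (canaryValue == null) {
            throw new IllegalArgumentException("canary value is missing");
        }
        int separator = canaryValue.indexOf(':');
        boolean emptyProperty = separator <= 0;
        boolean emptyType = separator == canaryValue.length() - 1;
        // Assumption: exactly one colon is allowed.
        boolean extraColon = separator >= 0 && canaryValue.indexOf(':', separator + 1) >= 0;
        if (emptyProperty || emptyType || extraColon) {
            throw new IllegalArgumentException(
                    "expected CANARY_TYPE_PROPERTY:CANARY_TYPE but got '" + canaryValue + "'");
        }
        return new String[] {
            canaryValue.substring(0, separator),
            canaryValue.substring(separator + 1)
        };
    }
}
```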