This proposal is to build schema/cql analyser which can help users to analyze their queries before it is too late.
User may create stability issues by
- Running expensive queries against cluster like , SELECT * or query without where clause or IN clause with many values etc.
- Creating not so optimal schemas
- Leaving scope for data loss or schema with performance issue (keyspace with durable writes set to false or table with many secondary indexes etc...).
Most of the times these Dos & Don'ts go into some knowledge base/documentation as best practices. Having rules for best practices (which user can execute against statements) can help to avoid bad schema/queries getting executed in cluster. The main idea is to enable the users to take corrective actions, by
1) Allowing a user to validate a DDL/DML statements before it is applied/executed.
2) Allowing a user to validate existing schema/queries.
Imo, a validation result should:
1. Have severity
2. Tell where it hurts like instance/replica set/cluster.
3. Tell if it causes data loss.
4. Tell the strategy to recover.
Few approaches I can think of:
1. Write validation rules at server side + have a new type of statement to run validations (something like MySQL's EXPLAIN) and return validation results/errors.
2. Keep validation rules in sidecar + expose a service to run validations. In this case user can submit his statements to this API and get validation results.
3. Expose a UI in sidecar which accepts statements and run validations. Validation rules can be with UI or UI can make either of above options.
Open for any other approach.