Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
Description
While getting familiar with the SolrCore + CodecFactory + SchemaCodecFactory + FieldType related code relevant to SOLR-17045, SOLR-17046, & SOLR-17047, it occurred to me that there are a lot of inefficiencies and kludginess in how FieldType based "codec overrides" are used (and validated) by SchemaCodecFactory (and SolrCore.initCodec):
- SolrCore.initCodec needs to be aware of all the possible ways a FieldType instance might support codec overrides
- ... so it can fail if any are specified unless the CodecFactory is an instance of SolrCoreAware
- ... even though that still doesn't ensure the factory supports those field type overrides
- This validation currently just looks at getPostingsFormatForField & getDocValuesFormatForField
- ... it's ignorant about DenseVectorField's assumptions about being able to override aspects of the KnnVectorsFormat
- ... and AFAICT, what validation is done doesn't help if the Schema API is used to add new field types (w/ postingsFormat or docValuesFormat overrides)
- In all of the SchemaCodecFactory "per-field" methods (getPostingsFormatForField, getDocValuesFormatForField, & getKnnVectorsFormatForField) ...
- ... every call to these methods resolves a SchemaField instance – even though only the (Solr) FieldType is needed
- Asking the IndexSchema for the SchemaField of a fieldName has more overhead than just asking for the FieldType
- None of the things these methods care about can be configured on a per-fieldName basis anyway.
- For PostingsFormat and DocValuesFormat, every call to these methods repeats the SPI lookup on the "format name" configured on the FieldType instance
- For KnnVectorsFormat every call to this method constructs a new SolrDelegatingKnnVectorsFormat – even though the same instance could be re-used for every field of the same FieldType instance.
- In FieldType ...
- ... there is no validation anywhere that the postingsFormat or docValuesFormat are valid
- ... bogus values only cause a problem when the SchemaCodecFactory tries to resolve them (when indexing)
- In DenseVectorField ...
- ... checkSchemaField validates (and logs warnings) based on the vectorEncoding and dimensions...
- ... even though these validations aren't "field" specific – they are "type" specific, and could be validated in DenseVectorField.init()
- BUT! ... there is no validation anywhere that the knnAlgorithm is supported, or that the HNSW options make sense for it
- These are only validated by the Codec.getKnnVectorsFormatForField(...) impl provided by SchemaCodecFactory ...
- ... and they are redundantly validated on every call
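To illustrate the "repeated lookup on every call" points above, here is a minimal sketch of memoizing a format lookup so it runs once per configured format name rather than once per per-field call. Everything in it is hypothetical: the class name `PerTypeFormatCache`, the `formatFor` method, and the `Function` used as a stand-in for an SPI lookup (e.g. something like `PostingsFormat.forName`) are assumptions for illustration, not existing Solr code and not a proposed patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical sketch: cache the result of an expensive lookup keyed by the
 * format name configured on a field type, so repeated per-field calls reuse it.
 */
public class PerTypeFormatCache<F> {
  private final Map<String, F> byFormatName = new ConcurrentHashMap<>();
  // Stand-in for the real SPI lookup (assumption; not the actual Solr wiring).
  private final Function<String, F> resolver;

  public PerTypeFormatCache(Function<String, F> resolver) {
    this.resolver = resolver;
  }

  /** Resolves a format, invoking the resolver at most once per distinct name. */
  public F formatFor(String formatName) {
    return byFormatName.computeIfAbsent(formatName, resolver);
  }

  public static void main(String[] args) {
    AtomicInteger lookups = new AtomicInteger();
    PerTypeFormatCache<String> cache = new PerTypeFormatCache<>(name -> {
      lookups.incrementAndGet(); // count how often the "SPI lookup" really runs
      return "format:" + name;
    });

    cache.formatFor("Direct");   // first call: resolver runs
    cache.formatFor("Direct");   // second call: served from the cache
    cache.formatFor("Lucene99"); // different name: resolver runs again

    System.out.println(lookups.get()); // prints 2
  }
}
```

If something along these lines were populated eagerly when a field type is initialized, a bogus format name would presumably surface at schema load time instead of at indexing time, though whether that fits the existing FieldType / SchemaCodecFactory lifecycle is an open question.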
Issue Links
- relates to
- SOLR-17047 (SolrCore's) CodecFactory validation ignores schema based KnnVectorsFormat options on init (Open)
- SOLR-17045 DenseVectorField w/ vectorDimension > 1024 no longer works by default (Closed)
- SOLR-17046 SchemaCodecFactory should be the implicit default if no <codeFactory/> is configured (Closed)