Description
There is an issue with the class SchemaCompatibilityResult, defined in compatibility.py:
    class SchemaCompatibilityResult:
        def __init__(
            self,
            compatibility: SchemaCompatibilityType = SchemaCompatibilityType.recursion_in_progress,
            incompatibilities: Optional[List[SchemaIncompatibilityType]] = None,
            messages: Optional[Set[str]] = None,
            locations: Optional[Set[str]] = None,
        ):
            self.locations = locations or {"/"}
            self.messages = messages or set()
            self.compatibility = compatibility
            self.incompatibilities = incompatibilities or []
Here, locations and messages are Python sets and are therefore unordered. A compatibility check between a reader schema and a writer schema runs recursively, and results of the above type are merged together for each incompatibility found. The problem is that each message belongs to a specific location, yet the two are stored as separate attributes and are currently merged independently of each other (see compatibility.py):
    def merge(this: SchemaCompatibilityResult, that: SchemaCompatibilityResult) -> SchemaCompatibilityResult:
        ...
        messages = this.messages.union(that.messages)
        locations = this.locations.union(that.locations)
        ...
Since Python sets are unordered and the two unions are computed separately, nothing records which message belongs to which location, so the merged messages can end up out of sync with their locations.
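A minimal sketch of how the pairing gets lost, assuming two results that report different messages at the same location. The helper name merge_sets and the message and location strings are hypothetical and are not taken from compatibility.py; only the two union calls mirror the snippet above.

```python
from typing import Set, Tuple


def merge_sets(
    this_messages: Set[str],
    this_locations: Set[str],
    that_messages: Set[str],
    that_locations: Set[str],
) -> Tuple[Set[str], Set[str]]:
    # Same two unions as in merge(), applied to bare sets.
    messages = this_messages.union(that_messages)
    locations = this_locations.union(that_locations)
    return messages, locations


# Hypothetical values: two distinct problems found at the same location.
messages, locations = merge_sets(
    {"message A"}, {"/fields/0"},
    {"message B"}, {"/fields/0"},
)

# The shared location collapses into one entry, so the two sets no longer
# have the same size and cannot be matched up pairwise.
print(len(messages), len(locations))  # prints "2 1"
```

Even when the two sets happen to have the same size, iterating over them gives no guarantee that the n-th message corresponds to the n-th location.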
Proposed solution
Encapsulate location and message into a simple data class (or named tuple) to keep these two pieces of information together.
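One possible shape for this, as a sketch only: the names LocatedMessage and PairedResult are hypothetical and do not exist in the Avro code base, and the real class would also carry the compatibility and incompatibilities attributes. The point is that each message travels with its location, and merging preserves that pairing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class LocatedMessage:
    """Hypothetical pair keeping a message together with its location."""
    location: str
    message: str


@dataclass
class PairedResult:
    """Hypothetical stand-in for the location/message part of a result."""
    details: List[LocatedMessage] = field(default_factory=list)


def merge_details(this: PairedResult, that: PairedResult) -> PairedResult:
    # Keep insertion order and drop exact duplicates; pairs are never split.
    merged = list(this.details)
    for item in that.details:
        if item not in merged:
            merged.append(item)
    return PairedResult(merged)


left = PairedResult([LocatedMessage("/fields/0", "message A")])
right = PairedResult([
    LocatedMessage("/fields/0", "message B"),
    LocatedMessage("/fields/0", "message A"),  # duplicate of the left entry
])
result = merge_details(left, right)
for item in result.details:
    print(item.location, item.message)
```

A frozen data class is hashable, so the pairs could also be kept in a set if ordering does not matter; a list is used here to keep the output deterministic.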
Issue Links
- is related to AVRO-1751 Add support for Schema Compatibility check in python API (Open)