Details
Type: Bug
Status: Patch Available
Priority: Major
Resolution: Unresolved
Description
Python's validator silently accepts (generic) records with extra fields and considers them valid.
For example, io.validate silently considers that the schema:
{"type": "record", "name": "Test", "fields": [{"name": "f", "type": "long"}]}
should accept records like:
{'f': 5, 'extra_field': "abc"}
even though extra_field is not declared in the schema.
This is especially problematic for encoding unions, because internally the Python serializer uses validate to find the appropriate schema with which to encode a given object.
In the current implementation, the serializer encodes a union value with the last schema in the union for which validate(schema, obj) returns True. If validate is not strict about extra fields, the serializer will frequently pick the wrong branch.
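To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use the avro library and is not the attached patch: validate (with its strict flag), pick_union_branch, and the Narrow and Wide schemas are hypothetical stand-ins that only mimic the behavior described above for long, string, and record types.

```python
# Illustrative sketch only: hypothetical stand-ins, not the avro library's code.

def validate(schema, datum, strict):
    """Tiny validator for 'long', 'string' and 'record' schemas.

    strict=False mimics the reported behavior (extra fields are ignored);
    strict=True rejects records carrying undeclared fields.
    """
    kind = schema["type"]
    if kind == "long":
        return isinstance(datum, int) and not isinstance(datum, bool)
    if kind == "string":
        return isinstance(datum, str)
    if kind == "record":
        if not isinstance(datum, dict):
            return False
        declared = {field["name"] for field in schema["fields"]}
        if strict and set(datum) - declared:
            return False  # the record has a field the schema does not declare
        return all(
            validate({"type": field["type"]}, datum.get(field["name"]), strict)
            for field in schema["fields"]
        )
    return False


def pick_union_branch(branches, datum, strict):
    """Return the index of the last branch that validates, or -1 if none does."""
    chosen = -1
    for index, branch in enumerate(branches):
        if validate(branch, datum, strict):
            chosen = index
    return chosen


# Two hypothetical record schemas; Wide declares the extra field, Narrow does not.
NARROW = {"type": "record", "name": "Narrow",
          "fields": [{"name": "f", "type": "long"}]}
WIDE = {"type": "record", "name": "Wide",
        "fields": [{"name": "f", "type": "long"},
                   {"name": "extra_field", "type": "string"}]}

union = [WIDE, NARROW]
datum = {"f": 5, "extra_field": "abc"}

lenient_choice = pick_union_branch(union, datum, strict=False)
strict_choice = pick_union_branch(union, datum, strict=True)
print(lenient_choice, strict_choice)
```

With the lenient check both branches validate, so the last one (Narrow) is selected and extra_field would be dropped on encoding. With the strict check only Wide validates.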
I will attach two patches: one to the tests and one to the validate function.