Apache Avro / AVRO-1343

Python: validate too permissive on records with extra fields


    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.10.0
    • Component/s: python
    • Labels:
      None

      Description

      Python's validator silently accepts (generic) records with extra fields and considers them valid.

      For example, io.validate considers that the schema:

      {"type": "record",
       "name": "Test",
       "fields": [{"name": "f", "type": "long"}]}
      

      should accept records like:

      {'f': 5, 'extra_field': "abc"}

      even though extra_field is not declared in the schema. Such a record should be rejected.
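The behavior described above can be reproduced without the library. The following is a minimal sketch, not the avro package's code: `validate_permissive` is a hypothetical stand-in that works on plain dict schemas and handles only the `long` and `string` primitives. It assumes the permissive check looks up only the declared fields, which is what the report implies.

```python
# Simplified stand-in for a permissive record check; NOT the avro library's code.
# Schemas are plain dicts and only "long" and "string" primitives are handled.

def validate_permissive(schema, datum):
    """Return True if every declared field validates; undeclared keys are ignored."""
    if schema == "long":
        return isinstance(datum, int) and not isinstance(datum, bool)
    if schema == "string":
        return isinstance(datum, str)
    if isinstance(schema, dict) and schema.get("type") == "record":
        if not isinstance(datum, dict):
            return False
        # Only declared fields are looked up, so extra keys are never inspected.
        return all(
            validate_permissive(field["type"], datum.get(field["name"]))
            for field in schema["fields"]
        )
    return False


TEST_SCHEMA = {
    "type": "record",
    "name": "Test",
    "fields": [{"name": "f", "type": "long"}],
}

record_with_extra = {"f": 5, "extra_field": "abc"}
accepted = validate_permissive(TEST_SCHEMA, record_with_extra)
print(accepted)  # True: the undeclared key is never checked
```

Because the loop iterates over the schema's fields rather than the record's keys, nothing in this style of check can notice `extra_field`.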

      This is especially problematic for encoding unions, because internally the Python serializer uses validate to find the appropriate schema with which to encode a given object.

      In the current implementation, the union branch selected is the last schema for which validate(schema, obj) returns True. If validate is not strict, the encoder will frequently pick the wrong branch.
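To illustrate why last-match selection is sensitive to a permissive validator, here is a small sketch. Everything in it is hypothetical and for illustration only: the `Wide` and `Narrow` record schemas, the helper names, and the `strict` flag are assumptions, not part of the avro API or of the attached patches. The strict mode shown (rejecting keys that the schema does not declare) is just one possible tightening.

```python
# Hypothetical helpers for illustration; schemas are plain dicts, not avro objects.

def field_ok(type_name, value):
    """Check a value against one of the two primitives this sketch supports."""
    if type_name == "long":
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name == "string":
        return isinstance(value, str)
    return False


def validate_record(schema, datum, strict):
    """Validate a dict against a record schema; strict mode rejects undeclared keys."""
    if not isinstance(datum, dict):
        return False
    declared = {field["name"] for field in schema["fields"]}
    if strict and not set(datum) <= declared:
        return False
    return all(
        field_ok(field["type"], datum.get(field["name"]))
        for field in schema["fields"]
    )


def pick_union_branch(branches, datum, strict):
    """Return the name of the LAST branch that validates, or None if none does."""
    chosen = None
    for branch in branches:
        if validate_record(branch, datum, strict):
            chosen = branch["name"]  # no break: a later match overwrites an earlier one
    return chosen


WIDE = {
    "type": "record",
    "name": "Wide",
    "fields": [
        {"name": "f", "type": "long"},
        {"name": "extra_field", "type": "string"},
    ],
}
NARROW = {
    "type": "record",
    "name": "Narrow",
    "fields": [{"name": "f", "type": "long"}],
}

datum = {"f": 5, "extra_field": "abc"}

permissive_choice = pick_union_branch([WIDE, NARROW], datum, strict=False)
strict_choice = pick_union_branch([WIDE, NARROW], datum, strict=True)
print(permissive_choice)  # Narrow: both branches validate, the last one wins
print(strict_choice)      # Wide: Narrow is rejected because of the undeclared key
```

With the permissive check, the record would be encoded as `Narrow` and `extra_field` would be lost, which is the kind of wrong guess the report describes.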

      I will attach two patches: one to the tests and one to the validate function.

        Attachments

        1. AVRO-1343-tests.patch
          2 kB
          Jeremy Kahn
        2. AVRO-1343-validate.patch
          1 kB
          Jeremy Kahn


              People

              • Assignee: trochee Jeremy Kahn
              • Reporter: trochee Jeremy Kahn
              • Votes: 0
              • Watchers: 4
