  1. Apache Avro
  2. AVRO-2906

Replace recursive validation with traversal-based solution for Python avro

Details

    Description

      The existing validation scheme in the Python implementation of Avro is recursive. This is problematic in Python for two reasons: CPython caps recursion depth (1000 frames by default) and performs no tail-call optimization, and each nested call adds overhead, so validation becomes less efficient as schemas nest more deeply. Error reporting for validation problems is also generic. Unless a global variable in the code is changed to allow errors for sub-schemas to be reported directly, the only report one gets is an exception saying that the entire schema is invalid, which is not much help when hunting bugs while serializing very large schemas.

      My proposal is to replace the existing approach with one that validates by breadth-first traversal of the schema. This avoids the inefficiencies of recursion and also allows errors to be reported at the exact spot in the overall schema where they occurred.
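
      To illustrate the idea, here is a minimal sketch of queue-based, breadth-first validation that records the path to each failure. It is not the code from the PR: schemas are modelled as plain dicts with a hypothetical, simplified shape (every node is a dict with a "type" key), and the function name find_validation_errors is made up for this example.

```python
from collections import deque

# Hypothetical, simplified schema shape: plain dicts, not the avro
# library's Schema classes. Only three primitives are covered.
PRIMITIVE_CHECKS = {
    "string": lambda d: isinstance(d, str),
    "int": lambda d: isinstance(d, int) and not isinstance(d, bool),
    "boolean": lambda d: isinstance(d, bool),
}


def find_validation_errors(schema, datum):
    """Walk schema and datum breadth-first; return a list of error strings."""
    errors = []
    # Each queue entry pairs a schema node with its datum and its path.
    queue = deque([(schema, datum, "$")])
    while queue:
        node, value, path = queue.popleft()
        kind = node["type"]
        if kind in PRIMITIVE_CHECKS:
            if not PRIMITIVE_CHECKS[kind](value):
                errors.append(f"{path}: expected {kind}, got {type(value).__name__}")
        elif kind == "array":
            if not isinstance(value, list):
                errors.append(f"{path}: expected array, got {type(value).__name__}")
                continue
            # Children are queued rather than validated by a nested call.
            for i, item in enumerate(value):
                queue.append((node["items"], item, f"{path}[{i}]"))
        elif kind == "record":
            if not isinstance(value, dict):
                errors.append(f"{path}: expected record, got {type(value).__name__}")
                continue
            for field in node["fields"]:
                name = field["name"]
                if name not in value:
                    errors.append(f"{path}.{name}: missing field")
                else:
                    queue.append((field["type"], value[name], f"{path}.{name}"))
        else:
            errors.append(f"{path}: unsupported type {kind!r}")
    return errors


example_schema = {
    "type": "record",
    "fields": [
        {"name": "name", "type": {"type": "string"}},
        {"name": "tags", "type": {"type": "array", "items": {"type": "string"}}},
    ],
}
print(find_validation_errors(example_schema, {"name": "a", "tags": ["x", 3]}))
```

      Because pending work lives in an explicit queue instead of the call stack, nesting depth is bounded by available memory rather than by the interpreter's recursion limit, and each error carries the path of the node that failed.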

      My implementation, in [this PR on GitHub|https://github.com/apache/avro/pull/936], also moves validation out of a mapping from type/logical_type to lambda functions and into a validate method on each schema type, so that each schema is responsible for validating itself.
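
      As a rough sketch of what "each schema validates itself" can look like when combined with a traversal, consider the following. The class names, the children method, and the return conventions are all assumptions made for this example; they are not taken from the PR or from the avro package.

```python
from collections import deque


# Hypothetical schema classes for illustration only. Each one checks its
# own level via validate() and exposes sub-nodes via children().
class IntSchema:
    type_name = "int"

    def validate(self, datum):
        return isinstance(datum, int) and not isinstance(datum, bool)

    def children(self, datum):
        return []


class StringSchema:
    type_name = "string"

    def validate(self, datum):
        return isinstance(datum, str)

    def children(self, datum):
        return []


class ArraySchema:
    type_name = "array"

    def __init__(self, items):
        self.items = items

    def validate(self, datum):
        # Only the container is checked here; items are visited later.
        return isinstance(datum, list)

    def children(self, datum):
        return [(self.items, item, f"[{i}]") for i, item in enumerate(datum)]


class RecordSchema:
    type_name = "record"

    def __init__(self, fields):
        self.fields = fields  # mapping of field name to schema object

    def validate(self, datum):
        return isinstance(datum, dict) and set(datum) == set(self.fields)

    def children(self, datum):
        return [(s, datum[name], f".{name}") for name, s in self.fields.items()]


def validate(schema, datum):
    """Breadth-first driver: raise ValueError naming the first failing path."""
    queue = deque([(schema, datum, "$")])
    while queue:
        node, value, path = queue.popleft()
        if not node.validate(value):
            raise ValueError(f"{path}: {value!r} is not a valid {node.type_name}")
        queue.extend((c, v, path + suffix) for c, v, suffix in node.children(value))
    return True
```

      With this split, the driver knows nothing about individual types: supporting a new schema type means adding a class with its own validate method rather than adding an entry to a central mapping.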

            People

              cewing Cristopher Ewing
              Votes: 0
              Watchers: 3
