Profiling Avro serialization with our union-heavy schema shows several memory and throughput bottlenecks:
- Validation calls repeatedly allocate constant hashes
- Validation calls repeatedly allocate constant strings
- Validation calls are expensive and can be avoided when determining whether a datum matches a null union member type (a common pattern for "optional" fields)
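The sketch below illustrates the general ideas in the list above. It is not the code from the upcoming PR, and it is written in Python only for illustration. The schema model (plain dicts with a "type" key), the `_CHECKS` table, and the `validate` and `find_union_branch` helpers are all hypothetical. The lookup table is built once at module load rather than on every call. A `None` datum is routed straight to the "null" member, and non-null data skip the "null" member without calling the validator.

```python
# Hypothetical, simplified schema model: each schema is a dict like {"type": "long"}.

# Built once at import time and reused, instead of allocating a new table per call.
_CHECKS = {
    "null": lambda d: d is None,
    "boolean": lambda d: isinstance(d, bool),
    "long": lambda d: isinstance(d, int) and not isinstance(d, bool),
    "double": lambda d: isinstance(d, float),
    "string": lambda d: isinstance(d, str),
}
_NULL = "null"


def validate(schema, datum):
    """Full (comparatively expensive) validation of datum against one schema."""
    return _CHECKS[schema["type"]](datum)


def find_union_branch(members, datum):
    """Return the index of the first union member that datum matches."""
    if datum is None:
        # Fast path: None can only match the "null" member, so skip validation entirely.
        for i, member in enumerate(members):
            if member["type"] == _NULL:
                return i
        raise ValueError("union has no null member")
    for i, member in enumerate(members):
        if member["type"] == _NULL:
            continue  # a non-None datum never matches "null"; no validate() call needed
        if validate(member, datum):
            return i
    raise ValueError("datum does not match any union member")
```

For an "optional" field declared as `["null", "string"]`, `find_union_branch` returns 0 for `None` without calling `validate` at all.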
Optimizing these code paths reduces memory allocations by 78% and improves throughput 1.9x in our encoding benchmarks. A GitHub PR is coming shortly.
Note: Encoding unions is still expensive because the code must determine which member of the union a datum targets. Allowing clients to specify the member explicitly would speed up serialization further, but that requires a larger API change.
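In Avro's binary encoding, a union value is written as the zero-based index of the chosen member (as a long), followed by the value itself. The encoder therefore has to find that index. The following hypothetical Python sketch shows why an explicit choice could be cheaper. The optional `branch` parameter is an assumed API and does not exist in any Avro library. Without it, every member is tried in order. With it, only the named member is checked.

```python
# Hypothetical sketch; `branch` is an assumed caller-supplied argument, not a real Avro API.
_CHECKS = {
    "null": lambda d: d is None,
    "long": lambda d: isinstance(d, int) and not isinstance(d, bool),
    "string": lambda d: isinstance(d, str),
}


def matches(schema, datum):
    return _CHECKS[schema["type"]](datum)


def resolve_branch(members, datum, branch=None):
    """Return the index of the union member to encode datum with."""
    if branch is not None:
        # Caller named the member: a single check instead of a scan over all members.
        if not matches(members[branch], datum):
            raise ValueError(f"datum does not match union member {branch}")
        return branch
    # No hint: try each member in declaration order.
    for i, member in enumerate(members):
        if matches(member, datum):
            return i
    raise ValueError("datum does not match any union member")
```

With wide unions, the scan costs one check per member tried. The explicit form costs one check (or none, if the encoder chose to trust the caller).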