Description
I was able to isolate a memory leak in the new schema resolver
(resolved-writer.c) in Avro C on the avro-trunk (v1.6.0), in which the
code attempts to access and free memory that has already been freed.
There is a simple one-line fix to this issue, using the reference
counting mechanism provided in Avro, attached in the patch file.
This patch is different from the one I originally sent to the Avro Dev
mailing list, and I think my new patch is better.
Longer Description:
I created a small test program to test schema resolution in C. The
program contains a writer schema with two integer elements and a reader
schema with three integer elements. Since the reader schema has elements
that the writer doesn't contain, and default values are not yet
supported in C, the function try_record() goes into the "error" section
(by design) and tries to clean up after itself. This cleanup process has
a memory bug in it.
Specifically, if the reader schema contains two elements of the same
type (in my case two ints), try_record() uses the same resolver for
both elements. This common resolver is used because the function
avro_resolved_writer_new_memoized() tests to see if the schema's of
the two elements are the same, and if they are matched, it returns the
resolver, and does not create a new resolver. It also does not
increment the reference count of this existing resolver. So, multiple
elements of the array of field_resolvers[] can contain pointers to the
same resolver, and these reference are not accounted for in the
reference count.
During the cleanup process, there is a loop that checks if the pointer
field_resolvers[i] is non-NULL, and tries to free all non-NULL
resolvers. Since we have two (or more) elements of the
field_resolvers[] array pointing to the same resolver, and the
reference count of the resolver doesn't count these additional
references, the code tries to call avro_value_iface_decref() on an
already freed resolver.
This can be simply fixed as follows: Increment the reference count
each time you assign a resolver to the field_resolvers[] array, even
when you are re-assigning the same resolver to a new element in the
array.
I verified this patch using valgrind. Multiple valgrind errors were
seen in try_record() and avro_refcount_dec() before applying the
patch, and these errors went away after applying the patch.
In an earlier email to the avro dev mailing list, I provided a patch
which didn't use the reference count mechanism. Instead it checked the
field_resolvers[] array for duplicate entries, and set the duplicates
to NULL. This patch used more code, but had a more localized impact –
in that it was exercised only when an error condition was
encountered. I can provide this patch if anyone wants it.