Avro
  1. Avro
  2. AVRO-930

Memory leak in resolved-writer.c in Avro C

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.6.0
    • Component/s: c
    • Labels:
      None
    • Environment:

      All.

      Description

      I was able to isolate a memory leak in the new schema resolver
      (resolved-writer.c) in Avro C on the avro-trunk (v1.6.0), in which the
      code attempts to access and free memory that has already been freed.

      There is a simple one-line fix to this issue, using the reference
      counting mechanism provided in Avro, attached in the patch file.

      This patch is different from the one I originally sent to the Avro Dev
      mailing list, and I think my new patch is better.

      Longer Description:

      I created a small test program to test schema resolution in C. The
      program contains a writer schema with two integer elements and a reader
      schema with three integer elements. Since the reader schema has elements
      that the writer doesn't contain, and default values are not yet
      supported in C, the function try_record() goes into the "error" section
      (by design) and tries to clean up after itself. This cleanup process has
      a memory bug in it.

      Specifically, if the reader schema contains two elements of the same
      type (in my case two ints), try_record() uses the same resolver for
      both elements. This common resolver is used because the function
      avro_resolved_writer_new_memoized() tests to see if the schema's of
      the two elements are the same, and if they are matched, it returns the
      resolver, and does not create a new resolver. It also does not
      increment the reference count of this existing resolver. So, multiple
      elements of the array of field_resolvers[] can contain pointers to the
      same resolver, and these reference are not accounted for in the
      reference count.

      During the cleanup process, there is a loop that checks if the pointer
      field_resolvers[i] is non-NULL, and tries to free all non-NULL
      resolvers. Since we have two (or more) elements of the
      field_resolvers[] array pointing to the same resolver, and the
      reference count of the resolver doesn't count these additional
      references, the code tries to call avro_value_iface_decref() on an
      already freed resolver.

      This can be simply fixed as follows: Increment the reference count
      each time you assign a resolver to the field_resolvers[] array, even
      when you are re-assigning the same resolver to a new element in the
      array.

      I verified this patch using valgrind. Multiple valgrind errors were
      seen in try_record() and avro_refcount_dec() before applying the
      patch, and these errors went away after applying the patch.

      In an earlier email to the avro dev mailing list, I provided a patch
      which didn't use the reference count mechanism. Instead it checked the
      field_resolvers[] array for duplicate entries, and set the duplicates
      to NULL. This patch used more code, but had a more localized impact –
      in that it was exercised only when an error condition was
      encountered. I can provide this patch if anyone wants it.

      1. avro-930-test.c
        3 kB
        Vivek Nadkarni
      2. AVRO-930.patch
        0.4 kB
        Vivek Nadkarni

        Activity

        Hide
        Vivek Nadkarni added a comment -

        Here is the one line patch that increments the ref-count when an existing resolver is assigned to the resolvers[] array.

        Show
        Vivek Nadkarni added a comment - Here is the one line patch that increments the ref-count when an existing resolver is assigned to the resolvers[] array.
        Hide
        Vivek Nadkarni added a comment -

        The patch I attached is ready for review. Please let me know if you need any changes.

        Show
        Vivek Nadkarni added a comment - The patch I attached is ready for review. Please let me know if you need any changes.
        Hide
        Vivek Nadkarni added a comment -

        Test code for JIRA Issue AVRO-930.

        This program tests the error handling that arises when the reader
        schema contains fields that the writer schema has not written. The
        Avro Spec allows the reader to use default values for each field. The
        Avro C code does not currently support default values – this is
        listed as a TODO in the function try_record() in the file
        resolved-writer.c.

        This file was stored in a temporary directory under
        avro-trunk/lang/c/avro-930-test/avro-930-test.c

        It was compiled under Linux using:
        gcc -g avro-930-test.c -o avro930 -I../src -L../src -lavro

        The code was tested with valgrind using the command:
        valgrind -v --leak-check=full ./avro930

        Before the patch for AVRO-930, valgrind shows multiple errors. After
        the patch, no valgrind errors are seen.

        Show
        Vivek Nadkarni added a comment - Test code for JIRA Issue AVRO-930 . This program tests the error handling that arises when the reader schema contains fields that the writer schema has not written. The Avro Spec allows the reader to use default values for each field. The Avro C code does not currently support default values – this is listed as a TODO in the function try_record() in the file resolved-writer.c. This file was stored in a temporary directory under avro-trunk/lang/c/avro-930-test/avro-930-test.c It was compiled under Linux using: gcc -g avro-930-test.c -o avro930 -I../src -L../src -lavro The code was tested with valgrind using the command: valgrind -v --leak-check=full ./avro930 Before the patch for AVRO-930 , valgrind shows multiple errors. After the patch, no valgrind errors are seen.
        Hide
        Vivek Nadkarni added a comment -

        I should add, the desired behavior of the test case I just uploaded is to exit with EXIT_FAILURE, but without any memory problems in Valgrind.

        Show
        Vivek Nadkarni added a comment - I should add, the desired behavior of the test case I just uploaded is to exit with EXIT_FAILURE, but without any memory problems in Valgrind.
        Hide
        Doug Cutting added a comment -

        Upgrading to blocker as we prepare for 1.6.0.

        Show
        Doug Cutting added a comment - Upgrading to blocker as we prepare for 1.6.0.
        Hide
        Douglas Creager added a comment -

        Sorry for the delay in committing this — I was away over the weekend. This patch is in SVN trunk now. Thanks Vivek!

        Show
        Douglas Creager added a comment - Sorry for the delay in committing this — I was away over the weekend. This patch is in SVN trunk now. Thanks Vivek!

          People

          • Assignee:
            Douglas Creager
            Reporter:
            Vivek Nadkarni
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 24h
              24h
              Remaining:
              Remaining Estimate - 24h
              24h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development