Avro
  1. Avro
  2. AVRO-984

Avro C - Resolved reader fails to read nested arrays and reads uninitialized memory

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.1, 1.6.2, 1.7.0
    • Fix Version/s: 1.6.2, 1.7.0
    • Component/s: c
    • Labels:
      None
    • Environment:

      GNU/Linux Ubuntu 11.10 64-bit

      Description

      Summary:

      I created a test program that creates an avro value corresponding to
      the following schema:

      {"type":"array", "items": {"type": "array", "items": "long"}}

      and tries to read it back using a matched and resolved reader. The
      matched reader is able to reconstruct the avro value, but the resolved
      reader fails to read the value.

      Additionally valgrind indicates that conditional jumps are being
      performed based on uninitialized memory, when trying to reconstruct
      the value using the resolved reader.

      More Details:

      I created a test program that creates an avro value corresponding to
      the following writer schema:

      {"type":"array", "items": {"type": "array", "items": "long"}}

      The avro value is serialized to memory.

      Then this memory is read back into two readers. In both cases the
      reader schema is set to be identical to the writer schema.

      The first reader is a matched reader – i.e. the reader knows that the
      writer and reader schema are identical.

      The second reader is a resolved reader – i.e. the reader tries to
      resolve differences between the writer and reader schema. The schemas
      should resolve perfectly, since the writer and reader schema are
      identical.

      When we try to deserialize the binary buffer with the matched reader,
      the value is reconstructed perfectly.

      When we try to deserialize the binary buffer with the resolved reader,
      the code fails to populate the avro value. This failure indicates that
      the resolved reading of the nested array not working
      properly. Additionally valgrind indicates that conditional jumps are
      being performed based on uninitialized values, in this second case.

      I will attach a test program that shows the issues.

      1. avro-984-test-v2.c
        14 kB
        Vivek Nadkarni
      2. resolved-writer.patch
        0.6 kB
        Douglas Creager
      3. avro-984-valgrind-output.txt
        18 kB
        Vivek Nadkarni
      4. avro-984-output.txt
        1 kB
        Vivek Nadkarni
      5. avro-984-test.c
        13 kB
        Vivek Nadkarni

        Activity

        Hide
        Douglas Creager added a comment -

        Committed to SVN. Thanks, Vivek!

        Show
        Douglas Creager added a comment - Committed to SVN. Thanks, Vivek!
        Hide
        Vivek Nadkarni added a comment -

        I have updated the test program, so that it can be included in the Avro C tests. It contains the following changes:

        1. Exit with exit code EXIT_FAILURE, if the program fails to run to completion.

        2. Unlink the temporary binary file created by the program, if the program runs to completion. (If it fails, the binary file is not removed.)

        3. Include the Apache License boilerplate at the top of the file.

        I have named the file avro-984-test-v2.c to differentiate it from avro-984-test.c. However, feel free to rename the file appropriately before including it into the Avro tests.

        Thanks,
        Vivek

        Show
        Vivek Nadkarni added a comment - I have updated the test program, so that it can be included in the Avro C tests. It contains the following changes: 1. Exit with exit code EXIT_FAILURE, if the program fails to run to completion. 2. Unlink the temporary binary file created by the program, if the program runs to completion. (If it fails, the binary file is not removed.) 3. Include the Apache License boilerplate at the top of the file. I have named the file avro-984-test-v2.c to differentiate it from avro-984-test.c. However, feel free to rename the file appropriately before including it into the Avro tests. Thanks, Vivek
        Hide
        Douglas Creager added a comment -

        Vivek discovered that the resolved-writer array implementation wasn't initializing values correctly when they were appended to the array. For arrays of primitives, this wasn't a problem, since the space for the array element would be overwritten by the new value once you called the appropriate set method. For arrays of arrays, however, you'd get some uninitialized memory for the sub-array. I'm attaching a patch that fixes this. I've also asked Vivek to make a couple of small modifications to his test program so that we can include it in the test suite, to help us avoid seeing this as a regression in the future.

        Show
        Douglas Creager added a comment - Vivek discovered that the resolved-writer array implementation wasn't initializing values correctly when they were appended to the array. For arrays of primitives, this wasn't a problem, since the space for the array element would be overwritten by the new value once you called the appropriate set method. For arrays of arrays, however, you'd get some uninitialized memory for the sub-array. I'm attaching a patch that fixes this. I've also asked Vivek to make a couple of small modifications to his test program so that we can include it in the test suite, to help us avoid seeing this as a regression in the future.
        Hide
        Douglas Creager added a comment -

        I think that the function avro_resolved_array_writer_init() should
        recursively call the init function on its items, but I am not sure how
        to go about doing this yet. This would be similar to how the function
        avro_resolved_record_writer_init() recursively calls the init function
        on its children.

        Show
        Douglas Creager added a comment - I think that the function avro_resolved_array_writer_init() should recursively call the init function on its items, but I am not sure how to go about doing this yet. This would be similar to how the function avro_resolved_record_writer_init() recursively calls the init function on its children.
        Hide
        Vivek Nadkarni added a comment -

        I did a few more tests, and found the following:

        1. When I nest an array inside an array, i.e.
        {"type":"array", "items": {"type": "array", "items": "long"}}
        the function avro_resolved_array_writer_init() is called only once
        for the top level array. (This is the schema in the attached test program.)

        2. When I place an array inside a record, i.e.
        {"type" : "record",
        "name" : "myrecord",
        "fields" : [

        { "name" : "myfield", "type":"long"}

        ,
        { "name" : "myarray", "type":

        { "type":"array", "items" : "long"}

        }
        ] }
        the functions avro_resolved_record_writer_init(), initializes all
        of its children, and in that process calls avro_resolved_array_writer_init().

        3. When I place a record inside an array, i.e.
        {"type":"array", "items": {"type" : "record",
        "name" : "myrecord",
        "fields" : [

        { "name" : "myfield", "type":"long"}

        ] } }
        The function avro_resolved_array_writer_init() is called but the function
        avro_resolved_record_writer_init() is not called.

        I think that the function avro_resolved_array_writer_init() should
        recursively call the init function on its items, but I am not sure how
        to go about doing this yet. This would be similar to how the function
        avro_resolved_record_writer_init() recursively calls the init function
        on its children.

        For comparision when resolving schemas, the function try_array() calls
        avro_resolved_writer_new_memoized() recursively on the schemas of its
        items.

        I would appreciate any ideas on how to go about recursively calling the
        init functions on the array items, and also any comments on whether that
        indeed is the way to fix the behavior I am seeing.

        Thanks,
        Vivek

        Show
        Vivek Nadkarni added a comment - I did a few more tests, and found the following: 1. When I nest an array inside an array, i.e. {"type":"array", "items": {"type": "array", "items": "long"}} the function avro_resolved_array_writer_init() is called only once for the top level array. (This is the schema in the attached test program.) 2. When I place an array inside a record, i.e. {"type" : "record", "name" : "myrecord", "fields" : [ { "name" : "myfield", "type":"long"} , { "name" : "myarray", "type": { "type":"array", "items" : "long"} } ] } the functions avro_resolved_record_writer_init(), initializes all of its children, and in that process calls avro_resolved_array_writer_init(). 3. When I place a record inside an array, i.e. {"type":"array", "items": {"type" : "record", "name" : "myrecord", "fields" : [ { "name" : "myfield", "type":"long"} ] } } The function avro_resolved_array_writer_init() is called but the function avro_resolved_record_writer_init() is not called. I think that the function avro_resolved_array_writer_init() should recursively call the init function on its items, but I am not sure how to go about doing this yet. This would be similar to how the function avro_resolved_record_writer_init() recursively calls the init function on its children. For comparision when resolving schemas, the function try_array() calls avro_resolved_writer_new_memoized() recursively on the schemas of its items. I would appreciate any ideas on how to go about recursively calling the init functions on the array items, and also any comments on whether that indeed is the way to fix the behavior I am seeing. Thanks, Vivek
        Hide
        Vivek Nadkarni added a comment -

        This test program (avro-984-test.c) demonstrates failure of the resolver to resolve a nested array.

        The output of the program is in avro-984-output.txt and the output of running valgrind is in avro-984-valgrind-output.txt.

        Show
        Vivek Nadkarni added a comment - This test program (avro-984-test.c) demonstrates failure of the resolver to resolve a nested array. The output of the program is in avro-984-output.txt and the output of running valgrind is in avro-984-valgrind-output.txt.

          People

          • Assignee:
            Unassigned
            Reporter:
            Vivek Nadkarni
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 168h
              168h
              Remaining:
              Remaining Estimate - 168h
              168h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development