Avro
  1. Avro
  2. AVRO-1528

avro_schema_enum_get returns invalid pointer for unknown indices

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Trivial Trivial
    • Resolution: Unresolved
    • Affects Version/s: 1.7.6
    • Fix Version/s: None
    • Component/s: c
    • Labels:
      None

      Description

      When avro_schema_enum_get() is called with invalid indices it returns an invalid pointer instead of NULL, this leads to segfaults. Similar problem is already handled in avro_schema_enum_get_by_name() when converting the other way round, avro_schema_enum_get() could handle it likewise.

      1. AVRO-1528.patch
        1 kB
        Jeno I. Hajdu

        Activity

        Hide
        Jeno I. Hajdu added a comment -

        I have attached a patch with the fix

        Show
        Jeno I. Hajdu added a comment - I have attached a patch with the fix
        Hide
        Douglas Creager added a comment -

        What's the use case for this? The reason I ask is that in C is you don't get a bounds check when you access an array, and so we usually don't do a bounds check when you access something modeled after an array either. That way you get the absolute fastest performance when you're working with safe data — and if you're not working with safe data, it's your responsibility to perform the extra checks.

        There was a similar request in AVRO-1237, where we decided not to do the bounds check in the general function for exactly this reason. We ended up adding a bounds check to the file reader, though, since that function is dealing with unsafe data.

        So instead of changing this function's precondition ("you're responsible for providing a valid index"), I'd suggest we find the code that violates the precondition and fix that. I can also add a documentation patch, since I don't think there's anything that explicitly documents this precondition.

        Show
        Douglas Creager added a comment - What's the use case for this? The reason I ask is that in C is you don't get a bounds check when you access an array, and so we usually don't do a bounds check when you access something modeled after an array either. That way you get the absolute fastest performance when you're working with safe data — and if you're not working with safe data, it's your responsibility to perform the extra checks. There was a similar request in AVRO-1237 , where we decided not to do the bounds check in the general function for exactly this reason. We ended up adding a bounds check to the file reader, though, since that function is dealing with unsafe data. So instead of changing this function's precondition ("you're responsible for providing a valid index"), I'd suggest we find the code that violates the precondition and fix that. I can also add a documentation patch, since I don't think there's anything that explicitly documents this precondition.
        Hide
        Jeno I. Hajdu added a comment -

        OK, so the scenarios is that we have versioned data (schemas for the various versions auto generated) where we would want to use enums and would read the data using a manually assembled reader schema. We would use the reader schema to convert enums to string instead of the writer schema to avoid leaking specifics of a given version. Now if the specific writer schema has additional items for an enum compared to what we have in our reader schema we will eventually try to look up such unknown indices.

        Theoretically we could work around this in other ways, but Avro-C is also lacking in that regard, I am OK, if we choose to fix that. For example I haven't found any specific schema function or straightforward approach which could tell me how many items an enum has. Otherwise I could do the bound checking myself before the lookup.
        This also applies to the writer side, avro_value_set_enum() does not do any bound checking based on the schema, which is understandable due to performance, but there's no way for manual bound checking prior the writing either.

        So, as an alternative proposal what do you think of adding avro_schema_enum_get_item_count(const avro_schema_t), which returns the num_entries of the symbols st_table? Then all the enum bound checkings can be left to the users if they opt to want that.

        If you are OK with this I can attach a new patch with that one.

        Show
        Jeno I. Hajdu added a comment - OK, so the scenarios is that we have versioned data (schemas for the various versions auto generated) where we would want to use enums and would read the data using a manually assembled reader schema. We would use the reader schema to convert enums to string instead of the writer schema to avoid leaking specifics of a given version. Now if the specific writer schema has additional items for an enum compared to what we have in our reader schema we will eventually try to look up such unknown indices. Theoretically we could work around this in other ways, but Avro-C is also lacking in that regard, I am OK, if we choose to fix that. For example I haven't found any specific schema function or straightforward approach which could tell me how many items an enum has. Otherwise I could do the bound checking myself before the lookup. This also applies to the writer side, avro_value_set_enum() does not do any bound checking based on the schema, which is understandable due to performance, but there's no way for manual bound checking prior the writing either. So, as an alternative proposal what do you think of adding avro_schema_enum_get_item_count(const avro_schema_t), which returns the num_entries of the symbols st_table? Then all the enum bound checkings can be left to the users if they opt to want that. If you are OK with this I can attach a new patch with that one.
        Hide
        Jeno I. Hajdu added a comment -

        Hi Doug,

        what do you think of the alternative proposal? I have attached a patch which adds avro_schema_enum_number_of_symbols() so that clients can do bounds checking in a straightforward way.

        Show
        Jeno I. Hajdu added a comment - Hi Doug, what do you think of the alternative proposal? I have attached a patch which adds avro_schema_enum_number_of_symbols() so that clients can do bounds checking in a straightforward way.

          People

          • Assignee:
            Unassigned
            Reporter:
            Jeno I. Hajdu
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:

              Development