Avro
  1. Avro
  2. AVRO-465

C implementation requires you to know a file's schema before reading

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.4.0
    • Component/s: c
    • Labels:
      None

      Description

      The C implementation gives the user no way of reading the objects in a data file without knowing the file's schema ahead of time.

      While it does fill in the writers_schema part of the avro_file_reader_t on read, this field is not available to the API as it is left out of avro.h. Two options persent itself: 1) preserve the API as is and add a avro_schema_from_file_reader() function or 2) move the avro_file_reader_t and avro_file_writer_t structs to avro.h.

      A third option, that I don't approve of, is providing a function that reads from a datafile but uses the writers_schema in the reader given instead of requiring another schema to be passed into it. This is problematic because anyone using the API would have fewer debugging and testing options when dealing with interop datasets. Any problem that occurs might just be the schema in the file being off, or whatever.

        Activity

        Hide
        Jeff Hodges added a comment -

        A patch that creates the avro_schema_from_file_reader(avro_file_reader_t reader) function for bulling the writers_schema out of a avro_file_reader_t. (a.k.a. option 1).

        Show
        Jeff Hodges added a comment - A patch that creates the avro_schema_from_file_reader(avro_file_reader_t reader) function for bulling the writers_schema out of a avro_file_reader_t. (a.k.a. option 1).
        Hide
        Matt Massie added a comment -

        Jeff-

        I'd like to understand this problem a little better. The C implementation shouldn't require you know the file schema ahead of time.

        If you pass in NULL for the reader's schema, then the writer's schema will be used. This is a documentation bug since I don't explicitly explain this anywhere.

        Can you please try to read the data file with the reader's schema set to NULL?

        Btw, the relevant code is in datum_read.c around line 303

        if (readers_schema == NULL) {
             readers_schema = writers_schema;
        } else if (!avro_schema_match(writers_schema, readers_schema)) {
             return EINVAL;
        }
        
        Show
        Matt Massie added a comment - Jeff- I'd like to understand this problem a little better. The C implementation shouldn't require you know the file schema ahead of time. If you pass in NULL for the reader's schema, then the writer's schema will be used. This is a documentation bug since I don't explicitly explain this anywhere. Can you please try to read the data file with the reader's schema set to NULL? Btw, the relevant code is in datum_read.c around line 303 if (readers_schema == NULL) { readers_schema = writers_schema; } else if (!avro_schema_match(writers_schema, readers_schema)) { return EINVAL; }
        Hide
        Matt Massie added a comment -

        I just noticed your public tweet...

        "@wilhelmbierbaum True. Fuck the Avro C API."

        Are you referring to this issue specifically in your tweet or are there other issues you'd like addressed with the API? I'd love to hear details about how the C API could be improved.

        Show
        Matt Massie added a comment - I just noticed your public tweet... "@wilhelmbierbaum True. Fuck the Avro C API." Are you referring to this issue specifically in your tweet or are there other issues you'd like addressed with the API? I'd love to hear details about how the C API could be improved.
        Hide
        Jeff Hodges added a comment -

        What I meant was: it should be written in OCaml, obviously.

        Sorry. Lost my patience.

        Show
        Jeff Hodges added a comment - What I meant was: it should be written in OCaml, obviously. Sorry. Lost my patience.
        Hide
        Matt Massie added a comment -

        Did using NULL for the reader's schema work for you?

        Show
        Matt Massie added a comment - Did using NULL for the reader's schema work for you?
        Hide
        Matt Massie added a comment -

        I think the function in your patch could be a good addition to the C API regardless

        avro_schema_t avro_schema_from_file_reader(avro_file_reader_t reader) {
          return reader->writers_schema;
        }
        

        should probably look like

        avro_schema_t avro_schema_from_file_reader(avro_file_reader_t reader) {
          return reader? reader->writers_schema: NULL;
        }
        

        in case a NULL reader is passed to the function. Also, please add the new method to avro.h.

        Show
        Matt Massie added a comment - I think the function in your patch could be a good addition to the C API regardless avro_schema_t avro_schema_from_file_reader(avro_file_reader_t reader) { return reader->writers_schema; } should probably look like avro_schema_t avro_schema_from_file_reader(avro_file_reader_t reader) { return reader? reader->writers_schema: NULL; } in case a NULL reader is passed to the function. Also, please add the new method to avro.h .
        Hide
        Bruce Mitchener added a comment -

        I'd suggest this signature:

        avro_schema_t avro_file_reader_get_schema(avro_file_reader_t reader)

        Show
        Bruce Mitchener added a comment - I'd suggest this signature: avro_schema_t avro_file_reader_get_schema(avro_file_reader_t reader)
        Hide
        Jeff Hodges added a comment -

        Using NULL works great. Would definitely still be nice so I can pass around the schema object for various usages.

        Show
        Jeff Hodges added a comment - Using NULL works great. Would definitely still be nice so I can pass around the schema object for various usages.

          People

          • Assignee:
            Unassigned
            Reporter:
            Jeff Hodges
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development