AVRO-891

Change SpecificDatumReader to default reader schema from loaded class

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.4
    • Fix Version/s: 1.6.0
    • Component/s: java
    • Labels:
      None
    • Environment:

      OSX 10.7

      Description

      An AvroRuntimeException is thrown when reading an Avro file that was serialized with an older version of a schema, if that older version contains a field that has since been removed from the newer schema.

      Exception

      Exception in thread "main" org.apache.avro.AvroRuntimeException: Bad index
      	at Record.put(Unknown Source)
      	at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
      	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
      	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
      	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
      	at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
      	at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
      	at Read.readFromAvro(Unknown Source)
      	at Read.main(Unknown Source)
      

      Steps to reproduce

      1. Generate code for schema v1 and v2
      2. Write an Avro file with the v1 code-generated Record class, using DataFileWriter and SpecificDatumWriter
      3. (informational only) Read the Avro file with the v1 code-generated Record class, using DataFileStream and SpecificDatumReader (output follows)
        Record@2ec791b9[name=r1,id=1]
        Record@bd86fd3[name=r2,id=2]
        
      4. Read the Avro file with the v2 code-generated Record class, using DataFileStream and SpecificDatumReader (this throws the exception above)

      Schema details

      v1 schema:

      {"name": "Record", "type": "record",
        "fields": [
          {"name": "name", "type": "string"},
          {"name": "id", "type": "int"}
        ]
      }
      

      v2 schema:

      {"name": "Record", "type": "record",
        "fields": [
          {"name": "name", "type": "string"}
        ]
      }
      

      Write code

        public static Record createRecord(String name, int id) {
          Record record = new Record();
          record.name = name;
          record.id = id;
          return record;
        }
      
        public static void writeToAvro(OutputStream outputStream)
            throws IOException {
          DataFileWriter<Record> writer =
              new DataFileWriter<Record>(new SpecificDatumWriter<Record>());
          writer.create(Record.SCHEMA$, outputStream);
      
          writer.append(createRecord("r1", 1));
          writer.append(createRecord("r2", 2));
      
          writer.close();
          outputStream.close();
        }
      

      Read code

        public static void readFromAvro(InputStream is) throws IOException {
          DataFileStream<Record> reader = new DataFileStream<Record>(
                  is, new SpecificDatumReader<Record>());
          for (Record a : reader) {
            System.out.println(ToStringBuilder.reflectionToString(a));
          }
          IOUtils.cleanup(null, is);
          IOUtils.cleanup(null, reader);
        }
      
      
      1. AVRO-891.patch
        5 kB
        Doug Cutting
      2. AVRO-891.patch
        2 kB
        Doug Cutting

        Activity

        Doug Cutting added a comment -

        When you read you should use 'new SpecificDatumReader<Record>(Record.class)' to pass the new schema to the reader. Otherwise it assumes the same schema that's used to write should be used to read. Does that help?
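
To illustrate why defaulting the reader schema to the writer's schema can end in "Bad index", here is a toy model in plain Java. It is not Avro's implementation: `BadIndexSketch`, `RecordV2`, and `decode` are hypothetical names, and the decoding loop is a simplification that only assumes each value is stored at its position in the reader's field list.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of record decoding; a sketch, not Avro's actual code. */
final class BadIndexSketch {

  /** Stand-in for the v2 generated class: it can store only one field, "name". */
  static final class RecordV2 {
    Object name;

    void put(int index, Object value) {
      if (index == 0) {
        name = value;
      } else {
        throw new RuntimeException("Bad index");
      }
    }
  }

  static final List<String> V1_FIELDS = Arrays.asList("name", "id");
  static final List<String> V2_FIELDS = Arrays.asList("name");

  /**
   * Decodes one record whose values arrive in writer-field order.
   * Each value is stored at its position in the reader's field list;
   * values for fields the reader does not declare are skipped.
   */
  static RecordV2 decode(List<String> writerFields, List<String> readerFields, Object[] values) {
    RecordV2 record = new RecordV2();
    for (int i = 0; i < writerFields.size(); i++) {
      int readerIndex = readerFields.indexOf(writerFields.get(i));
      if (readerIndex >= 0) {
        record.put(readerIndex, values[i]);
      }
    }
    return record;
  }

  public static void main(String[] args) {
    Object[] row = {"r1", 1};

    // Reader schema defaults to the writer's (v1) schema: "id" maps to
    // index 1, which the v2 class cannot store.
    try {
      decode(V1_FIELDS, V1_FIELDS, row);
    } catch (RuntimeException e) {
      System.out.println("writer schema as reader schema: " + e.getMessage());
    }

    // Reader schema is v2: "id" is skipped and only "name" is stored.
    RecordV2 ok = decode(V1_FIELDS, V2_FIELDS, row);
    System.out.println("v2 reader schema: name=" + ok.name);
  }
}
```

In this model, passing the v2 field list plays the role of passing `Record.class` to the `SpecificDatumReader` constructor: the reader then knows which written fields to skip.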

        Scott Carey added a comment - edited

        I should have caught that before.

        We may need better Javadoc and some example uses in the doc, but this definitely works for me and has worked for a long time — but I use the constructor that passes in the class.

        Doug Cutting added a comment -

        Here are two ways we might avoid this:

        1. Make the default constructor for SpecificDatumReader protected instead of public. That would force folks to always provide a schema and would be incompatible.
        2. Change SpecificDatumReader#setSchema() to, when the expected schema is null, set it to getSpecificData().getClass(actual).getSchema(). That would get the schema with the same name as the one in the file from the currently loaded class.

        I prefer the second option.
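
The second option can be sketched with a small stand-alone model. This is a simplified illustration, not the patch itself: `DefaultReaderSchemaSketch`, `SchemaModel`, and the `loadedClasses` map are hypothetical stand-ins for the reader, Avro's schema type, and the lookup of a loaded generated class by schema name. Falling back to the writer's schema when no class is found is an assumption made for this sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of defaulting the expected (reader) schema; names are illustrative. */
final class DefaultReaderSchemaSketch {

  /** Minimal stand-in for a record schema: a full name plus field names. */
  static final class SchemaModel {
    final String fullName;
    final List<String> fields;

    SchemaModel(String fullName, String... fields) {
      this.fullName = fullName;
      this.fields = Arrays.asList(fields);
    }
  }

  // Stand-in for "the schema of the currently loaded class with this name".
  private final Map<String, SchemaModel> loadedClasses;
  private SchemaModel expected; // reader's schema; null until set or defaulted
  private SchemaModel actual;   // writer's schema, taken from the file

  DefaultReaderSchemaSketch(Map<String, SchemaModel> loadedClasses, SchemaModel expected) {
    this.loadedClasses = loadedClasses;
    this.expected = expected;
  }

  /** Called with the writer's schema once the file header has been read. */
  void setSchema(SchemaModel actual) {
    if (expected == null && actual != null) {
      // Default the expected schema to that of the loaded class of the same name.
      SchemaModel loaded = loadedClasses.get(actual.fullName);
      if (loaded != null) {
        expected = loaded;
      }
    }
    this.actual = actual;
  }

  /** Assumption in this sketch: with no expected schema, use the writer's. */
  SchemaModel getExpected() {
    return expected != null ? expected : actual;
  }

  public static void main(String[] args) {
    SchemaModel v1 = new SchemaModel("Record", "name", "id");
    SchemaModel v2 = new SchemaModel("Record", "name");

    Map<String, SchemaModel> loaded = new HashMap<>();
    loaded.put("Record", v2); // the v2 generated class is the one on the classpath

    // No expected schema supplied, as with the default constructor.
    DefaultReaderSchemaSketch reader = new DefaultReaderSchemaSketch(loaded, null);
    reader.setSchema(v1); // the file was written with v1
    System.out.println(reader.getExpected().fields); // prints [name]
  }
}
```

An explicitly supplied expected schema is left untouched in this model, so code that already passes the class or schema to the constructor would behave as before.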

        Doug Cutting added a comment -

        Here's a patch that implements (2) above. This should make things like the code in the description of this issue just work.

        Doug Cutting added a comment -

        Here's an improved patch that includes a test.

        Doug Cutting added a comment -

        I'll commit this tomorrow if no one objects.

        Doug Cutting added a comment -

        I committed this.


          People

          • Assignee:
            Doug Cutting
            Reporter:
            Alex Holmes