Apache Avro / AVRO-1650

Avro deserialization fails depending on the value of integer/long fields

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem

    Description

      Here is a test that fails depending on the value of the zipCode integer.

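      import java.io.ByteArrayOutputStream;
      import java.util.ArrayList;
      import java.util.List;

      import org.apache.avro.Schema;
      import org.apache.avro.file.DataFileReader;
      import org.apache.avro.file.DataFileWriter;
      import org.apache.avro.file.SeekableByteArrayInput;
      import org.apache.avro.generic.GenericDatumReader;
      import org.apache.avro.generic.GenericRecord;
      import org.apache.avro.io.BinaryDecoder;
      import org.apache.avro.io.DatumWriter;
      import org.apache.avro.io.DecoderFactory;
      import org.apache.avro.io.Encoder;
      import org.apache.avro.io.EncoderFactory;
      import org.apache.avro.reflect.ReflectData;
      import org.apache.avro.reflect.ReflectDatumWriter;
      import org.junit.Test;

      public class TestBinaryDecoderSeparateSchema {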
      
        @Test
        public void checkAvroWithoutEmbeddedSchema () throws Exception {
      
          log ("\n\n\nBeginning without-schema\n");
          Person datum = new Person();
      
          ReflectData rdata = ReflectData.AllowNull.get();
          Schema schema = rdata.getSchema(Person.class);
      
          // Write avro as binary
          ByteArrayOutputStream baos = new ByteArrayOutputStream();
          DatumWriter<Person> dout = new ReflectDatumWriter<Person>(Person.class, rdata);
          Encoder encoder = EncoderFactory.get().binaryEncoder(baos, null);
          dout.write(datum, encoder);
          encoder.flush();
          byte[] bytes = baos.toByteArray();
          String binaryString = new String (bytes);
          log (binaryString);
      
          // Read avro binary string into GenericRecord
          BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(binaryString.getBytes(), null);
          GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord> ();
          datumReader.setSchema(schema);
          GenericRecord record = datumReader.read(null, decoder);
          log ("Read zipCode = " + record.get("zipCode"));
        }
      
        @Test
        public void checkAvroWithEmbeddedSchema () throws Exception {
      
          log ("\n\n\nBeginning with-schema\n");
          Person datum = new Person();
      
          ReflectData rdata = ReflectData.AllowNull.get();
          Schema schema = rdata.getSchema(Person.class);
      
          // Write avro with embedded schema
          ByteArrayOutputStream baos = new ByteArrayOutputStream();
          ReflectDatumWriter<Person> dout = new ReflectDatumWriter<Person> (Person.class, rdata);
          DataFileWriter<Person> fileWriter = new DataFileWriter<Person> (dout);
          fileWriter.create(schema, baos);
          fileWriter.append(datum);
          fileWriter.close();
          byte[] bytes = baos.toByteArray();
          String binaryString = new String (bytes);
          log (binaryString);
      
          // Read avro with embedded schema
          GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord> ();
          SeekableByteArrayInput avroInputStream = new SeekableByteArrayInput(bytes);
          DataFileReader<GenericRecord> fileReader =
                  new DataFileReader<GenericRecord>(avroInputStream, datumReader);
      
          schema = fileReader.getSchema();
          GenericRecord record = null;
          List<GenericRecord> records = new ArrayList<GenericRecord> ();
          while (fileReader.hasNext())
              records.add (fileReader.next(record));
      
          log ("Read " + records.size() + " records");
          log ("Read zipCode = " + records.get(0).get("zipCode"));
        }
      
        private static class Person {
          Integer zipCode = 90900;
        }
      
        private static void log (String s) {
          System.out.println (s);
        }
      }
      


      Issues:

      1. zipCode = 1: no exception, but zipCode is read back with the wrong value
      2. zipCode = 90900: exception in checkAvroWithoutEmbeddedSchema():

        java.io.IOException: Invalid int encoding
        at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:145)
        at org.apache.avro.io.ValidatingDecoder.readInt(ValidatingDecoder.java:83)
        at org.apache.avro.generic.GenericDatumReader.readInt(GenericDatumReader.java:444)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:159)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
        at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
        at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
        at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
        at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
        at org.apache.avro.io.TestBinaryDecoderSeparateSchema.checkAvroWithoutEmbeddedSchema(TestBinaryDecoderSeparateSchema.java:68)
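
A likely explanation for the value-dependent failure, offered as a sketch rather than a confirmed diagnosis: the test converts the encoded bytes to a String and back (new String (bytes), then binaryString.getBytes()), and that conversion is lossy for bytes that are not valid text in the platform charset. Avro writes an int as a zig-zag, variable-length value, so 1 fits in a single ASCII-range byte while 90900 becomes 0xA8 0x8C 0x0B, whose first two bytes have the high bit set. The class below is hypothetical and uses only the JDK; it assumes UTF-8, whereas the test relies on the platform default charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of Avro or of the test above.
class VarintStringRoundTrip {

  // Zig-zag plus variable-length encoding of an int, as described in the Avro specification.
  static byte[] encodeInt(int n) {
    int z = (n << 1) ^ (n >> 31);                // zig-zag: small magnitudes map to small values
    byte[] buf = new byte[5];
    int len = 0;
    while ((z & ~0x7F) != 0) {
      buf[len++] = (byte) ((z & 0x7F) | 0x80);   // high bit set: more bytes follow
      z >>>= 7;
    }
    buf[len++] = (byte) z;
    return Arrays.copyOf(buf, len);
  }

  // The same lossy step as in the test, with the charset pinned to UTF-8 (an assumption).
  static byte[] viaString(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
  }

  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02X ", b));
    }
    return sb.toString().trim();
  }

  public static void main(String[] args) {
    for (int zip : new int[] {1, 90900}) {
      byte[] original = encodeInt(zip);
      byte[] roundTripped = viaString(original);
      System.out.println(zip + ": " + hex(original) + " -> " + hex(roundTripped)
          + (Arrays.equals(original, roundTripped) ? " (intact)" : " (corrupted)"));
    }
  }
}
```

Under UTF-8 the bytes 0xA8 and 0x8C are not valid on their own, so each is replaced during decoding and the re-encoded array is longer than the original. A decoder then sees a run of bytes that all carry the continuation bit, which is consistent with the "Invalid int encoding" error in the trace. With ReflectData.AllowNull the field is presumably written as a union, so the real payload would also carry a branch index in front of these bytes.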


      Am I even supposed to read/write the way shown in checkAvroWithoutEmbeddedSchema()?
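
On the question above, one reading is that the schema handling is fine and only the detour through a String is risky. In the test itself it would probably be enough to pass the byte array directly, as in DecoderFactory.get().binaryDecoder(bytes, null). Where a textual form of the payload is genuinely needed (for logging, say), a binary-safe encoding such as Base64 keeps every byte intact. The helper below is a hypothetical JDK-only sketch; its class and method names are illustrative and not part of Avro.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper; names are illustrative and not part of Avro.
class BinaryAsText {

  // Lossless text form of arbitrary bytes.
  static String toText(byte[] bytes) {
    return Base64.getEncoder().encodeToString(bytes);
  }

  // Inverse of toText: recovers exactly the bytes that were encoded.
  static byte[] fromText(String text) {
    return Base64.getDecoder().decode(text);
  }

  public static void main(String[] args) {
    // Zig-zag varint bytes for 90900, the value that fails in the report.
    byte[] avroBytes = {(byte) 0xA8, (byte) 0x8C, 0x0B};
    String text = toText(avroBytes);
    byte[] restored = fromText(text);
    System.out.println(text + " restores the original bytes: "
        + Arrays.equals(avroBytes, restored));
  }
}
```

The restored array could then be handed to the binary decoder in place of binaryString.getBytes().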

          People

            Assignee: Unassigned
            Reporter: Sachin Goyal (sachingoyal)