Apache Avro / AVRO-764

Possible issue with BinaryData.compare(...) used in Map/Reduce


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.4.1
    • Fix Version/s: 1.5.0
    • Component/s: java
    • Environment: Linux, CDH3

    • Release Note: Allow the BinaryData.compare(...) utility to make use of the lengths as passed by the RawComparator in Map/Reduce.
    • Labels: avro, map/reduce, rawcomparator, binarydata, binarydecoder

    Description

      MapReduce's RawComparator feature has a call compare(b1, start1, length1, b2, start2, length2), which is handled for Avro using BinaryData.compare(b1, start1, b2, start2, schema); note that the two length arguments are dropped along the way.

      BinaryDecoder, used by the BinaryData.compare(b1, start1, b2, start2, schema) utility, has a sub-clause that handles byte array inputs shorter than 16 bytes.

      This is the exact code I am talking about:

      BinaryDecoder.java Lines 879-893
          private ByteArrayByteSource(byte[] data, int start, int len) {
            super();
            // make sure data is not too small, otherwise getLong may try and
            // read 10 bytes and get index out of bounds.
            if (data.length < 16 || len < 16) {
              this.data = new byte[16];
              System.arraycopy(data, start, this.data, 0, len);
              this.position = 0;
              this.max = len;
            } else {
              // use the array passed in
              this.data = data;
              this.position = start;
              this.max = start + len;
            }
          }
      

      This clause fails during System.arraycopy because the len sent into the constructor is the length of the whole byte array, not the length given by the framework, which would be the correct one (BinaryData.compare() does not take it as a parameter). arraycopy therefore tries to copy len bytes starting at start, where len is the array's own .length, so start + len exceeds the array bounds and an ArrayIndexOutOfBoundsException is thrown.
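      The failure mode can be reproduced with plain JDK calls; the class and method names below are hypothetical, chosen only to mirror the buggy clause, and are not part of Avro:

```java
// Hypothetical, JDK-only sketch of the failure: the constructor is handed
// start > 0 together with len == data.length, so System.arraycopy runs past
// the end of the source array.
public class ArraycopyBoundsDemo {

    // Returns true if copying `len` bytes from `data` starting at `start`
    // into a fresh 16-byte buffer throws, as in the BinaryDecoder clause.
    static boolean overflows(byte[] data, int start, int len) {
        byte[] target = new byte[16];
        try {
            System.arraycopy(data, start, target, 0, len);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[10];  // short input, under the 16-byte threshold
        int start = 4;               // comparison begins mid-buffer

        // Buggy call: len is the whole array's length, so start + len = 14 > 10.
        System.out.println(overflows(data, start, data.length));         // true

        // Correct call: len is the remaining bytes after start.
        System.out.println(overflows(data, start, data.length - start)); // false
    }
}
```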

      Here is a slightly low-level test case that should fail with an ArrayIndexOutOfBoundsException due to the bad System.arraycopy call:

      TestBinaryData.java
      package org.apache.avro.io;
      
      import java.io.ByteArrayOutputStream;
      import java.io.IOException;
      import java.util.ArrayList;
      import java.util.List;
      
      import org.apache.avro.Schema;
      import org.apache.avro.Schema.Field;
      import org.apache.avro.Schema.Type;
      import org.apache.avro.generic.GenericData.Record;
      import org.apache.avro.generic.GenericDatumWriter;
      import org.junit.Assert;
      import org.junit.Test;
      
      public class TestBinaryData {
      
        @Test
        public void testCompare() {
          // Prepare a schema for testing.
          Field integerField = new Field("test", Schema.create(Type.INT), null, null);
          List<Field> fields = new ArrayList<Field>();
          fields.add(integerField);
          Schema record = Schema.createRecord("test", null, null, false);
          record.setFields(fields);
          
          ByteArrayOutputStream b1 = new ByteArrayOutputStream(5);
          ByteArrayOutputStream b2 = new ByteArrayOutputStream(5);
          BinaryEncoder b1Enc = new BinaryEncoder(b1);
          BinaryEncoder b2Enc = new BinaryEncoder(b2);
          Record testDatum1 = new Record(record);
          testDatum1.put(0, 1);
          Record testDatum2 = new Record(record);
          testDatum2.put(0, 2);
          GenericDatumWriter<Record> gWriter = new GenericDatumWriter<Record>(record);
          int start1 = 0, start2 = 0;
          try {
            gWriter.write(testDatum1, b1Enc);
            b1Enc.flush();
            start1 = b1.size();
            gWriter.write(testDatum1, b1Enc);
            b1Enc.flush();
            b1.close();
            gWriter.write(testDatum2, b2Enc);
            b2Enc.flush();
            start2 = b2.size();
            gWriter.write(testDatum2, b2Enc);
            b2Enc.flush();
            b2.close();
            BinaryData.compare(b1.toByteArray(), start1, b2.toByteArray(), start2, record);
          } catch (IOException e) {
            Assert.fail("IOException while writing records to output stream.");
          }
        }
      }
      

      A solution would be to use the length as given by the MapReduce framework itself, instead of the byte array's .length. I'll attach a patch for this soon.
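      As a hedged sketch of that idea (a standalone class for illustration, not the actual Avro patch), the constructor stays in bounds once len is the caller-supplied length of the encoded datum rather than data.length:

```java
// Hypothetical rework of the ByteArrayByteSource logic: `len` is the
// framework-supplied length, so the arraycopy cannot run past the end of
// the source array even when start > 0.
public class ByteSourceSketch {
    final byte[] data;
    int position;
    final int max;

    ByteSourceSketch(byte[] src, int start, int len) {
        if (len < 16) {
            // Pad a small input out to 16 bytes so multi-byte reads such as
            // getLong stay in bounds; only the first `len` bytes are valid.
            this.data = new byte[16];
            System.arraycopy(src, start, this.data, 0, len);
            this.position = 0;
            this.max = len;
        } else {
            // Large enough: read the caller's array in place.
            this.data = src;
            this.position = start;
            this.max = start + len;
        }
    }
}
```

      With src.length = 10, start = 4 and len = 6 (the remaining bytes), the copy succeeds where the original clause, handed len = 10, would throw.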

          People

            Assignee: Harsh J (qwertymaniac)
            Reporter: Harsh J (qwertymaniac)

              Time Tracking

                Time Spent: 5m