Apache Avro / AVRO-3005

Deserialization of string with > 256 characters fails

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.10.1
    • Fix Version/s: 1.10.2
    • Component/s: csharp
    • Labels: None

    Description

      Avro.IO.BinaryDecoder.ReadString() fails for strings with length > 256, i.e. when the StackallocThreshold is exceeded. 

      This can be seen when serializing and subsequently deserializing a GenericRecord of schema 

      {
        "type": "record",
        "name": "Foo",
        "fields": [
          { "name": "x", "type": "string" }
        ]
      }

      with a field x containing a string of length > 256, as done in the test case Test(257):

      using System;
      using System.IO;
      using Avro;
      using Avro.Generic;
      using Avro.IO;
      using Xunit;

      public void Test(int n)
      {
          var schema = (RecordSchema) Schema.Parse("{ \"type\":\"record\", \"name\":\"Foo\",\"fields\":[{\"name\":\"x\",\"type\":\"string\"}]}");

          // Build a record whose field x holds a string of n characters.
          var datum = new GenericRecord(schema);
          datum.Add("x", new string('x', n));

          // Serialize the record to a byte array.
          byte[] serialized;
          using (var ms = new MemoryStream())
          {
              var enc = new BinaryEncoder(ms);
              var writer = new GenericDatumWriter<GenericRecord>(schema);
              writer.Write(datum, enc);
              serialized = ms.ToArray();
          }

          // Deserialize it again; for n = 257 the Read call throws the AvroException.
          using (var ms = new MemoryStream(serialized))
          {
              var dec = new BinaryDecoder(ms);
              var deserialized = new GenericRecord(schema);
              var reader = new GenericDatumReader<GenericRecord>(schema, schema);
              reader.Read(deserialized, dec);
              Assert.Equal(datum, deserialized);
          }
      }

      which yields the following exception:

      Avro.AvroException
      End of stream reached
         at Avro.IO.BinaryDecoder.Read(Span`1 buffer)
         at Avro.IO.BinaryDecoder.ReadString()
         at Avro.Generic.PreresolvingDatumReader`1.<>c.<ResolveReader>b__21_1(Decoder d)
         at Avro.Generic.PreresolvingDatumReader`1.<>c__DisplayClass37_0.<Read>b__0(Object r, Decoder d)
         at Avro.Generic.PreresolvingDatumReader`1.<>c__DisplayClass23_1.<ResolveRecord>b__2(Object rec, Decoder d)
         at Avro.Generic.PreresolvingDatumReader`1.ReadRecord(Object reuse, Decoder decoder, RecordAccess recordAccess, IEnumerable`1 readSteps)
         at Avro.Generic.PreresolvingDatumReader`1.<>c__DisplayClass23_0.<ResolveRecord>b__0(Object r, Decoder d)
         at Avro.Generic.PreresolvingDatumReader`1.Read(T reuse, Decoder decoder)
         at AvroTests.AvroTests.Test(Int32 n) in C:\Users\l.heimberg\Source\Repos\AvroTests\AvroTests\AvroTests.cs:line 41
      

      The cause appears to be the following. When a string of length <= StackallocThreshold (= 256) is read, the buffer that receives the string's bytes from the stream is allocated on the stack with exactly the length of the string. If the length is > StackallocThreshold, the buffer is instead obtained from ArrayPool<byte>.Shared.Rent(length), which returns an array of at least 'length' bytes, but possibly a larger one.
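
The size behaviour of the pool can be checked in isolation. The following is a minimal sketch, independent of Avro; the class and method names (RentDemo, RentedLength) are made up for illustration. The only guarantee it relies on is that Rent returns an array of at least the requested length; the actual size depends on the pool implementation.

```csharp
using System;
using System.Buffers;

public static class RentDemo
{
    // Rents an array for `requested` bytes and reports how long the array
    // handed out by the shared pool actually is.
    public static int RentedLength(int requested)
    {
        byte[] rented = ArrayPool<byte>.Shared.Rent(requested);
        try
        {
            return rented.Length;
        }
        finally
        {
            // Always give the array back to the pool.
            ArrayPool<byte>.Shared.Return(rented);
        }
    }

    public static void Main()
    {
        // The printed length is >= 257 and may well be larger.
        Console.WriteLine($"requested 257, got {RentedLength(257)}");
    }
}
```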

      The Read(Span<byte> buffer) method reads the content of the string from the input stream. It always tries to read as many bytes as the buffer is long, and fails with the exception shown above when the stream runs out of data first. Thus, if the expected string length is > StackallocThreshold and the array returned by ArrayPool<byte>.Shared.Rent(length) is longer than 'length', Read either throws the above AvroException (when the string is the last element in the stream) or consumes part of the data items that follow in the stream. In both cases deserialization is corrupted.
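
The failure mode can be reproduced without Avro. In the sketch below, ReadFully is a hypothetical stand-in for the decoder's Read(Span<byte>) as described above (fill the whole span or throw); it is not the library's code. The oversized rented array is simulated with a plain array of a fixed size so that the outcome does not depend on the pool.

```csharp
using System;
using System.IO;

public static class OversizedBufferDemo
{
    // Hypothetical stand-in for the decoder's Read(Span<byte>): fills the
    // whole span or throws when the stream ends first.
    public static void ReadFully(Stream stream, Span<byte> buffer)
    {
        while (!buffer.IsEmpty)
        {
            int n = stream.Read(buffer);
            if (n <= 0) throw new EndOfStreamException("End of stream reached");
            buffer = buffer.Slice(n);
        }
    }

    // Returns true if reading a payload of `payloadLength` bytes into a
    // buffer of `bufferLength` bytes hits the end of the stream.
    public static bool ReadFails(int payloadLength, int bufferLength)
    {
        using var stream = new MemoryStream(new byte[payloadLength]);
        byte[] buffer = new byte[bufferLength]; // simulates the rented array
        try
        {
            // The implicit byte[] -> Span<byte> conversion covers the whole
            // array, not just the first payloadLength bytes.
            ReadFully(stream, buffer);
            return false;
        }
        catch (EndOfStreamException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ReadFails(257, 512)); // True: buffer longer than the data
        Console.WriteLine(ReadFails(257, 257)); // False: buffer has the exact length
    }
}
```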

      The provided patch turns the byte array returned by the ArrayPool into a Span of the correct length using the Slice method, instead of casting it implicitly to Span<byte>.
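
The idea behind the patch can be sketched as follows. This is not the attached patch itself: ReadUtf8 and ReadFully are illustrative names, and AsSpan(0, length) is used here as one way to obtain a span of exactly the requested length from the rented array.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Text;

public static class SlicedReadSketch
{
    // Hypothetical helper: fills the whole span or throws at end of stream.
    static void ReadFully(Stream stream, Span<byte> buffer)
    {
        while (!buffer.IsEmpty)
        {
            int n = stream.Read(buffer);
            if (n <= 0) throw new EndOfStreamException("End of stream reached");
            buffer = buffer.Slice(n);
        }
    }

    // Reads exactly `length` bytes and decodes them as UTF-8, regardless of
    // how large the rented array turns out to be.
    public static string ReadUtf8(Stream stream, int length)
    {
        byte[] rented = ArrayPool<byte>.Shared.Rent(length);
        try
        {
            // Restrict the view to the bytes that belong to the string.
            Span<byte> exact = rented.AsSpan(0, length);
            ReadFully(stream, exact);
            return Encoding.UTF8.GetString(exact);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rented);
        }
    }

    public static void Main()
    {
        // 257 bytes of string content followed by one byte of a later item.
        byte[] payload = Encoding.UTF8.GetBytes(new string('x', 257) + "y");
        using var stream = new MemoryStream(payload);
        string s = ReadUtf8(stream, 257);
        Console.WriteLine(s.Length);        // 257
        Console.WriteLine(stream.Position); // 257, the following byte is untouched
    }
}
```

Because the span is limited to the string's length, the read neither runs past the end of the stream nor consumes bytes of the next item.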

      Possibly related: https://github.com/confluentinc/confluent-kafka-dotnet/issues/1398#issuecomment-748171083

      Attachments

        1. AVRO-3005.patch
          3 kB
          Lucas Heimberg

            People

              Assignee: Unassigned
              Reporter: Lucas Heimberg (l.heimberg)
              Votes: 0
              Watchers: 3
