Apache Avro / AVRO-1422

JSON-deserialization of recursively defined record causes stack overflow


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7.5
    • Fix Version/s: None
    • Component/s: java
    • Environment: Linux (but it doesn't matter because it's Java).

    Description

      A schema defined like this:

      recursiveSchema.avsc
      {"type": "record",
       "name": "RecursiveRecord",
       "fields": [
         {"name": "child", "type": "RecursiveRecord"}
       ]}
      

      results in an infinite loop/stack overflow when ingesting JSON that looks like {"child": null} or {"child": {"null": null}}. For instance, I can compile the schema, load the generated class into a Scala REPL, and then trigger the error by reading in the JSON, like this:

      command-line-1
      java -jar avro-tools-1.7.5.jar compile schema recursiveSchema.avsc .
      javac RecursiveRecord.java -cp avro-tools-1.7.5.jar
      scala -cp avro-tools-1.7.5.jar:.
      
      scala-repl-specific-1
      import org.apache.avro.io.DecoderFactory;
      import org.apache.avro.Schema;
      import org.apache.avro.specific.SpecificDatumReader;
      
      var output: RecursiveRecord = new RecursiveRecord();
      val schema: Schema = output.getSchema();
      val reader: SpecificDatumReader[RecursiveRecord] = new SpecificDatumReader[RecursiveRecord](schema);
      output = reader.read(output, DecoderFactory.get().jsonDecoder(schema, """{"child": null}"""));
      output = reader.read(output, DecoderFactory.get().jsonDecoder(schema, """{"child": {"null": null}}"""));
      

      The same is true if I attempt to load it into a generic object:

      scala-repl-generic-1
      import org.apache.avro.io.DecoderFactory;
      import org.apache.avro.Schema;
      import org.apache.avro.generic.GenericDatumReader;
      
      val parser = new Schema.Parser();
      val schema: Schema = parser.parse("""{"type": "record", "name": "RecursiveRecord", "fields": [{"name": "child", "type": "RecursiveRecord"}]}""");
      val reader: GenericDatumReader[java.lang.Object] = new GenericDatumReader[java.lang.Object](schema);
      val output = reader.read(null, DecoderFactory.get().jsonDecoder(schema, """{"child": null}"""));
      val output = reader.read(null, DecoderFactory.get().jsonDecoder(schema, """{"child": {"null": null}}"""));
      

      In all cases, it is the reader.read call that overflows the stack (all four of the calls shown above). The stack trace appears to be truncated, but the part that is shown repeats these two lines until the JVM cuts it off:

      stack-trace
              at org.apache.avro.io.parsing.Symbol$Sequence.flattenedSize(Symbol.java:324)
              at org.apache.avro.io.parsing.Symbol.flattenedSize(Symbol.java:217)
      

      The overflow does not occur if we (correctly?) declare the child as a union of the recursive record and null. For instance,

      recursiveSchema2.avsc
      {"type": "record",
       "name": "RecursiveRecord2",
       "fields": [
         {"name": "child", "type": ["RecursiveRecord2", "null"]}
       ]}
      
      command-line-2
      java -jar avro-tools-1.7.5.jar compile schema recursiveSchema2.avsc .
      javac RecursiveRecord2.java -cp avro-tools-1.7.5.jar
      scala -cp avro-tools-1.7.5.jar:.
      
      scala-repl-specific-2
      import org.apache.avro.io.DecoderFactory;
      import org.apache.avro.Schema;
      import org.apache.avro.specific.SpecificDatumReader;
      
      var output: RecursiveRecord2 = new RecursiveRecord2();
      val schema: Schema = output.getSchema();
      val reader: SpecificDatumReader[RecursiveRecord2] = new SpecificDatumReader[RecursiveRecord2](schema);
      output = reader.read(output, DecoderFactory.get().jsonDecoder(schema, """{"child": null}"""));
      output = reader.read(output, DecoderFactory.get().jsonDecoder(schema, """{"child": {"null": null}}"""));
      
      scala-repl-generic-2
      import org.apache.avro.io.DecoderFactory;
      import org.apache.avro.Schema;
      import org.apache.avro.generic.GenericDatumReader;
      
      val parser = new Schema.Parser()
      val schema: Schema = parser.parse("""{"type": "record", "name": "RecursiveRecord2", "fields": [{"name": "child", "type": ["RecursiveRecord2", "null"]}]}""");
      val reader: GenericDatumReader[java.lang.Object] = new GenericDatumReader[java.lang.Object](schema);
      val output = reader.read(null, DecoderFactory.get().jsonDecoder(schema, """{"child": null}"""));
      val output = reader.read(null, DecoderFactory.get().jsonDecoder(schema, """{"child": {"null": null}}"""));
      

      For both specific and generic, RecursiveRecord2 works properly: it produces an object with recursive type and child == null.

      My understanding of the Avro specification is that only RecursiveRecord2 should be allowed to have a null child, so the JSON I supplied is not valid input for RecursiveRecord. (If so, it isn't even possible to give RecursiveRecord any valid finite input.) However, the reader should fail with something other than a stack overflow: an error telling me that {"child": null} is not legal unless the field child is declared as a union that includes null.
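      One way such a check could work is to ask, before reading any data, whether a record type admits any finite value at all. The sketch below is a toy model in plain Scala, not Avro code: Ty, NullTy, UnionTy, RecordRef and finiteRecords are hypothetical names introduced for illustration, and only null, unions and record references are modelled.

```scala
import scala.annotation.tailrec

// Toy model of Avro-like types. These are NOT Avro classes; all names are hypothetical.
object FiniteCheck {
  sealed trait Ty
  case object NullTy extends Ty
  case class UnionTy(branches: List[Ty]) extends Ty
  case class RecordRef(name: String) extends Ty // reference to a named record

  // Returns the names of records that have at least one finite value.
  // `registry` maps each record name to the types of its fields.
  def finiteRecords(registry: Map[String, List[Ty]]): Set[String] = {
    // A type is finite if it is null, a union with a finite branch,
    // or a record already known to be finite.
    def finite(t: Ty, known: Set[String]): Boolean = t match {
      case NullTy          => true
      case UnionTy(bs)     => bs.exists(b => finite(b, known))
      case RecordRef(name) => known.contains(name)
    }

    // Least fixed point: start with nothing known, keep adding records
    // whose fields are all finite, and stop when the set no longer grows.
    @tailrec
    def loop(known: Set[String]): Set[String] = {
      val next: Set[String] =
        registry.collect { case (name, fields) if fields.forall(f => finite(f, known)) => name }.toSet
      if (next == known) known else loop(next)
    }
    loop(Set.empty[String])
  }
}

import FiniteCheck._

val registry: Map[String, List[Ty]] = Map(
  "RecursiveRecord"  -> List(RecordRef("RecursiveRecord")),
  "RecursiveRecord2" -> List(UnionTy(List(RecordRef("RecursiveRecord2"), NullTy)))
)

// Only RecursiveRecord2 is reported; RecursiveRecord has no finite value.
println(finiteRecords(registry))
```

      A record missing from the result could then be rejected with a descriptive error when the reader is set up, rather than overflowing the stack during the read.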

      The reason one might want this (recursively defined types) is to make trees. The example I gave had only one child for simplicity (i.e. it was a linked list), but the error would apply to binary trees as well. For instance, here's a three-node list (a little cumbersome in JSON):

      motivating-example
      {"child": {"RecursiveRecord2": {"child": {"RecursiveRecord2": {"child": null}}}}}
      

      I haven't tested this with binary Avro deserialization (which would be a more realistic use case), but I don't know of a way to generate the Avro-encoded data without first getting it from human-typable JSON. (I'm not constructing the Avro byte stream by hand.)
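      For reference, binary data can probably be produced without going through JSON by building generic records in code and writing them with a binary encoder. The following is a minimal sketch, assuming the same avro-tools jar is on the classpath and using RecursiveRecord2; makeList and length are hypothetical helpers, and this has not been checked against 1.7.5 specifically.

```scala
import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

val schema: Schema = new Schema.Parser().parse(
  """{"type": "record", "name": "RecursiveRecord2", "fields": [{"name": "child", "type": ["RecursiveRecord2", "null"]}]}""")

// Hypothetical helper: build a linked list of n nodes; the last node has child == null.
def makeList(n: Int): GenericRecord = {
  val node = new GenericData.Record(schema)
  node.put("child", if (n <= 1) null else makeList(n - 1))
  node
}

// Hypothetical helper: count the nodes in a decoded list.
def length(r: GenericRecord): Int = r.get("child") match {
  case null             => 1
  case c: GenericRecord => 1 + length(c)
}

// Write a three-node list with the binary encoder.
val out = new ByteArrayOutputStream()
val encoder = EncoderFactory.get().binaryEncoder(out, null)
new GenericDatumWriter[GenericRecord](schema).write(makeList(3), encoder)
encoder.flush()
val bytes: Array[Byte] = out.toByteArray

// Read it back with the binary decoder.
val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
val decoded: GenericRecord = new GenericDatumReader[GenericRecord](schema).read(null, decoder)
println(length(decoded))
```

      The same round trip with the non-union RecursiveRecord schema cannot be set up this way, since no finite record can be built for it.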


          People

            Assignee: Unassigned
            Reporter: Jim Pivarski (jpivarski)
            Votes: 2
            Watchers: 4
