  1. Apache Avro
  2. AVRO-4055

[rust] schema parsing invalid with nested records

Details

    Description

      Current state
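      The Rust implementation parses the following schema without raising any errors, but the schema is (I believe) invalid: the "order" field gives "type" as the bare string "record" and places "fields" directly on the field.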

      {
        "type": "record",
        "name": "SampleSchema",
        "fields": [
          {
            "name": "order",
            "type": "record",
            "fields": [
              {
                "name": "order_number",
                "type": ["null", "string"],
                "default": null
              },
              { "name": "order_date", "type": "string" }
            ]
          }
        ]
      }
      

      Desired state
      The Rust implementation returns an error when parsing the schema above.
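
      A minimal sketch of the kind of check this would need, assuming the rule that a type written as a bare string must be either an Avro primitive or the name of an already-defined type. The helper `check_type_name` and its error text are hypothetical and are not part of the `apache_avro` API:

```rust
use std::collections::HashSet;

/// Avro primitive type names, which may always appear as a bare string.
const PRIMITIVES: [&str; 8] = [
    "null", "boolean", "int", "long", "float", "double", "bytes", "string",
];

/// Hypothetical helper (not part of apache_avro): checks a field type that
/// was written as a bare JSON string, e.g. `"type": "record"`.
fn check_type_name(field: &str, type_name: &str, defined: &HashSet<String>) -> Result<(), String> {
    if PRIMITIVES.contains(&type_name) || defined.contains(type_name) {
        return Ok(());
    }
    Err(format!(
        "\"{type_name}\" is not a defined name: the type of the \"{field}\" field \
         must be a defined name or a {{\"type\": ...}} expression"
    ))
}

fn main() {
    let mut defined = HashSet::new();
    // `"type": "string"` is a primitive, so it is accepted.
    assert!(check_type_name("order_date", "string", &defined).is_ok());
    // `"type": "record"` names no defined type, so it is rejected.
    assert!(check_type_name("order", "record", &defined).is_err());
    // Once a record named "Order" has been defined, it can be referenced by name.
    defined.insert("Order".to_string());
    assert!(check_type_name("order", "Order", &defined).is_ok());
}
```

      A real fix would apply this check inside the parser, at the point where a field's "type" turns out to be a string rather than an object or an array.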

      What would a correct schema look like?

      Notice that in this schema the "type" of the "order" field is itself an object: a record definition with its own "type", "name", and "fields".

      {
        "type": "record",
        "name": "SampleSchema",
        "fields": [
          {
            "name": "order",
            "type": {
              "type": "record",
              "name": "Order",
              "fields": [
                {
                  "name": "order_number",
                  "type": ["null", "string"],
                  "default": null
                },
                { "name": "order_date", "type": "string" }
              ]
            }
          }
        ]
      }
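
      As an illustration of that nesting, here is a small sketch that builds the field JSON with the record definition placed under the field's "type" key. The helper `nested_record_field` is hypothetical, uses only the standard library, and does no JSON escaping, so it assumes simple identifier-like names:

```rust
/// Hypothetical helper (not part of apache_avro): builds the JSON for a field
/// whose type is an inline record definition. Names are inserted verbatim,
/// without JSON escaping, so this is only suitable for simple identifiers.
fn nested_record_field(field_name: &str, record_name: &str, fields_json: &str) -> String {
    format!(
        r#"{{"name": "{field_name}", "type": {{"type": "record", "name": "{record_name}", "fields": {fields_json}}}}}"#
    )
}

fn main() {
    let inner = r#"[{"name": "order_date", "type": "string"}]"#;
    let field = nested_record_field("order", "Order", inner);
    // The record definition sits under the field's "type" key.
    assert!(field.starts_with(r#"{"name": "order", "type": {"type": "record""#));
    println!("{field}");
}
```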
      

      Sample code

      use apache_avro::Schema;

      fn main() {
          let raw_schema = r#"
          {
            "type": "record",
            "name": "SampleSchema",
            "fields": [
              {
                "name": "order",
                "type": "record",
                "fields": [
                  {
                    "name": "order_number",
                    "type": ["null", "string"],
                    "default": null
                  },
                  { "name": "order_date", "type": "string" }
                ]
              }
            ]
          }
          "#;

          // parse_str should return an error for an invalid schema,
          // but for this schema the call currently succeeds
          let schema = Schema::parse_str(raw_schema).unwrap();

          // schemas can be printed for debugging
          println!("{:?}", schema);
      }
      

      Why is this important? Other implementations, such as the Java one, cannot parse this schema, which makes compatibility between languages harder.

      We've had issues using `avro-tools` to build the jars. We get the following error:

      Exception in thread "main" org.apache.avro.SchemaParseException: "record" is not a defined name. The type of the "order" field must be a defined name or a {"type": ...} expression.
      at org.apache.avro.Schema.parse(Schema.java:1734)
      at org.apache.avro.Schema$Parser.parse(Schema.java:1471)
      at org.apache.avro.Schema$Parser.parse(Schema.java:1433)
      at org.apache.avro.tool.SpecificCompilerTool.run(SpecificCompilerTool.java:154)
      at org.apache.avro.tool.Main.run(Main.java:67)
      at org.apache.avro.tool.Main.main(Main.java:56)
      

      I can try to fix it; let me know if you want me to send a PR.

      Discussion

      • Is this a bug in the Rust implementation or in the Java one?
      • Can the Avro specification be updated to explain how to nest records?

      Regards

            People

              woile Santiago Fraire Willemoes
