Avro / AVRO-530

allow for mutual recursion in type definitions

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.3.2
    • Fix Version/s: None
    • Component/s: spec
    • Labels:
      None

      Description

      Suppose you have these two types in your protocol:

      {"name": "User", "type": "record", "fields": [{"name": "current_status", "type": "Status"}]}
      
      {"name": "Status", "type": "record", "fields": [{"name": "author", "type": "User"}]}
      

      This will raise an error! The current workaround is to define one of them inline at its first usage, like so:

      {"name": "User", "type": "record", "fields": [{"name": "current_status", "type": {"name": "Status", "type": "record", "fields": [.. lots of fields ...]}}]}
      

      But this is incredibly unwieldy. It would be really nice for the spec to require all parsers to allow for mutual recursion instead. It could be done by implementing a two-pass parser: one pass to collect the names that are referenced, and a second to fill in those names with their appropriate definitions.
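
      A minimal sketch of the two-pass idea, written against plain JSON-style dictionaries rather than any Avro library. The function names (collect_names, unresolved_refs) are hypothetical, and namespaces are ignored for brevity. Pass one records every named type that is defined; pass two checks that every referenced name was found.

```python
# Illustrative only: these helpers are assumptions, not part of any Avro implementation.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def collect_names(schema, names):
    """Pass 1: record every named type defined anywhere in the schema."""
    if isinstance(schema, list):  # a union, or a list of top-level types
        for branch in schema:
            collect_names(branch, names)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind in ("record", "enum", "fixed"):
            names[schema["name"]] = schema
        if kind == "record":
            for field in schema["fields"]:
                collect_names(field["type"], names)
        elif kind == "array":
            collect_names(schema["items"], names)
        elif kind == "map":
            collect_names(schema["values"], names)


def unresolved_refs(schema, names):
    """Pass 2: return every referenced name that pass 1 did not find."""
    if isinstance(schema, str):
        return [] if schema in PRIMITIVES or schema in names else [schema]
    if isinstance(schema, list):
        return [ref for branch in schema for ref in unresolved_refs(branch, names)]
    kind = schema.get("type")
    if kind == "record":
        return [ref for f in schema["fields"] for ref in unresolved_refs(f["type"], names)]
    if kind == "array":
        return unresolved_refs(schema["items"], names)
    if kind == "map":
        return unresolved_refs(schema["values"], names)
    return []  # enum, fixed, or a primitive written as {"type": ...}


# The two mutually recursive types from the description.
types = [
    {"name": "User", "type": "record",
     "fields": [{"name": "current_status", "type": "Status"}]},
    {"name": "Status", "type": "record",
     "fields": [{"name": "author", "type": "User"}]},
]

names = {}
collect_names(types, names)             # pass 1
missing = unresolved_refs(types, names)  # pass 2
```

      Because all names are known before any reference is checked, the order in which User and Status appear no longer matters. A single-pass parser would need to defer unresolved references instead.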

        Activity

        Jeff Hodges added a comment -

        (It could be done in one pass, of course. That was just an off-the-cuff idea that I thought would be easier to write but might not be.)

        Scott Carey added a comment -

        In the past, it has been suggested that simplifications like the above, or shortcuts that make it easier to manage types across different schema/protocol files, be handled at the genavro layer.

        If genavro evolves enough, all the user-facing, ease-of-use stuff for managing schemas could go there, while the individual language implementations can all be as simple as possible and only use the JSON.

        Jeff Hodges added a comment -

        Yeah, but now might be the time to change that! This is a known, well-understood, and solved parsing problem. We can totally do it!

        At my company, we're building lots of little services and it's really, really handy to have the schema right up next to the code. We deal with a lot of Thrift and while we understand the compilation step, it turns out to be really nice to have the protocol Right There while you're working, especially if you're still tweaking it. We're hitting the desire/need for mutual recursion way before we're hitting a spot where genavro makes real sense for us. (Maybe others have different experiences?)

        And making mutual recursion work would be a huge boon for anyone who doesn't want to deal with the genavro step, too. I know a lot of folks that have come on board just because we make it easy to play with Avro and push it to its limits quickly. I wouldn't want to take that away and say "use this thing that you'll have to make part of your build and then get all the file path stuff right and have to think" when all they want to do is see how far they can push it.

        In any case, I'm going to give this a go in the Ruby implementation and put up a patch. Maybe someone with a similar desire can do it in Java or Python and see how well it works?

        As the apps grow, genavro makes total sense! But I'd hate to limit wonderful, powerful ideas to it that are, while not trivial, totally doable at every level. How nice it is to know your idea will always work, no matter how you interact with Avro.

        Jeff Hodges added a comment -

        Oh, and I guess I screwed up my searches earlier. Where did we have this discussion? Maybe I'm missing something important.

        Scott Carey added a comment -

        Some of it was in this conversation:
        http://www.mail-archive.com/avro-dev@hadoop.apache.org/msg02397.html

        I agree the JSON could certainly be easier to use in several ways. But those are spec changes that are not backwards compatible. Maybe Avro 2.0?

        Jeff Hodges added a comment -

        Yeah, I suppose. I'm happy to have a ticket for this.

        Cristian Opris added a comment -

        Any chance this could be revived? The IDL still doesn't support self-referential types...

        Doug Cutting added a comment -

        Patches are welcome. The IDL parser lives in a file called idl.jj. Have at it.

        Cristian Opris added a comment -

        Is this supposed to work at the moment at the JSON schema representation level? The following simple example results in a StackOverflowError, even though there are comments in the Symbol class saying that it's supposed to handle recursive symbols:

        The code below was run in the Scala REPL:

        import org.apache.avro.Schema
        import org.apache.avro.io.EncoderFactory

        val recType = new Schema.Parser().parse(""" {"type":"record", "name":"SelfRefType", "fields":[{"type": "SelfRefType", "name":"self"}]} """)
        
        recType: org.apache.avro.Schema = {"type":"record","name":"SelfRefType","fields":[{"name":"self","type":"SelfRefType"}]}
        
        
        val encoder = EncoderFactory.get().jsonEncoder(recType, Console.out)
        
        java.lang.StackOverflowError
        	at org.apache.avro.io.parsing.Symbol.flattenedSize(Symbol.java:213)
        	at org.apache.avro.io.parsing.Symbol$Sequence.flattenedSize(Symbol.java:324)
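
        For illustration only: a walk over a self-referential schema that follows name references without remembering which records it has already expanded will recurse without bound. Whether that is the cause of the overflow above is not established in this thread. The sketch below is not Avro's Symbol code; count_fields is a hypothetical helper over plain dictionaries that shows the usual guard, a set of already-visited record names.

```python
def count_fields(schema, names, seen=None):
    """Count record fields reachable from schema, expanding each named record once."""
    seen = set() if seen is None else seen
    if isinstance(schema, str):
        # A name reference: follow it only if it is a known record not yet expanded.
        if schema in names and schema not in seen:
            return count_fields(names[schema], names, seen)
        return 0
    if isinstance(schema, dict) and schema.get("type") == "record":
        if schema["name"] in seen:
            return 0  # already expanded; stop here instead of recursing forever
        seen.add(schema["name"])
        return sum(1 + count_fields(f["type"], names, seen) for f in schema["fields"])
    return 0


# The self-referential record from the example above.
self_ref = {"type": "record", "name": "SelfRefType",
            "fields": [{"type": "SelfRefType", "name": "self"}]}
total = count_fields(self_ref, {"SelfRefType": self_ref})
```

        Without the seen set, the reference to SelfRefType inside its own field would be expanded again on every call.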
        
        Doug Cutting added a comment -

        Yes, that's meant to work today. Perhaps there's a bug when a record has no other fields besides a self-reference. Please file a separate issue for this bug. Thanks!


          People

          • Assignee: Unassigned
          • Reporter: Jeff Hodges
          • Votes: 1
          • Watchers: 5
