Avro
  1. Avro
  2. AVRO-583

Bad error message if you try and name an array or a map: org.apache.avro.SchemaParseException: Undefined name: "map"

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3.2
    • Fix Version/s: 1.4.0
    • Component/s: java
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Error message should be something like "You can't name a map"

      Broken schema:
      {
      "name":"annoyance",
      "type":"record",
      "fields":[

      { "name":"mymap", "type":"map", "values":"string" }

      ]
      }

      Error:

      org.apache.avro.SchemaParseException: Undefined name: "map"
      at org.apache.avro.Schema.parse(Schema.java:876)
      at org.apache.avro.Schema.parse(Schema.java:912)
      at org.apache.avro.Schema.parse(Schema.java:796)

      1. AVRO-583.patch
        0.9 kB
        Doug Cutting

        Activity

        Hide
        Doug Cutting added a comment -

        I just committed this.

        Show
        Doug Cutting added a comment - I just committed this.
        Hide
        Doug Cutting added a comment -

        Here's a patch that improves the error message as I proposed above.

        Show
        Doug Cutting added a comment - Here's a patch that improves the error message as I proposed above.
        Hide
        Sam Pullara added a comment -

        Ah, you are right. Was again confusing defining the type name with defining the field name. One breaking change that would probably fix this would be to change the "name" field for a field to "field" or something similar. Then you could have:

        {"field": "myfield", "type": "record", "name": "myrecord", "fields":[...] } {"field": "myfield", "type": "array", "items":"string"} {"field": "myfield", "type": "map", "values":"string"}
        Show
        Sam Pullara added a comment - Ah, you are right. Was again confusing defining the type name with defining the field name. One breaking change that would probably fix this would be to change the "name" field for a field to "field" or something similar. Then you could have: {"field": "myfield", "type": "record", "name": "myrecord", "fields":[...] } {"field": "myfield", "type": "array", "items":"string"} {"field": "myfield", "type": "map", "values":"string"}
        Hide
        Scott Carey added a comment -

        "map" is not a defined name. Type of "mymap" field must be a defined name or a {"type": ...} expression.

        +1 Yes, I think that would prevent beginners from looking in the wrong place for the error in their schema.

        Show
        Scott Carey added a comment - "map" is not a defined name. Type of "mymap" field must be a defined name or a {"type": ...} expression. +1 Yes, I think that would prevent beginners from looking in the wrong place for the error in their schema.
        Hide
        Doug Cutting added a comment -

        > It isn't clear to me why they are treated differently.

        I don't see how they are treated differently. Can you provide some examples?

        Show
        Doug Cutting added a comment - > It isn't clear to me why they are treated differently. I don't see how they are treated differently. Can you provide some examples?
        Hide
        Sam Pullara added a comment -

        Actually I was proposing you treat "map" and "array" just like "record". In the same way "record" can have "fields", "map" can have "values" and "array" can have "items". It isn't clear to me why they are treated differently.

        Show
        Sam Pullara added a comment - Actually I was proposing you treat "map" and "array" just like "record". In the same way "record" can have "fields", "map" can have "values" and "array" can have "items". It isn't clear to me why they are treated differently.
        Hide
        Doug Cutting added a comment -

        > The problem I have with those error messages is that "map" is the name of a defined type.

        The problem in part is that terms like "type", "name" and "defined" are overloaded here. "map" is a keyword, like "type", and cannot alone be used to determine the type of a field. The type of a field can either be a JSON string, naming a previously defined type, or a JSON object.

        > My deeper question is why this doesn't work? Is there a real problem naming and defining a map/array without nesting?

        So you propose that, in addition to permitting nested type expressions we permit flattened type expressions in fields. That's possible. How would metadata be handled? For example, if someone put "foo":"bar" in a field, would that be a property of the Field, of its Schema, or both? Worse yet, how about:

        {"name": "myfield", "type": "record", "name": "myrecord", "fields":[...] }

        Oops. That has two names! JSON objects are unordered, so we can't tell them apart!

        Show
        Doug Cutting added a comment - > The problem I have with those error messages is that "map" is the name of a defined type. The problem in part is that terms like "type", "name" and "defined" are overloaded here. "map" is a keyword, like "type", and cannot alone be used to determine the type of a field. The type of a field can either be a JSON string, naming a previously defined type, or a JSON object. > My deeper question is why this doesn't work? Is there a real problem naming and defining a map/array without nesting? So you propose that, in addition to permitting nested type expressions we permit flattened type expressions in fields. That's possible. How would metadata be handled? For example, if someone put "foo":"bar" in a field, would that be a property of the Field, of its Schema, or both? Worse yet, how about: {"name": "myfield", "type": "record", "name": "myrecord", "fields":[...] } Oops. That has two names! JSON objects are unordered, so we can't tell them apart!
        Hide
        Sam Pullara added a comment -

        The problem I have with those error messages is that "map" is the name of a defined type. It just happens to be a special case. My deeper question is why this doesn't work? Is there a real problem naming and defining a map/array without nesting?

        Show
        Sam Pullara added a comment - The problem I have with those error messages is that "map" is the name of a defined type. It just happens to be a special case. My deeper question is why this doesn't work? Is there a real problem naming and defining a map/array without nesting?
        Hide
        Doug Cutting added a comment -

        Would it also help to give the name of the field? So the error might then be:

        "map" is not a defined name. Type of "mymap" field must be a defined name or a

        {"type": ...}

        expression.

        Show
        Doug Cutting added a comment - Would it also help to give the name of the field? So the error might then be: "map" is not a defined name. Type of "mymap" field must be a defined name or a {"type": ...} expression.
        Hide
        Doug Cutting added a comment -

        Maybe we could change this error message to something like:

        X is not a defined name. Field type must be a defined name or a

        {"type": ...}

        expression.

        Is that clearer?

        Show
        Doug Cutting added a comment - Maybe we could change this error message to something like: X is not a defined name. Field type must be a defined name or a {"type": ...} expression. Is that clearer?
        Hide
        Scott Carey added a comment -

        This error message confused me as well when first learning Avro. The message led me in circles and does not point to what is wrong unless you already have a deeper understanding of Avro.

        Note that what is wrong here is the type , not the name, and to a beginner the first 'gut' response to that message is 'I set the name to "mymap", not "map", wtf?'.

        If it said 'invalid name for type identifier, "map"' it would be better, or ideally something more informative about what is valid in that location and why.

        The confusion for a beginner is that you see examples with fields like:

        {"name": "myfield", "type": "string" }

        and so it makes sense to think that other built-in types work the same way, and so

        {"name": "mymap", "type": "map", "values": "string"}

        would work.

        The current error message does nothing to inform the beginner that "map" and "array" are different than "string" and "int" etc. one is a named type and the other is not.

        I'm not sure how much context we have when this error is thrown to be more informative, but it should be easy to do better than the current message.

        Show
        Scott Carey added a comment - This error message confused me as well when first learning Avro. The message led me in circles and does not point to what is wrong unless you already have a deeper understanding of Avro. Note that what is wrong here is the type , not the name, and to a beginner the first 'gut' response to that message is 'I set the name to "mymap", not "map", wtf?'. If it said 'invalid name for type identifier, "map"' it would be better, or ideally something more informative about what is valid in that location and why. The confusion for a beginner is that you see examples with fields like: {"name": "myfield", "type": "string" } and so it makes sense to think that other built-in types work the same way, and so {"name": "mymap", "type": "map", "values": "string"} would work. The current error message does nothing to inform the beginner that "map" and "array" are different than "string" and "int" etc. one is a named type and the other is not. I'm not sure how much context we have when this error is thrown to be more informative, but it should be easy to do better than the current message.
        Hide
        Doug Cutting added a comment -

        Is there a bug here to fix, or should we just close this?

        Show
        Doug Cutting added a comment - Is there a bug here to fix, or should we just close this?
        Hide
        Sam Pullara added a comment -

        Yeah, that is the schema I intended but it took me a while to figure out why it was failing.

        Show
        Sam Pullara added a comment - Yeah, that is the schema I intended but it took me a while to figure out why it was failing.
        Hide
        Doug Cutting added a comment -

        The bug in that schema is that, when a string is used as a type it's assumed to be a type name.

        I think the schema you intended is:

        {
          "name":"annoyance",
          "type":"record",
          "fields":[
            {"name":"mymap", "type": {"type": "map", "values":"string"}}
          ]
        }
        

        That should parse.

        Maybe we should prohibit the use of non-primitive type keywords as user type names, e.g., "map", "record", "enum", "fixed", and "array" could be made reserved words. That would permit us to generate a better error here.

        Show
        Doug Cutting added a comment - The bug in that schema is that, when a string is used as a type it's assumed to be a type name. I think the schema you intended is: { "name" : "annoyance" , "type" : "record" , "fields" :[ { "name" : "mymap" , "type" : { "type" : "map" , "values" : "string" }} ] } That should parse. Maybe we should prohibit the use of non-primitive type keywords as user type names, e.g., "map", "record", "enum", "fixed", and "array" could be made reserved words. That would permit us to generate a better error here.

          People

          • Assignee:
            Doug Cutting
            Reporter:
            Sam Pullara
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development