Avro
  1. Avro
  2. AVRO-1449

Check default value's type matches union at schema parse time

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.6
    • Fix Version/s: 1.7.7
    • Component/s: java
    • Labels:
      None

      Description

      A common Avro mistake is to declare optional fields as follows:

      "type": ["string", "null"], "default": null
      

      This fails at runtime since the default value's type (null) doesn't match the type of the first type in the union (string). The correct declaration is:

      "type": ["null", "string"], "default": null
      

      Doug suggested that we throw an exception at schema parse time for cases like this. To ensure backwards compatibility we could make it optional; tools could emit a warning.

      1. AVRO-1449.patch
        8 kB
        Doug Cutting
      2. AVRO-1449.patch
        10 kB
        Tom White

        Activity

        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open Resolved Resolved
        6d 8h 55m 1 Doug Cutting 05/Feb/14 00:10
        Resolved Resolved Closed Closed
        169d 20h 5m 1 Doug Cutting 24/Jul/14 21:16
        Doug Cutting made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hide
        Hudson added a comment -

        FAILURE: Integrated in AvroJava #438 (See https://builds.apache.org/job/AvroJava/438/)
        AVRO-1449. Java: Optionally validate default values while reading schemas. (cutting: rev 1564571)

        • /avro/trunk/CHANGES.txt
        • /avro/trunk/lang/java/avro/src/main/java/org/apache/avro/Schema.java
        • /avro/trunk/lang/java/compiler/src/test/idl/input/interop.avdl
        • /avro/trunk/lang/java/compiler/src/test/idl/input/simple.avdl
        • /avro/trunk/lang/java/compiler/src/test/idl/output/interop.avpr
        • /avro/trunk/lang/java/compiler/src/test/idl/output/simple.avpr
        • /avro/trunk/lang/java/ipc/src/test/java/org/apache/avro/TestSchema.java
        • /avro/trunk/share/test/data/schema-tests.txt
        Show
        Hudson added a comment - FAILURE: Integrated in AvroJava #438 (See https://builds.apache.org/job/AvroJava/438/ ) AVRO-1449 . Java: Optionally validate default values while reading schemas. (cutting: rev 1564571) /avro/trunk/CHANGES.txt /avro/trunk/lang/java/avro/src/main/java/org/apache/avro/Schema.java /avro/trunk/lang/java/compiler/src/test/idl/input/interop.avdl /avro/trunk/lang/java/compiler/src/test/idl/input/simple.avdl /avro/trunk/lang/java/compiler/src/test/idl/output/interop.avpr /avro/trunk/lang/java/compiler/src/test/idl/output/simple.avpr /avro/trunk/lang/java/ipc/src/test/java/org/apache/avro/TestSchema.java /avro/trunk/share/test/data/schema-tests.txt
        Doug Cutting made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Assignee Doug Cutting [ cutting ]
        Fix Version/s 1.7.7 [ 12326041 ]
        Resolution Fixed [ 1 ]
        Hide
        Doug Cutting added a comment -

        I committed this.

        Show
        Doug Cutting added a comment - I committed this.
        Hide
        ASF subversion and git services added a comment -

        Commit 1564571 from Doug Cutting in branch 'avro/trunk'
        [ https://svn.apache.org/r1564571 ]

        AVRO-1449. Java: Optionally validate default values while reading schemas.

        Show
        ASF subversion and git services added a comment - Commit 1564571 from Doug Cutting in branch 'avro/trunk' [ https://svn.apache.org/r1564571 ] AVRO-1449 . Java: Optionally validate default values while reading schemas.
        Hide
        Doug Cutting added a comment -

        Your improvements look great. +1 I'll commit this soon unless someone objects or commits it first.

        Show
        Doug Cutting added a comment - Your improvements look great. +1 I'll commit this soon unless someone objects or commits it first.
        Tom White made changes -
        Attachment AVRO-1449.patch [ 12626092 ]
        Hide
        Tom White added a comment -

        Thanks for writing a patch Doug. Overall it looks good. I think logging a warning to standard error is reasonable. In the future we could add an option to the tools to enforce strict validation of defaults, then in a later release make that the default.

        Interestingly we don’t have to do anything with schema builder, since the API already enforces the constraint. (Nice work Scott!)

        … .type().unionOf().stringType().and().nullType().endUnion().nullDefault() // fails to compile
        … .type().unionOf().nullType().and().stringType().endUnion().nullDefault() // works
        … .type().optional().stringType() // usual shorthand
        

        There were a couple of small errors in the patch where the validateDefaults field wasn't getting set, and VALIDATE_DEFAULTS wasn't being reset to the right value. I've attached a new patch fixing that, and also adding a unit test: TestSchema already had a test for catching invalid defaults at runtime so I extended it to test for parse-time failures when validation is turned on.

        Show
        Tom White added a comment - Thanks for writing a patch Doug. Overall it looks good. I think logging a warning to standard error is reasonable. In the future we could add an option to the tools to enforce strict validation of defaults, then in a later release make that the default. Interestingly we don’t have to do anything with schema builder, since the API already enforces the constraint. (Nice work Scott!) … .type().unionOf().stringType().and().nullType().endUnion().nullDefault() // fails to compile … .type().unionOf().nullType().and().stringType().endUnion().nullDefault() // works … .type().optional().stringType() // usual shorthand There were a couple of small errors in the patch where the validateDefaults field wasn't getting set, and VALIDATE_DEFAULTS wasn't being reset to the right value. I've attached a new patch fixing that, and also adding a unit test: TestSchema already had a test for catching invalid defaults at runtime so I extended it to test for parse-time failures when validation is turned on.
        Doug Cutting made changes -
        Field Original Value New Value
        Attachment AVRO-1449.patch [ 12625992 ]
        Hide
        Doug Cutting added a comment -

        Here's a patch that implements this. The current test suite tests it some, but we should probably add more tests. I also patched a few schemas that it found to be in error.

        The validity checking itself is straightforward. Enabling and disabling it is not.

        Doing it at parse time isn't good enough, since schemas are often created directly with the Schema API (e.g., IDL & builder). So the check needs to be in the Field constructor. But then to be able to turn checking on and off we need a global flag (yuk) that the constructor reads.

        The existing name validation uses a ThreadLocal<Boolean> as such a global flag. Schema.Parser sets this around calls to parse(). By default name validation is on and is disabled when parsing schemas from data files.

        For default values, we want validation off by default. I used the same ThreadLocal mechanism to permit Schema.Parser to enable or disable it. As with name validation, I added no mechanism to enable checking outside of Schema.Parser.

        However I think we should do something by default to let folks know there's a problem with their schema. Throwing an exception would break lots of existing users. Logging a warning might be appropriate. The avro core module however doesn't currently have a logging dependency, and we shouldn't add one here. So this patch prints warnings to System.err. Does that seem reasonable?

        Show
        Doug Cutting added a comment - Here's a patch that implements this. The current test suite tests it some, but we should probably add more tests. I also patched a few schemas that it found to be in error. The validity checking itself is straightforward. Enabling and disabling it is not. Doing it at parse time isn't good enough, since schemas are often created directly with the Schema API (e.g., IDL & builder). So the check needs to be in the Field constructor. But then to be able to turn checking on and off we need a global flag (yuk) that the constructor reads. The existing name validation uses a ThreadLocal<Boolean> as such a global flag. Schema.Parser sets this around calls to parse(). By default name validation is on and is disabled when parsing schemas from data files. For default values, we want validation off by default. I used the same ThreadLocal mechanism to permit Schema.Parser to enable or disable it. As with name validation, I added no mechanism to enable checking outside of Schema.Parser. However I think we should do something by default to let folks know there's a problem with their schema. Throwing an exception would break lots of existing users. Logging a warning might be appropriate. The avro core module however doesn't currently have a logging dependency, and we shouldn't add one here. So this patch prints warnings to System.err. Does that seem reasonable?
        Tom White created issue -

          People

          • Assignee:
            Doug Cutting
            Reporter:
            Tom White
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development