AVRO-1007: Insufficient validation in generated specific record builder implementations

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.1
    • Fix Version/s: 1.6.2
    • Component/s: None
    • Labels:

      Description

      There are two main problems with the generated build() method in specific record builders (see the sketch after the list below):

      • For non-primitive types, if there is no default value and the user does not set the value, build() will execute successfully without throwing an exception
        • Instead, an AvroRuntimeException should be thrown with an exception message indicating the name of the required field that was not set
      • For primitive types, if there is no default value and the user does not set the value, an AvroRuntimeException is thrown with the 'cause' set to a NullPointerException, which is not very helpful
        • The NPE comes from attempting to set the primitive field to the result of defaultValue(), which is null
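
      A minimal sketch of the intended behavior; "User" is a hypothetical generated class whose string field "name" has no default value, so this is illustrative rather than code from any attached patch:

        // "User" is a hypothetical generated SpecificRecord class; its "name" field
        // (a CharSequence) has no default value in the schema.
        try {
          User u = User.newBuilder().build();   // "name" was never set
        } catch (org.apache.avro.AvroRuntimeException e) {
          // Desired: the exception message identifies the missing field ("name"),
          // rather than build() silently succeeding or, for primitive fields,
          // wrapping a NullPointerException as the cause.
        }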
      Attachments

      1. AVRO-1007.patch
        3 kB
        James Baldassari
      2. AVRO-1007-v2.patch
        8 kB
        James Baldassari
      3. AVRO-1007-v3.patch
        11 kB
        James Baldassari
      4. AVRO-1007-v4.patch
        11 kB
        Scott Carey
      5. AVRO-1007.patch
        5 kB
        Doug Cutting
      6. AVRO-1007.patch
        6 kB
        Doug Cutting
      7. AVRO-1007.patch
        7 kB
        Doug Cutting

        Activity

        James Baldassari added a comment -

        Here's a patch for record.vm along with some new test cases in TestSpecificRecordBuilder.

        James Baldassari added a comment -

        Updated the patch to fix unit test failures in TestSpecificErrorBuilder and TestSpecificCompilerTool

        Scott Carey added a comment -

        This looks good. Do we want to make this sort of change in 1.6.x? Users who have relied upon the builder API to create objects that are invalid and later set them to be valid will break. However, I don't think there are very many users of the builder API yet, and we could improve the javadoc for build() to indicate the conditions under which it will succeed or fail.

        Scott Carey added a comment -

        This has a patch to review. Should this be fixed for 1.6.2, or only 1.7.0 ?

        James Baldassari added a comment -

        I just realized that this issue also affects the generic builder. Here's a third iteration of the patch (including new unit tests) that also improves validation in the generic builder

        James Baldassari added a comment -

        These changes require regenerating/recompiling the specific record classes. However, all of these changes should be backwards-compatible with 1.6. So if this goes out in 1.6.2 and a user doesn't regenerate the specific records, everything should still work.

        Scott Carey added a comment -

        This patch (v4) rebases the (v3) version to apply cleanly to trunk @rev 1242036.

        It works and passes tests with the change, and tests fail without the change.

        +1

        Thanks!

        Doug Cutting added a comment -

        Seems like we're generating a lot of code that looks very similar. Let me see if I can simplify this...

        Doug Cutting added a comment -

        Here's a version that just changes getDefault() to throw an exception when there's no default value and the field's type is not null. This is equivalent to the prior patch but with less generated code.
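
        A rough, hypothetical sketch of the check described above; the helper name requireDefault is invented for illustration and the committed code may differ:

          import org.apache.avro.AvroRuntimeException;
          import org.apache.avro.Schema;

          final class DefaultValueCheck {
            // If a field has no default value and its type is not "null", asking for its
            // default is an error; otherwise the (possibly null) default JSON node is returned.
            static Object requireDefault(Schema.Field field) {
              if (field.defaultValue() == null
                  && field.schema().getType() != Schema.Type.NULL) {
                throw new AvroRuntimeException(
                    "Field " + field.name() + " not set and has no default value");
              }
              return field.defaultValue();
            }
          }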

        Doug Cutting added a comment -

        Here's a new version that also handles unions with null and adds a test case for that.

        Scott Carey added a comment -

        +1 to Doug's last patch. Tests fail without the one line change in RecordBuilderBase, and succeed with it.

        Scott Carey added a comment -

        Race condition: My last +1 was to the version prior to the union handling. Looking at the new one now.

        Scott Carey added a comment -

        Should this work if any of the union branches are null, not just the first branch? Although the spec mentions that a union default applies to the first branch, there seems to be an implicit exception for null in the Java implementation. The following has always worked:

        {"type":"record", "name":"Rec", "fields":[

        {"name":"f", "type":["string", "null"], "default":null}

        ]}

        The Builder API does not need to hold to the spec, however, and if any branch is null, an unset field is ok IMO.

        Scott Carey added a comment -

        Slight clarification:

        The schema above has worked for me in the past, but I am not sure if it has always worked – that was too strong of a statement.

        Doug Cutting added a comment -

        According to the spec, that schema is malformed. The default value of a union should always be interpreted as the first type in the union.

        I don't think this schema currently works when reading records that lack the field "f". See line 348 of ResolvingGrammarGenerator, where the default value is encoded using the first type in the union.

        I suppose we could change the spec to permit a null default value if any element of the union is null, but I don't see why we should. It makes the spec more complex and doesn't provide any additional expressive power.

        This patch would make the builder API enforce the spec, consistent with ResolvingDecoder.
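
        (For reference, the spec-conformant way to get a null default for a nullable field is to list "null" as the first branch of the union, e.g.:)

        {"type":"record", "name":"Rec", "fields":[
          {"name":"f", "type":["null", "string"], "default":null}
        ]}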

        Scott Carey added a comment -

        I am not sure I see how this makes the Builder API respect the spec. Does it allow the following?

        {"type":"record", "name":"Person", "fields":[
          {"name":"address", "type": ["null", "string"]}
        ]}
        
        Person.newBuilder().build(); // should error, address has no explicit default
        
        

        ?
        If so, to adhere to the spec that should throw an error since address was not set and has no specified default.

        Or, if the above is allowed, it is consistent to allow it regardless of the order in the union. Either we enforce that all fields without a specified default must be set, or we allow nullable fields to build without being set. Halfway in between seems inconsistent. In short, either we interpret an unset field as an unset field that is only valid if there is a default, or we interpret it as a null, which is valid for null schemas or unions with any branch null.

        A related case is below:

        {"type":"record", "name":"Nothing", "fields":[
          {"name":"blank", "type": "null"}
        ]}
        
        Nothing.newBuilder().build(); // should error, blank not set and has no explicit default
        
        
        Scott Carey added a comment -

        With this patch, the tests below are not consistent. IMO these should all pass, or all fail. None of the fields has a default, so if any of them were missing during schema resolution it would fail. If we want the builder API to behave similarly to schema resolution, all should fail. If we want an unset field to behave as if it were set to null, then all should pass.

          Schema rec = new Schema.Parser().parse("{\"type\":\"record\", \"name\":\"Nothing\", \"fields\":[" +
              "{\"name\":\"blank\", \"type\": \"null\"}]}");
          Schema unionSchema = new Schema.Parser().parse("{\"type\":\"record\", \"name\":\"Person\", \"fields\":[" +
              "{\"name\":\"address\", \"type\": [\"null\", \"string\"]}]}");
          Schema unionSchema2 = new Schema.Parser().parse("{\"type\":\"record\", \"name\":\"Person\", \"fields\":[" +
              "{\"name\":\"address\", \"type\": [\"string\",\"null\"]}]}");
          
          @Test
          public void buildNoFieldSet() {
            new GenericRecordBuilder(rec).build();
          }
          
          @Test
          public void buildNoFieldSetUnion() {
            new GenericRecordBuilder(unionSchema).build();
          }
          
          @Test
          public void buildNoFieldSetUnion2() {
            new GenericRecordBuilder(unionSchema2).build();
          }
        
        Doug Cutting added a comment -

        Good point. It fixes the builder to be consistent with ResolvingDecoder and the spec in using the first type of the union for defaults, but it does not fix the issue you mention above, where no default is specified yet null is the first type in the union.

        ResolvingDecoder does currently implement this correctly, failing if no default value is specified in the schema even if the schema is a union with null as its first value.

        Here's a patch that fixes the builder API to conform to the spec in this case too.

        There's a separate question about whether we should change the spec so that, when no default value is specified, the default is null rather than unspecified. That would permit folks to use schemas like the one you have above. It would require some minor changes to ResolvingDecoder and to this patch.
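
        (Illustrative sketch of the intended behavior with this patch; the record names R1/R2 and the standalone class are invented for the example:)

          import org.apache.avro.AvroRuntimeException;
          import org.apache.avro.Schema;
          import org.apache.avro.generic.GenericRecord;
          import org.apache.avro.generic.GenericRecordBuilder;

          public class BuilderDefaultSketch {
            public static void main(String[] args) {
              // Explicit null default with "null" as the first union branch: build() succeeds, f is null.
              Schema withDefault = new Schema.Parser().parse(
                  "{\"type\":\"record\",\"name\":\"R1\",\"fields\":["
                  + "{\"name\":\"f\",\"type\":[\"null\",\"string\"],\"default\":null}]}");
              GenericRecord ok = new GenericRecordBuilder(withDefault).build();

              // No default at all: build() should throw, even though null is a union branch.
              Schema noDefault = new Schema.Parser().parse(
                  "{\"type\":\"record\",\"name\":\"R2\",\"fields\":["
                  + "{\"name\":\"f\",\"type\":[\"null\",\"string\"]}]}");
              try {
                new GenericRecordBuilder(noDefault).build();
              } catch (AvroRuntimeException expected) {
                // field "f" was not set and has no default value
              }
            }
          }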

        Scott Carey added a comment -

        This approach works for me and the code is clear.

        Minor nit we should fix at the same time: RecordBuilderBase.validate() has incorrect javadoc; it says it throws NullPointerException, but it actually throws AvroRuntimeException.

        However, there is a new test failure in TestSpecificRecordBuilder.testInterop() now that we are more strict and a null field without a default must be set.

        Doug Cutting added a comment -

        I fixed RecordBuilderBase and TestSpecificRecordBuilder as suggested and committed this.


          People

          • Assignee: James Baldassari
          • Reporter: James Baldassari
          • Votes: 0
          • Watchers: 0
