Avro
  1. Avro
  2. AVRO-784

SpecificCompiler should generate accessors

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Duplicate
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: java
    • Labels:
    • Release Note:
      The Java SpecificCompiler now generates bean style accessor methods for fields.

      Description

      Avro's Java SpecificCompiler should generate java bean style accessors.

      1. avro-784.diff
        3 kB
        E. Sammer
      2. avro-784.diff
        5 kB
        E. Sammer

        Issue Links

          Activity

          Hide
          E. Sammer added a comment -

          Input welcome. I thought this would be useful in supporting backward compatible APIs with aliases down the road so client code doesn't need to reference generated members directly.

          I have an ICLA on file with the ASF and assign all rights to the ASF under the same license as Avro.

          Show
          E. Sammer added a comment - Input welcome. I thought this would be useful in supporting backward compatible APIs with aliases down the road so client code doesn't need to reference generated members directly. I have an ICLA on file with the ASF and assign all rights to the ASF under the same license as Avro.
          Hide
          Doug Cutting added a comment -

          The approach I've heard others mention is, rather than back-compatible, to use a new set of templates that generates setters and have applications explicitly upgrade to the new templates. Then the fields could become private and the generated API would become more abstract, permitting optimizations. But we could pursue both approaches too.

          A few concerns with this patch:

          • appending get/set onto the front of field names may result in name conflicts. For example, a record may have fields named both 'setSize' and 'size'. For other generated names we include a dollar sign, a character that is not permitted in Avro names. My preference here would be to generate, for a field named 'foo', a 'foo()' getter and a 'foo' setter.
          • we don't need to generate calls to the 'put' and 'get' methods but can directly set/get the fields.
          • if we directly access the fields then we should use javaUnbox instead of javaType, so that, e.g., the setter for an int field accepts an "int" paramter rather than an "Integer".
          • we should ideally generate javadoc for the setter/getters. if the field in the schema has documentation then we should include that, otherwise perhaps generate some vanilla javadoc.
          Show
          Doug Cutting added a comment - The approach I've heard others mention is, rather than back-compatible, to use a new set of templates that generates setters and have applications explicitly upgrade to the new templates. Then the fields could become private and the generated API would become more abstract, permitting optimizations. But we could pursue both approaches too. A few concerns with this patch: appending get/set onto the front of field names may result in name conflicts. For example, a record may have fields named both 'setSize' and 'size'. For other generated names we include a dollar sign, a character that is not permitted in Avro names. My preference here would be to generate, for a field named 'foo', a 'foo()' getter and a 'foo ' setter. we don't need to generate calls to the 'put' and 'get' methods but can directly set/get the fields. if we directly access the fields then we should use javaUnbox instead of javaType, so that, e.g., the setter for an int field accepts an "int" paramter rather than an "Integer". we should ideally generate javadoc for the setter/getters. if the field in the schema has documentation then we should include that, otherwise perhaps generate some vanilla javadoc.
          Hide
          E. Sammer added a comment -

          appending get/set onto the front of field names may result in name conflicts. For example, a record may have fields named both 'setSize' and 'size'. For other generated names we include a dollar sign, a character that is not permitted in Avro names. My preference here would be to generate, for a field named 'foo', a 'foo()' getter and a 'foo' setter.

          The prepending, in the case of similarly named fields, would simply result in 'setSetSize(sizeSize)' which would probably be as the user intended. This should be fine.

          The rationale for adhering to the javabean style accessors is that many frameworks and libraries already grok these names during reflection. For instance, Spring takes advantage of this for automatic resolution of names of properties all over the place.

          Would you be willing to accept these conventions? Also, if you don't think this makes sense, would it be acceptable to create an annotation / json property that results in generated accessor so users have the choice? i.e.

          record {
            @get @set int foo;
            string inacessableType;
          }
          

          or similar?

          we don't need to generate calls to the 'put' and 'get' methods but can directly set/get the fields.

          Wasn't sure if that was OK. Sounds good.

          If we directly access the fields then we should use javaUnbox instead of javaType, so that, e.g., the setter for an int field accepts an "int" paramter rather than an "Integer".

          Makes sense.

          we should ideally generate javadoc for the setter/getters. if the field in the schema has documentation then we should include that, otherwise perhaps generate some vanilla javadoc.

          No problem.

          Show
          E. Sammer added a comment - appending get/set onto the front of field names may result in name conflicts. For example, a record may have fields named both 'setSize' and 'size'. For other generated names we include a dollar sign, a character that is not permitted in Avro names. My preference here would be to generate, for a field named 'foo', a 'foo()' getter and a 'foo ' setter. The prepending, in the case of similarly named fields, would simply result in 'setSetSize(sizeSize)' which would probably be as the user intended. This should be fine. The rationale for adhering to the javabean style accessors is that many frameworks and libraries already grok these names during reflection. For instance, Spring takes advantage of this for automatic resolution of names of properties all over the place. Would you be willing to accept these conventions? Also, if you don't think this makes sense, would it be acceptable to create an annotation / json property that results in generated accessor so users have the choice? i.e. record { @get @set int foo; string inacessableType; } or similar? we don't need to generate calls to the 'put' and 'get' methods but can directly set/get the fields. Wasn't sure if that was OK. Sounds good. If we directly access the fields then we should use javaUnbox instead of javaType, so that, e.g., the setter for an int field accepts an "int" paramter rather than an "Integer". Makes sense. we should ideally generate javadoc for the setter/getters. if the field in the schema has documentation then we should include that, otherwise perhaps generate some vanilla javadoc. No problem.
          Hide
          Scott Carey added a comment -

          In the long run we'll end up with a few variations for generated code. One of them will nee to be 'standard' bean compatible, with set/get prefixes because other frameworks look for that.

          In general, the user should have a choice on how to generate the access methods. However, the first iteration does not need to be that flexible, as long as it doesn't cause difficulty implementing other variations in the future.

          The signature for a nullable type would differ from a non-nullable type if they both are not objects, for example:

          Integer getFooCount();
          void setFooCount(Integer fooCount);
          

          for nullable types and

          int getFooCount();
          void setFooCount(int fooCount);
          

          for non-nullable types.

          Other likely variations that others may contribute later:

          • getters without setters, using an immutable object + builder pattern.
          • nullable intrinsics not boxed, but using a BitSet or similar to check for null (minimize memory footprint).

          Do we want to simply mark the whole record with an annotation (@bean, @immutable, @builder)? Or do we need the flexibility at the field level?

          Show
          Scott Carey added a comment - In the long run we'll end up with a few variations for generated code. One of them will nee to be 'standard' bean compatible, with set/get prefixes because other frameworks look for that. In general, the user should have a choice on how to generate the access methods. However, the first iteration does not need to be that flexible, as long as it doesn't cause difficulty implementing other variations in the future. The signature for a nullable type would differ from a non-nullable type if they both are not objects, for example: Integer getFooCount(); void setFooCount( Integer fooCount); for nullable types and int getFooCount(); void setFooCount( int fooCount); for non-nullable types. Other likely variations that others may contribute later: getters without setters, using an immutable object + builder pattern. nullable intrinsics not boxed, but using a BitSet or similar to check for null (minimize memory footprint). Do we want to simply mark the whole record with an annotation (@bean, @immutable, @builder)? Or do we need the flexibility at the field level?
          Hide
          Doug Cutting added a comment -

          I don't have a problem with the convention, I'm just paranoid about the potential for name conflicts in generated code. My example above was not good, however, as you point out.

          We do need to worry about conflicts with names we inherit. From Object we inherit 'getClass()' and from GenericContainer we inherit 'getSchema()', so I think, if a field is named 'class' or 'schema' then we'll need to mangle the generated accessor method. This actually requires less mangling than un-prefixed names, since otherwise we'd need to mangle accessor names for fields named 'wait', 'hashCode', 'equals', etc.

          Also, if someone has a record with fields named "foo" and "Foo" then we'd be in trouble.

          Show
          Doug Cutting added a comment - I don't have a problem with the convention, I'm just paranoid about the potential for name conflicts in generated code. My example above was not good, however, as you point out. We do need to worry about conflicts with names we inherit. From Object we inherit 'getClass()' and from GenericContainer we inherit 'getSchema()', so I think, if a field is named 'class' or 'schema' then we'll need to mangle the generated accessor method. This actually requires less mangling than un-prefixed names, since otherwise we'd need to mangle accessor names for fields named 'wait', 'hashCode', 'equals', etc. Also, if someone has a record with fields named "foo" and "Foo" then we'd be in trouble.
          Hide
          E. Sammer added a comment -

          Scott:

          In the long run we'll end up with a few variations for generated code. One of them will nee to be 'standard' bean compatible, with set/get prefixes because other frameworks look for that.

          In general, the user should have a choice on how to generate the access methods. However, the first iteration does not need to be that flexible, as long as it doesn't cause difficulty implementing other variations in the future.

          +1. Makes sense to me. I also like your point on nullable types. That's a nice (and easy) thing to expose like that.

          I think having field level flexibility is a good long term goal and will be useful. Support for advanced annotations like @immutable and @builder (at the record level) are very cool.

          I guess my current question is, assuming I can brow beat Doug sufficiently, what subset of these features are required as a v1 feature set? In addition to what I have, I'm confident I can handle:

          • Making the changes Doug suggested (e.g. doc addition / generation, removing use of put() / get(), dealing with the boxing appropriately).
          • Generating accessors that support ref types for nullable and primitives for non-nullables (assuming that information is easy to get at from Schema$Field or friends).

          I haven't yet looked at what it takes to implement annotations. For that reason, I'm inclined to defer that to another JIRA if that's acceptable (although I'd still be interested in doing it).

          Show
          E. Sammer added a comment - Scott: In the long run we'll end up with a few variations for generated code. One of them will nee to be 'standard' bean compatible, with set/get prefixes because other frameworks look for that. In general, the user should have a choice on how to generate the access methods. However, the first iteration does not need to be that flexible, as long as it doesn't cause difficulty implementing other variations in the future. +1. Makes sense to me. I also like your point on nullable types. That's a nice (and easy) thing to expose like that. I think having field level flexibility is a good long term goal and will be useful. Support for advanced annotations like @immutable and @builder (at the record level) are very cool. I guess my current question is, assuming I can brow beat Doug sufficiently, what subset of these features are required as a v1 feature set? In addition to what I have, I'm confident I can handle: Making the changes Doug suggested (e.g. doc addition / generation, removing use of put() / get(), dealing with the boxing appropriately). Generating accessors that support ref types for nullable and primitives for non-nullables (assuming that information is easy to get at from Schema$Field or friends). I haven't yet looked at what it takes to implement annotations. For that reason, I'm inclined to defer that to another JIRA if that's acceptable (although I'd still be interested in doing it).
          Hide
          Doug Cutting added a comment -

          Scott: the javaUnbox() method should already handle nullable types correctly.

          Show
          Doug Cutting added a comment - Scott: the javaUnbox() method should already handle nullable types correctly.
          Hide
          Doug Cutting added a comment -

          Doc addition is not required for v1, but is easy, so why not.

          I don't think we should implement both prefixed and unprefixed. Let's just pick one. I now think that prefixing leads to fewer name conflicts and is thus preferable.

          Show
          Doug Cutting added a comment - Doc addition is not required for v1, but is easy, so why not. I don't think we should implement both prefixed and unprefixed. Let's just pick one. I now think that prefixing leads to fewer name conflicts and is thus preferable.
          Hide
          E. Sammer added a comment -

          Updated patch with:

          • Javadoc for each accessor (either custom or generic docs if docs aren't provided).
          • Generate accessors for aliases, if they exist.
          • No longer use get() / put() for member access.
          • Use javaUnbox() rather than javaType() to figure out java types for setter args and getter return types.
          Show
          E. Sammer added a comment - Updated patch with: Javadoc for each accessor (either custom or generic docs if docs aren't provided). Generate accessors for aliases, if they exist. No longer use get() / put() for member access. Use javaUnbox() rather than javaType() to figure out java types for setter args and getter return types.
          Hide
          Scott Carey added a comment -

          Looking good.

          We'll need to create a different velocity profile instead of breaking the current one (perhaps 'bean' instead of 'classic'?) and figure out how to wire this up to switch types at user request in the Tool/AntTask/MavenPlugin.

          I'd like to see some name collision tests too. We can have collisions with Object.getClass(), IndexedRecord.getSchema(), and perhaps others. Those will need to be mangled. I believe "class" should already be mangled at the field name level (see SpecificCompiler.mangle()).

          The mangling for method names will differ from mangling for field names. Currently, a field named "volatile" gets mangled to be a member variable named "volatile$". The getter can be getVolatile() rather than getVolatile$(). Conversely, "schema" does not need to have its field mangled, but does need to have its getter mangled. Its setter does not require mangling, but for consistency should be.

          Show
          Scott Carey added a comment - Looking good. We'll need to create a different velocity profile instead of breaking the current one (perhaps 'bean' instead of 'classic'?) and figure out how to wire this up to switch types at user request in the Tool/AntTask/MavenPlugin. I'd like to see some name collision tests too. We can have collisions with Object.getClass(), IndexedRecord.getSchema(), and perhaps others. Those will need to be mangled. I believe "class" should already be mangled at the field name level (see SpecificCompiler.mangle()). The mangling for method names will differ from mangling for field names. Currently, a field named "volatile" gets mangled to be a member variable named "volatile$". The getter can be getVolatile() rather than getVolatile$(). Conversely, "schema" does not need to have its field mangled, but does need to have its getter mangled. Its setter does not require mangling, but for consistency should be.
          Hide
          E. Sammer added a comment -

          We'll need to create a different velocity profile instead of breaking the current one (perhaps 'bean' instead of 'classic'?) and figure out how to wire this up to switch types at user request in the Tool/AntTask/MavenPlugin.

          I made sure the changes didn't break backward compatibility; the only changes are additions. That said, I guess having a separate template so we can make instance vars private makes sense.

          I'd like to see some name collision tests too...

          Is there a reasonable (predictable) mangling scheme that really makes sense though? There's also some nastiness in actually being able to detect collisions programmatically when we're talking about generated classes without making significant assumptions about super classes. I suppose an alternative is to make it all annotation based and allow the user to override the names (or force them to if they want to avoid collisions).

          Show
          E. Sammer added a comment - We'll need to create a different velocity profile instead of breaking the current one (perhaps 'bean' instead of 'classic'?) and figure out how to wire this up to switch types at user request in the Tool/AntTask/MavenPlugin. I made sure the changes didn't break backward compatibility; the only changes are additions. That said, I guess having a separate template so we can make instance vars private makes sense. I'd like to see some name collision tests too... Is there a reasonable (predictable) mangling scheme that really makes sense though? There's also some nastiness in actually being able to detect collisions programmatically when we're talking about generated classes without making significant assumptions about super classes. I suppose an alternative is to make it all annotation based and allow the user to override the names (or force them to if they want to avoid collisions).
          Hide
          Doug Cutting added a comment -

          I think there are only a finite number of collisions possible, unless we start adding methods to the superclasses of generated classes. The only collisions I see at present are with Object#getClass() and IndexedRecord#getSchema(). I'd be happy to see just those handled and tested now.

          Show
          Doug Cutting added a comment - I think there are only a finite number of collisions possible, unless we start adding methods to the superclasses of generated classes. The only collisions I see at present are with Object#getClass() and IndexedRecord#getSchema(). I'd be happy to see just those handled and tested now.
          Hide
          Doug Cutting added a comment -

          Scott, would you be willing to have this committed as a compatible extension of the existing templates? If so, then I think all this lacks before we commmit it is mangling of getClass and getSchema, perhaps to be getClass$ and getSchema$.

          Show
          Doug Cutting added a comment - Scott, would you be willing to have this committed as a compatible extension of the existing templates? If so, then I think all this lacks before we commmit it is mangling of getClass and getSchema, perhaps to be getClass$ and getSchema$.
          Hide
          Doug Cutting added a comment -

          I think this is subsumed by AVRO-839. Is there anything here that's not there?

          Show
          Doug Cutting added a comment - I think this is subsumed by AVRO-839 . Is there anything here that's not there?
          Hide
          Doug Cutting added a comment -

          Resolving as a duplicate of AVRO-839.

          Show
          Doug Cutting added a comment - Resolving as a duplicate of AVRO-839 .
          Hide
          Scott Carey added a comment -

          I think there were some details that did not make it into AVRO-839. In particular, all accessors in AVRO-839 appear to be boxed, whether the value is nullable or not. I have not moved to using the API there because of the boxing overhead incurred. These issues will need to go into another ticket for later.

          Show
          Scott Carey added a comment - I think there were some details that did not make it into AVRO-839 . In particular, all accessors in AVRO-839 appear to be boxed, whether the value is nullable or not. I have not moved to using the API there because of the boxing overhead incurred. These issues will need to go into another ticket for later.
          Hide
          Doug Cutting added a comment -

          Scott, good point. Let's open a new issue to add unboxed accessors.

          Show
          Doug Cutting added a comment - Scott, good point. Let's open a new issue to add unboxed accessors.

            People

            • Assignee:
              Unassigned
              Reporter:
              E. Sammer
            • Votes:
              1 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development