Avro
  1. Avro
  2. AVRO-1476

Make position field of org.apache.avro.Schema not transient.

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.7.7, 1.8.0
    • Fix Version/s: 1.7.7
    • Component/s: java
    • Labels:
      None

      Description

      Referring to: https://github.com/apache/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L358

      Christophe Taton did some research into possible causes/reasons for this field to be marked as transient but was unable to find any reason. The org.apache.avro.Schema class is not marked as serializable so this transient field serves no purpose. This transient field can cause odd behaviors with external serialization frameworks (and the built-in java serialization framework) when trying to serialize schemas.

      1. AVRO-1476.patch
        0.5 kB
        Robert Chu

        Activity

        Hide
        Robert Chu added a comment -

        The one line patch to fix this.

        Show
        Robert Chu added a comment - The one line patch to fix this.
        Hide
        Christophe Taton added a comment -

        The change looks fine by me. Can someone else comment on the need for transient?

        Show
        Christophe Taton added a comment - The change looks fine by me. Can someone else comment on the need for transient?
        Hide
        Doug Cutting added a comment -

        The question is, if we made it serializable, would we want to serialize this field? I'd suggest not. What application is harmed by this declaration?

        Avro reflect ignores fields that are declared as transient. I suspect the origin of this declaration is that I was looking at the schema that reflect generated for Schema. However there are now a number of other fields in Schema.java which should also be declared transient and are not. For example, RecordSchema#fieldMap, Name#full, etc. These, like Field#position, are computed from other fields and cached, but were added after Field#position and not declared transient.

        Since we're not consistent about transient declarations in Schema.java, removing this one should be harmless.

        Show
        Doug Cutting added a comment - The question is, if we made it serializable, would we want to serialize this field? I'd suggest not. What application is harmed by this declaration? Avro reflect ignores fields that are declared as transient. I suspect the origin of this declaration is that I was looking at the schema that reflect generated for Schema. However there are now a number of other fields in Schema.java which should also be declared transient and are not. For example, RecordSchema#fieldMap, Name#full, etc. These, like Field#position, are computed from other fields and cached, but were added after Field#position and not declared transient. Since we're not consistent about transient declarations in Schema.java, removing this one should be harmless.
        Hide
        Robert Chu added a comment -

        I would think you would want to serialize this field. Without serializing it, the position of each field in a record would be lost. Unless I'm misreading this, not having correct position information for a field in a record would cause all fields to be written to a single position. Since transient fields will get reset to their type's default value, to support regenerating this field post-serialization, either the marker value for "uninitialized value" should be changed to 0 (the default value for int) or the need to regenerate this cached variable should be indicated elsewhere.

        Show
        Robert Chu added a comment - I would think you would want to serialize this field. Without serializing it, the position of each field in a record would be lost. Unless I'm misreading this, not having correct position information for a field in a record would cause all fields to be written to a single position. Since transient fields will get reset to their type's default value, to support regenerating this field post-serialization, either the marker value for "uninitialized value" should be changed to 0 (the default value for int) or the need to regenerate this cached variable should be indicated elsewhere.
        Show
        Robert Chu added a comment - An example: http://java67.blogspot.com/2012/08/what-is-transient-variable-in-java.html
        Hide
        Doug Cutting added a comment -

        Robert, you're right, with Java serialization, we probably would want to serialize this field, even though its value is derived entirely from other fields. +1 on the provided patch.

        Show
        Doug Cutting added a comment - Robert, you're right, with Java serialization, we probably would want to serialize this field, even though its value is derived entirely from other fields. +1 on the provided patch.
        Hide
        Sandy Ryza added a comment -

        +1 to this as well. Kryo serialization, the most common internal serialization framework for Spark also ignores this field, making it impossible to pass GenericData.Records through Spark stages.

        Show
        Sandy Ryza added a comment - +1 to this as well. Kryo serialization, the most common internal serialization framework for Spark also ignores this field, making it impossible to pass GenericData.Records through Spark stages.
        Hide
        Christophe Taton added a comment -

        This patch looks good to me. Tests run fine.
        If nobody objects, I'd like to push it soon.

        Doug Cutting is that ok?

        Show
        Christophe Taton added a comment - This patch looks good to me. Tests run fine. If nobody objects, I'd like to push it soon. Doug Cutting is that ok?
        Hide
        Doug Cutting added a comment -

        +1

        Show
        Doug Cutting added a comment - +1
        Hide
        ASF subversion and git services added a comment -

        Commit 1598151 from Doug Cutting in branch 'avro/trunk'
        [ https://svn.apache.org/r1598151 ]

        AVRO-1476. Remove transient declaration from Schema.Field#position. Contributed by Robert Chu.

        Show
        ASF subversion and git services added a comment - Commit 1598151 from Doug Cutting in branch 'avro/trunk' [ https://svn.apache.org/r1598151 ] AVRO-1476 . Remove transient declaration from Schema.Field#position. Contributed by Robert Chu.
        Hide
        Doug Cutting added a comment -

        I committed this. Thanks, Robert!

        Show
        Doug Cutting added a comment - I committed this. Thanks, Robert!
        Hide
        Hudson added a comment -

        FAILURE: Integrated in AvroJava #452 (See https://builds.apache.org/job/AvroJava/452/)
        AVRO-1476. Remove transient declaration from Schema.Field#position. Contributed by Robert Chu. (cutting: rev 1598151)

        • /avro/trunk/CHANGES.txt
        • /avro/trunk/lang/java/avro/src/main/java/org/apache/avro/Schema.java
        Show
        Hudson added a comment - FAILURE: Integrated in AvroJava #452 (See https://builds.apache.org/job/AvroJava/452/ ) AVRO-1476 . Remove transient declaration from Schema.Field#position. Contributed by Robert Chu. (cutting: rev 1598151) /avro/trunk/CHANGES.txt /avro/trunk/lang/java/avro/src/main/java/org/apache/avro/Schema.java

          People

          • Assignee:
            Robert Chu
            Reporter:
            Robert Chu
          • Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development