Uploaded image for project: 'Apache Avro'
  1. Apache Avro
  2. AVRO-2090

Improve encode/decode time for SpecificRecord using code generation

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.9.0
    • java
    • None

    Description

      New implementation for generation of code for SpecificRecord that improves decoding by over 10% and encoding over 30% (more improvements are on the way). This feature is behind a feature flag (org.apache.avro.specific.use_custom_coders) and (for now) turned off by default. See Getting Started (Java) for instructions.

      (A bit more info: Compared to GenericRecords, SpecificRecords offer type-safety plus the performance of traditional getters/setters/instance variables. But these are only beneficial to Java code accessing those records. SpecificRecords inherit serialization and deserialization code from GenericRecords, which is dynamic and thus slow (in fact, benchmarks show that serialization and deserialization is slower for SpecificRecord than for GenericRecord). This patch extends record.vm to generate custom, higher-performance encoder and decoder functions for SpecificRecords.)

      Attachments

        1. customcoders.md
          10 kB
          Raymie Stata
        2. perf-data.txt
          10 kB
          Raymie Stata

        Issue Links

          Activity

            githubbot ASF GitHub Bot added a comment -

            Fokko commented on issue #350: AVRO-2090 second try
            URL: https://github.com/apache/avro/pull/350#issuecomment-432530681

            @rstata You're right, it could also be old state on my side. Let me figure it out. I'll let you know! Cheers

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - Fokko commented on issue #350: AVRO-2090 second try URL: https://github.com/apache/avro/pull/350#issuecomment-432530681 @rstata You're right, it could also be old state on my side. Let me figure it out. I'll let you know! Cheers ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            cutting commented on issue #350: AVRO-2090 second try
            URL: https://github.com/apache/avro/pull/350#issuecomment-432458389

            Yes, I ran tests (outside of Docker) before I committed this. Sorry to hear
            it's broken for you.

            Doug

            On Tue, Oct 23, 2018, 12:05 PM Fokko Driesprong <notifications@github.com>
            wrote:

            > Did you check the tests? I'm trying to set up CI/CD, but the following
            > tests are failing:
            >
            > -------------------------------------------------------
            > T E S T S
            > -------------------------------------------------------
            > Running org.apache.avro.specific.TestGeneratedCode
            > Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 1.353 sec <<< FAILURE! - in org.apache.avro.specific.TestGeneratedCode
            > withSchemaMigration(org.apache.avro.specific.TestGeneratedCode) Time elapsed: 0.042 sec <<< FAILURE!
            > java.lang.AssertionError: Test schema must allow for custom coders.
            > at org.apache.avro.specific.TestGeneratedCode.withSchemaMigration(TestGeneratedCode.java:75)
            >
            > withoutSchemaMigration(org.apache.avro.specific.TestGeneratedCode) Time elapsed: 0.001 sec <<< FAILURE!
            > java.lang.AssertionError: Test schema must allow for custom coders.
            > at org.apache.avro.specific.TestGeneratedCode.withoutSchemaMigration(TestGeneratedCode.java:54)
            >
            > —
            > You are receiving this because you modified the open/close state.
            > Reply to this email directly, view it on GitHub
            > <https://github.com/apache/avro/pull/350#issuecomment-432378885>, or mute
            > the thread
            > <https://github.com/notifications/unsubscribe-auth/AAQhejA5vRqKytFRiRMGNPDmmJUdSfudks5un2iGgaJpZM4XjMR_>
            > .
            >

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - cutting commented on issue #350: AVRO-2090 second try URL: https://github.com/apache/avro/pull/350#issuecomment-432458389 Yes, I ran tests (outside of Docker) before I committed this. Sorry to hear it's broken for you. Doug On Tue, Oct 23, 2018, 12:05 PM Fokko Driesprong <notifications@github.com> wrote: > Did you check the tests? I'm trying to set up CI/CD, but the following > tests are failing: > > ------------------------------------------------------- > T E S T S > ------------------------------------------------------- > Running org.apache.avro.specific.TestGeneratedCode > Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 1.353 sec <<< FAILURE! - in org.apache.avro.specific.TestGeneratedCode > withSchemaMigration(org.apache.avro.specific.TestGeneratedCode) Time elapsed: 0.042 sec <<< FAILURE! > java.lang.AssertionError: Test schema must allow for custom coders. > at org.apache.avro.specific.TestGeneratedCode.withSchemaMigration(TestGeneratedCode.java:75) > > withoutSchemaMigration(org.apache.avro.specific.TestGeneratedCode) Time elapsed: 0.001 sec <<< FAILURE! > java.lang.AssertionError: Test schema must allow for custom coders. > at org.apache.avro.specific.TestGeneratedCode.withoutSchemaMigration(TestGeneratedCode.java:54) > > — > You are receiving this because you modified the open/close state. > Reply to this email directly, view it on GitHub > < https://github.com/apache/avro/pull/350#issuecomment-432378885 >, or mute > the thread > < https://github.com/notifications/unsubscribe-auth/AAQhejA5vRqKytFRiRMGNPDmmJUdSfudks5un2iGgaJpZM4XjMR_ > > . > ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            rstata commented on issue #350: AVRO-2090 second try
            URL: https://github.com/apache/avro/pull/350#issuecomment-432438717

            The Docker build also passed (see below).

            FWIW, when doing work on my laptop, I can't get `mvn test` to work – I have to do `mvn package` instead. I think this is because the `package` goal for earlier sub-projects of `lang/java` end up creating outputs needed to run tests in the later sub-projects. (Although, `build.sh` does a `mvn test`, which seems to work – I find the behavior of Maven mysterious).

            Can you try a `mvn clean package` instead?

            Here is the output of my Docker build:
            ```
            rstata@76312f39ae43:~/avro$ ./build.sh test
            + for target in '"$@"'
            + case "$target" in
            + cd lang/java
            + ./build.sh test
            [INFO] Scanning for projects...
            ...LOTS OUT OUTPUT ELIDED...
            [INFO] ------------------------------------------------------------------------
            [INFO] Reactor Summary:
            [INFO]
            [INFO] Apache Avro Java .................................. SUCCESS [30.554s]
            [INFO] Apache Avro Guava Dependencies .................... SUCCESS [17.652s]
            [INFO] Apache Avro ....................................... SUCCESS [4:08.692s]
            [INFO] Apache Avro Compiler .............................. SUCCESS [1:01.525s]
            [INFO] Apache Avro Maven Plugin .......................... SUCCESS [40.351s]
            [INFO] Apache Avro IPC ................................... SUCCESS [3:48.624s]
            [INFO] Trevni Java ....................................... SUCCESS [0.282s]
            [INFO] Trevni Java Core .................................. SUCCESS [14.026s]
            [INFO] Apache Avro Mapred API ............................ SUCCESS [6:49.612s]
            [INFO] Trevni Java Avro .................................. SUCCESS [1:09.773s]
            [INFO] Trevni Specification .............................. SUCCESS [0.579s]
            [INFO] Apache Avro Tools ................................. SUCCESS [2:44.934s]
            [INFO] Apache Avro Protobuf Compatibility ................ SUCCESS [7.054s]
            [INFO] Apache Avro Thrift Compatibility .................. SUCCESS [10.956s]
            [INFO] Apache Avro Maven Archetypes ...................... SUCCESS [2.159s]
            [INFO] Apache Avro Maven Service Archetype ............... SUCCESS [0.902s]
            [INFO] Apache Avro gRPC .................................. SUCCESS [19.420s]
            [INFO] ------------------------------------------------------------------------
            [INFO] BUILD SUCCESS
            [INFO] ------------------------------------------------------------------------
            [INFO] Total time: 22:47.899s
            [INFO] Finished at: Tue Oct 23 21:50:05 UTC 2018
            [INFO] Final Memory: 64M/194M
            [INFO] ------------------------------------------------------------------------
            ```

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata commented on issue #350: AVRO-2090 second try URL: https://github.com/apache/avro/pull/350#issuecomment-432438717 The Docker build also passed (see below). FWIW, when doing work on my laptop, I can't get `mvn test` to work – I have to do `mvn package` instead. I think this is because the `package` goal for earlier sub-projects of `lang/java` end up creating outputs needed to run tests in the later sub-projects. (Although, `build.sh` does a `mvn test`, which seems to work – I find the behavior of Maven mysterious). Can you try a `mvn clean package` instead? Here is the output of my Docker build: ``` rstata@76312f39ae43:~/avro$ ./build.sh test + for target in '"$@"' + case "$target" in + cd lang/java + ./build.sh test [INFO] Scanning for projects... ...LOTS OUT OUTPUT ELIDED... [INFO] ------------------------------------------------------------------------ [INFO] Reactor Summary: [INFO] [INFO] Apache Avro Java .................................. SUCCESS [30.554s] [INFO] Apache Avro Guava Dependencies .................... SUCCESS [17.652s] [INFO] Apache Avro ....................................... SUCCESS [4:08.692s] [INFO] Apache Avro Compiler .............................. SUCCESS [1:01.525s] [INFO] Apache Avro Maven Plugin .......................... SUCCESS [40.351s] [INFO] Apache Avro IPC ................................... SUCCESS [3:48.624s] [INFO] Trevni Java ....................................... SUCCESS [0.282s] [INFO] Trevni Java Core .................................. SUCCESS [14.026s] [INFO] Apache Avro Mapred API ............................ SUCCESS [6:49.612s] [INFO] Trevni Java Avro .................................. SUCCESS [1:09.773s] [INFO] Trevni Specification .............................. SUCCESS [0.579s] [INFO] Apache Avro Tools ................................. SUCCESS [2:44.934s] [INFO] Apache Avro Protobuf Compatibility ................ SUCCESS [7.054s] [INFO] Apache Avro Thrift Compatibility .................. SUCCESS [10.956s] [INFO] Apache Avro Maven Archetypes ...................... SUCCESS [2.159s] [INFO] Apache Avro Maven Service Archetype ............... SUCCESS [0.902s] [INFO] Apache Avro gRPC .................................. SUCCESS [19.420s] [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 22:47.899s [INFO] Finished at: Tue Oct 23 21:50:05 UTC 2018 [INFO] Final Memory: 64M/194M [INFO] ------------------------------------------------------------------------ ``` ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            Fokko commented on issue #350: AVRO-2090 second try
            URL: https://github.com/apache/avro/pull/350#issuecomment-432429845

            Thanks for the quick response! Using `./build.sh docker` you should get a full fledged docker environment to run the tests. If you run `mvn clean test` in `lang/avro/` the tests error on my current master.

            For example, the trailing whitespace is something that is being added in the test reference file should not be in the generated file: https://github.com/rstata-projects/avro/blob/98b3df3410c4dc14aa6b5890e2f7482da55350ca/lang/java/tools/src/test/compiler/output/Player.java#L543-L553
            This showed up in the diff of the failed test.

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - Fokko commented on issue #350: AVRO-2090 second try URL: https://github.com/apache/avro/pull/350#issuecomment-432429845 Thanks for the quick response! Using `./build.sh docker` you should get a full fledged docker environment to run the tests. If you run `mvn clean test` in `lang/avro/` the tests error on my current master. For example, the trailing whitespace is something that is being added in the test reference file should not be in the generated file: https://github.com/rstata-projects/avro/blob/98b3df3410c4dc14aa6b5890e2f7482da55350ca/lang/java/tools/src/test/compiler/output/Player.java#L543-L553 This showed up in the diff of the failed test. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            rstata commented on issue #350: AVRO-2090 second try
            URL: https://github.com/apache/avro/pull/350#issuecomment-432418896

            I am trying to reproduce. When run from my Mac command-line (`mv clean && mvn package`), it seems like all tests pass (see uploaded file). Will try to reproduce using a Docker build, but this will take a while because I haven't done one before.

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata commented on issue #350: AVRO-2090 second try URL: https://github.com/apache/avro/pull/350#issuecomment-432418896 I am trying to reproduce. When run from my Mac command-line (`mv clean && mvn package`), it seems like all tests pass (see uploaded file). Will try to reproduce using a Docker build, but this will take a while because I haven't done one before. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            Fokko commented on issue #350: AVRO-2090 second try
            URL: https://github.com/apache/avro/pull/350#issuecomment-432415634

            It looks like three more tests are failing later on: https://gist.github.com/Fokko/40602f95c04313bd6c7aca00316ec84c

            PTAL @rstata

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - Fokko commented on issue #350: AVRO-2090 second try URL: https://github.com/apache/avro/pull/350#issuecomment-432415634 It looks like three more tests are failing later on: https://gist.github.com/Fokko/40602f95c04313bd6c7aca00316ec84c PTAL @rstata ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            Fokko commented on issue #350: AVRO-2090 second try
            URL: https://github.com/apache/avro/pull/350#issuecomment-432378885

            Did you check the tests? I'm trying to set up CI/CD, but the following tests are failing:
            ```
            -------------------------------------------------------
            T E S T S
            -------------------------------------------------------
            Running org.apache.avro.specific.TestGeneratedCode
            Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 1.353 sec <<< FAILURE! - in org.apache.avro.specific.TestGeneratedCode
            withSchemaMigration(org.apache.avro.specific.TestGeneratedCode) Time elapsed: 0.042 sec <<< FAILURE!
            java.lang.AssertionError: Test schema must allow for custom coders.
            at org.apache.avro.specific.TestGeneratedCode.withSchemaMigration(TestGeneratedCode.java:75)

            withoutSchemaMigration(org.apache.avro.specific.TestGeneratedCode) Time elapsed: 0.001 sec <<< FAILURE!
            java.lang.AssertionError: Test schema must allow for custom coders.
            at org.apache.avro.specific.TestGeneratedCode.withoutSchemaMigration(TestGeneratedCode.java:54)
            ```

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - Fokko commented on issue #350: AVRO-2090 second try URL: https://github.com/apache/avro/pull/350#issuecomment-432378885 Did you check the tests? I'm trying to set up CI/CD, but the following tests are failing: ``` ------------------------------------------------------- T E S T S ------------------------------------------------------- Running org.apache.avro.specific.TestGeneratedCode Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 1.353 sec <<< FAILURE! - in org.apache.avro.specific.TestGeneratedCode withSchemaMigration(org.apache.avro.specific.TestGeneratedCode) Time elapsed: 0.042 sec <<< FAILURE! java.lang.AssertionError: Test schema must allow for custom coders. at org.apache.avro.specific.TestGeneratedCode.withSchemaMigration(TestGeneratedCode.java:75) withoutSchemaMigration(org.apache.avro.specific.TestGeneratedCode) Time elapsed: 0.001 sec <<< FAILURE! java.lang.AssertionError: Test schema must allow for custom coders. at org.apache.avro.specific.TestGeneratedCode.withoutSchemaMigration(TestGeneratedCode.java:54) ``` ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            cutting Doug Cutting added a comment -

            I committed this. Thanks, Raymie!

            cutting Doug Cutting added a comment - I committed this. Thanks, Raymie!

            Commit b4ede4b116b24b5308e8419504a73e02b7f7e406 in avro's branch refs/heads/master from raymie
            [ https://gitbox.apache.org/repos/asf?p=avro.git;h=b4ede4b ]

            AVRO-2090 second try (#350)

            • Finished initial implementation (not tested).
            • Added Reader/Decoder code
            • Made some of the changes suggested by Doug.
            • Hide helper methods related to custom coding. Changed them from public to protected. Also changed name of encode and decode to customEncode and customDecode to be more clear as to their function.
            • Allow dynamic changes to flag that controls whether or not the custom en/decoders are used.
            • Fixed typos in TestSpecificCompiler.java
            • New test case: breaks new code-gen when schema needs resolution.
            • Fixed bug with decoding when the schema has been migrated.
            • Added test-with-custom-coders execution of testing and fixed some problems that this uncovered.
            • Fixed potential performance problem of redundantly allocating objects.
            • Added documentation (also update AVRO-2090 description to point to new docs).
            • Small doc fix (I forgot to commit these changes before pushing)
            jira-bot ASF subversion and git services added a comment - Commit b4ede4b116b24b5308e8419504a73e02b7f7e406 in avro's branch refs/heads/master from raymie [ https://gitbox.apache.org/repos/asf?p=avro.git;h=b4ede4b ] AVRO-2090 second try (#350) Finished initial implementation (not tested). Added Reader/Decoder code Made some of the changes suggested by Doug. Hide helper methods related to custom coding. Changed them from public to protected. Also changed name of encode and decode to customEncode and customDecode to be more clear as to their function. Allow dynamic changes to flag that controls whether or not the custom en/decoders are used. Fixed typos in TestSpecificCompiler.java New test case: breaks new code-gen when schema needs resolution. Fixed bug with decoding when the schema has been migrated. Added test-with-custom-coders execution of testing and fixed some problems that this uncovered. Fixed potential performance problem of redundantly allocating objects. Added documentation (also update AVRO-2090 description to point to new docs). Small doc fix (I forgot to commit these changes before pushing)

            Commit b4ede4b116b24b5308e8419504a73e02b7f7e406 in avro's branch refs/heads/master from raymie
            [ https://gitbox.apache.org/repos/asf?p=avro.git;h=b4ede4b ]

            AVRO-2090 second try (#350)

            • Finished initial implementation (not tested).
            • Added Reader/Decoder code
            • Made some of the changes suggested by Doug.
            • Hide helper methods related to custom coding. Changed them from public to protected. Also changed name of encode and decode to customEncode and customDecode to be more clear as to their function.
            • Allow dynamic changes to flag that controls whether or not the custom en/decoders are used.
            • Fixed typos in TestSpecificCompiler.java
            • New test case: breaks new code-gen when schema needs resolution.
            • Fixed bug with decoding when the schema has been migrated.
            • Added test-with-custom-coders execution of testing and fixed some problems that this uncovered.
            • Fixed potential performance problem of redundantly allocating objects.
            • Added documentation (also update AVRO-2090 description to point to new docs).
            • Small doc fix (I forgot to commit these changes before pushing)
            jira-bot ASF subversion and git services added a comment - Commit b4ede4b116b24b5308e8419504a73e02b7f7e406 in avro's branch refs/heads/master from raymie [ https://gitbox.apache.org/repos/asf?p=avro.git;h=b4ede4b ] AVRO-2090 second try (#350) Finished initial implementation (not tested). Added Reader/Decoder code Made some of the changes suggested by Doug. Hide helper methods related to custom coding. Changed them from public to protected. Also changed name of encode and decode to customEncode and customDecode to be more clear as to their function. Allow dynamic changes to flag that controls whether or not the custom en/decoders are used. Fixed typos in TestSpecificCompiler.java New test case: breaks new code-gen when schema needs resolution. Fixed bug with decoding when the schema has been migrated. Added test-with-custom-coders execution of testing and fixed some problems that this uncovered. Fixed potential performance problem of redundantly allocating objects. Added documentation (also update AVRO-2090 description to point to new docs). Small doc fix (I forgot to commit these changes before pushing)
            githubbot ASF GitHub Bot added a comment -

            cutting closed pull request #350: AVRO-2090 second try
            URL: https://github.com/apache/avro/pull/350

            This is a PR merged from a forked repository.
            As GitHub hides the original diff on merge, it is displayed below for
            the sake of provenance:

            As this is a foreign pull request (from a fork), the diff is supplied
            below (as it won't show otherwise due to GitHub magic):

            diff --git a/doc/src/content/xdocs/gettingstartedjava.xml b/doc/src/content/xdocs/gettingstartedjava.xml
            index fe6c7d284..7f331e347 100644
            — a/doc/src/content/xdocs/gettingstartedjava.xml
            +++ b/doc/src/content/xdocs/gettingstartedjava.xml
            @@ -319,6 +319,43 @@ $ mvn compile # includes code generation via Avro Maven plugin
            $ mvn -q exec:java -Dexec.mainClass=example.SpecificMain
            </source>
            </section>
            + <section>
            + <title>Beta feature: Generating faster code</title>
            + <p>
            + In this release we have introduced a new approach to
            + generating code that speeds up decoding of objects by more
            + than 10% and encoding by more than 30% (future performance
            + enhancements are underway). To ensure a smooth introduction
            + of this change into production systems, this feature is
            + controlled by a feature flag, the system
            + property <code>org.apache.avro.specific.use_custom_coders</code>.
            + In this first release, this feature is off by default. To
            + turn it on, set the system flag to <code>true</code> at
            + runtime. In the sample above, for example, you could enable
            + the fater coders as follows:
            + </p>
            + <source>
            +$ mvn -q exec:java -Dexec.mainClass=example.SpecificMain \
            + -Dorg.apache.avro.specific.use_custom_coders=true
            + </source>
            + <p>
            + Note that you do <em>not</em> have to recompile your Avro
            + schema to have access to this feature. The feature is
            + compiled and built into your code, and you turn it on and
            + off at runtime using the feature flag. As a result, you can
            + turn it on during testing, for example, and then off in
            + production. Or you can turn it on in production, and
            + quickly turn it off if something breaks.
            + </p>
            + <p>
            + We encourage the Avro community to exercise this new feature
            + early to help build confidence. (For those paying
            + one-demand for compute resources in the cloud, it can lead
            + to meaningful cost savings.) As confidence builds, we will
            + turn this feature on by default, and eventually eliminate
            + the feature flag (and the old code).
            + </p>
            + </section>
            </section>

            <section>
            diff --git a/lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java b/lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java
            index cb9a82b48..073ca27b0 100644
            — a/lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java
            +++ b/lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java
            @@ -116,9 +116,7 @@ public static Object resolve(Schema writer, Schema reader)

            • the above loop will always be correct.
              *
            • Throws a runtime exception if we're not just about to read the
            • * field of a record. Also, this method will consume the field
            • * information, and thus may only be called <em>once</em> before
            • * reading the field value. (However, if the client knows the
              + * first field of a record. (If the client knows the
            • order of incoming fields, then the client does <em>not</em>
            • need to call this method but rather can just start reading the
            • field values.)
              diff --git a/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java b/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java
              index 1c179007a..79558ba47 100644
                • a/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java
                  +++ b/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java
                  @@ -66,6 +66,9 @@

            /** Utilities to use existing Java classes and interfaces via reflection. */
            public class ReflectData extends SpecificData {
            + @Override
            + public boolean useCustomCoders()

            { return false; }
            +
            /** {@link ReflectData} implementation that permits null field values. The
            * schema generated for each field is a union of its declared type and
            * null. */
            diff --git a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
            index 44de5c434..60d43dcf0 100644
            — a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
            +++ b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
            @@ -122,6 +122,22 @@ public DatumWriter createDatumWriter(Schema schema) {
            /** Return the singleton instance. */
            public static SpecificData get() { return INSTANCE; }

            + private boolean useCustomCoderFlag
            + = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false"));
            +
            + /** Retrieve the current value of the custom-coders feature flag.
            + * Defaults to <code>true</code>, but this default can be overriden
            + * using the system property
            + * <code>org.apache.avro.specific.use_custom_coders</code>, and can
            + * be set dynamically by {@link useCustomCoders}. See <a
            + * href="https://avro.apache.org/docs/current/gettingstartedjava.html#Beta+feature:+Generating+faster+code"Getting started with Java</a> for more about this
            + * feature flag. */
            + public boolean useCustomCoders() { return useCustomCoderFlag; }
            +
            + /** Dynamically set the value of the custom-coder feature flag.
            + * See {@link useCustomCoders}. */
            + public void setCustomCoders(boolean flag) { useCustomCoderFlag = flag; }
            +
            @Override
            protected boolean isEnum(Object datum) {
            return datum instanceof Enum || super.isEnum(datum);
            diff --git a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java
            index 29c989b78..ccf8107ac 100644
            — a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java
            +++ b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java
            @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) {
            }
            }

            + @Override
            + protected Object readRecord(Object old, Schema expected, ResolvingDecoder in)
            + throws IOException {
            + SpecificData data = getSpecificData();
            + if (data.useCustomCoders()) {
            + old = data.newRecord(old, expected);
            + if (old instanceof SpecificRecordBase) {
            + SpecificRecordBase d = (SpecificRecordBase) old;
            + if (d.hasCustomCoders()) { + d.customDecode(in); + return d; + }
            + }
            + }
            + return super.readRecord(old, expected, in);
            + }
            +
            @Override
            protected void readField(Object r, Schema.Field f, Object oldDatum,
            ResolvingDecoder in, Object state)
            diff --git a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java
            index 1204f4955..3d5e7ff4f 100644
            — a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java
            +++ b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java
            @@ -71,6 +71,19 @@ protected void writeString(Schema schema, Object datum, Encoder out)
            writeString(datum, out);
            }

            + @Override
            + protected void writeRecord(Schema schema, Object datum, Encoder out)
            + throws IOException {
            + if (datum instanceof SpecificRecordBase && this.getSpecificData().useCustomCoders()) {
            + SpecificRecordBase d = (SpecificRecordBase) datum;
            + if (d.hasCustomCoders()) { + d.customEncode(out); + return; + }
            + }
            + super.writeRecord(schema, datum, out);
            + }
            +
            @Override
            protected void writeField(Object datum, Schema.Field f, Encoder out,
            Object state) throws IOException { diff --git a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java index 1902cbc52..eed41b514 100644 --- a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java +++ b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java @@ -25,6 +25,8 @@ import org.apache.avro.Conversion; import org.apache.avro.Schema; import org.apache.avro.generic.GenericRecord; +import org.apache.avro.io.ResolvingDecoder; +import org.apache.avro.io.Encoder; /** Base class for generated record classes. */ public abstract class SpecificRecordBase @@ -90,4 +92,19 @@ public void readExternal(ObjectInput in) new SpecificDatumReader(getSchema()) .read(this, SpecificData.getDecoder(in)); }
            +
            + /** Returns true iff an instance supports the {@link #encode} and
            + * {@link #decode} operations. Should only be used by
            + * <code>SpecificDatumReader/Writer</code> to selectively use
            + * {@link #customEncode} and {@link #customDecode} to optimize the (de)serialization of
            + * values. */
            + protected boolean hasCustomCoders() { return false; }

            +
            + protected void customEncode(Encoder out) throws IOException

            { + throw new UnsupportedOperationException(); + }
            +
            + protected void customDecode(ResolvingDecoder in) throws IOException {+ throw new UnsupportedOperationException();+ }

            }
            diff --git a/lang/java/compiler/pom.xml b/lang/java/compiler/pom.xml
            index ee260c7f1..c7cef915f 100644
            — a/lang/java/compiler/pom.xml
            +++ b/lang/java/compiler/pom.xml
            @@ -113,7 +113,57 @@
            </execution>
            </executions>
            </plugin>
            -
            + <plugin>
            + <groupId>org.codehaus.mojo</groupId>
            + <artifactId>exec-maven-plugin</artifactId>
            + <version>1.6.0</version>
            + <executions>
            + <execution>
            + <phase>generate-test-sources</phase>
            + <goals>
            + <goal>exec</goal>
            + </goals>
            + <configuration>
            + <executable>java</executable>
            + <workingDirectory>/tmp</workingDirectory>
            + <classpathScope>test</classpathScope>
            + <arguments>
            + <argument>-classpath</argument>
            + <classpath></classpath>
            + <argument>org.apache.avro.compiler.specific.SchemaTask</argument>
            + <argument>${project.basedir}/src/test/resources/full_record_v1.avsc</argument>
            + <argument>${project.basedir}/src/test/resources/full_record_v2.avsc</argument>
            + <argument>${project.basedir}/target/generated-test-sources</argument>
            + </arguments>
            + </configuration>
            + </execution>
            + </executions>
            + </plugin>
            + <plugin>
            + <groupId>org.codehaus.mojo</groupId>
            + <artifactId>build-helper-maven-plugin</artifactId>
            + <version>3.0.0</version>
            + <executions>
            + <execution>
            + <!--
            + Usually code is generated using a special-purpose maven plugin and the plugin
            + automatically adds the generated sources into project.
            + Here since general-purpose exec plugin is used for generating code, we need to manually
            + add the sources.
            + -->
            + <id>add-source</id>
            + <phase>generate-test-sources</phase>
            + <goals>
            + <goal>add-test-source</goal>
            + </goals>
            + <configuration>
            + <sources>
            + <source>${project.basedir}/target/generated-test-sources</source>
            + </sources>
            + </configuration>
            + </execution>
            + </executions>
            + </plugin>
            </plugins>
            </build>

            diff --git a/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SchemaTask.java b/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SchemaTask.java
            index 9d2c244d3..89e2f882c 100644
            — a/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SchemaTask.java
            +++ b/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SchemaTask.java
            @@ -32,5 +32,15 @@ protected void doCompile(File src, File dest) throws IOException

            { compiler.setStringType(getStringType()); compiler.compileToDestination(src, dest); }

            +
            + public static void main(String[] args) throws IOException {
            + if (args.length < 2)

            { + System.err.println("Usage: SchemaTask <schema.avsc>... <output-folder>"); + System.exit(1); + }

            + File dst = new File(args[args.length-1]);
            + for (int i = 0; i < args.length-1; i++)
            + new SchemaTask().doCompile(new File(args[i]), dst);
            + }
            }

            diff --git a/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SpecificCompiler.java b/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SpecificCompiler.java
            index 575462e73..58c43d094 100644
            — a/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SpecificCompiler.java
            +++ b/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SpecificCompiler.java
            @@ -624,9 +624,24 @@ private Schema addStringType(Schema s, Map<Schema,Schema> seen)

            { return result; }
            • private String getStringType(JsonNode overrideClassProperty) {
            • if (overrideClassProperty != null)
            • return overrideClassProperty.getTextValue();
              + /** Utility for template use (and also internal use). Returns
              + * a string giving the FQN of the Java type to be used for a string
              + * schema or for the key of a map schema. (It's an error to call
              + * this on a schema other than a string or map.) */
              + public String getStringType(Schema s) {
              + String prop;
              + switch (s.getType()) { + case MAP: + prop = SpecificData.KEY_CLASS_PROP; + break; + case STRING: + prop = SpecificData.CLASS_PROP; + break; + default: + throw new IllegalArgumentException("Can't check string-type of non-string/map type: " + s); + }

              + JsonNode override = s.getJsonProp(prop);
              + if (override != null) return override.getTextValue();
              switch (stringType) {
              case String: return "java.lang.String";
              case Utf8: return "org.apache.avro.util.Utf8";
              @@ -635,6 +650,17 @@ private String getStringType(JsonNode overrideClassProperty) {
              }
              }

            + /** Utility for template use. Returns true iff a STRING-schema or
            + * the key of a MAP-schema is what SpecificData defines as
            + * "stringable" (which means we need to call toString on it before
            + * before writing it). */
            + public boolean isStringable(Schema schema)

            { + String t = getStringType(schema); + return ! (t.equals("java.lang.String") + || t.equals("java.lang.CharSequence") + || t.equals("org.apache.avro.util.Utf8")); + }

            +
            private static final Schema NULL_SCHEMA = Schema.create(Schema.Type.NULL);

            /** Utility for template use. Returns the java type for a Schema. */
            @@ -659,15 +685,14 @@ private String javaType(Schema schema, boolean checkConvertedLogicalType) {
            return "java.util.List<" + javaType(schema.getElementType()) + ">";
            case MAP:
            return "java.util.Map<"

            • + getStringType(schema.getJsonProp(SpecificData.KEY_CLASS_PROP))+","
            • + javaType(schema.getValueType()) + ">";
              + + getStringType(schema)+ "," + javaType(schema.getValueType()) + ">";
              case UNION:
              List<Schema> types = schema.getTypes(); // elide unions with null
              if ((types.size() == 2) && types.contains(NULL_SCHEMA))
              return javaType(types.get(types.get(0).equals(NULL_SCHEMA) ? 1 : 0));
              return "java.lang.Object";
              case STRING:
            • return getStringType(schema.getJsonProp(SpecificData.CLASS_PROP));
              + return getStringType(schema);
              case BYTES: return "java.nio.ByteBuffer";
              case INT: return "java.lang.Integer";
              case LONG: return "java.lang.Long";
              @@ -708,6 +733,58 @@ public String javaUnbox(Schema schema) {
              }
              }

            +
            + /** Utility for template use. Return a string with a given number
            + * of spaces to be used for indentation purposes. */
            + public String indent(int n)

            { + return new String(new char[n]).replace('\0', ' '); + }

            +
            + /** Utility for template use. For a two-branch union type with
            + * one null branch, returns the index of the null branch. It's an
            + * error to use on anything other than a two-branch union with on
            + * null branch. */
            + public int getNonNullIndex(Schema s)

            { + if (s.getType() != Schema.Type.UNION + || s.getTypes().size() != 2 + || ! s.getTypes().contains(NULL_SCHEMA)) + throw new IllegalArgumentException("Can only be used on 2-branch union with a null branch: " + s); + return (s.getTypes().get(0).equals(NULL_SCHEMA) ? 1 : 0); + }

            +
            + /** Utility for template use. Returns true if the encode/decode
            + * logic in record.vm can handle the schema being presented. */
            + public boolean isCustomCodable(Schema schema)

            { + if (schema.isError()) return false; + return isCustomCodable(schema, new HashSet<Schema>()); + }

            +
            + private boolean isCustomCodable(Schema schema, Set<Schema> seen) {
            + if (! seen.add(schema)) return true;
            + if (schema.getLogicalType() != null) return false;
            + boolean result = true;
            + switch (schema.getType())

            { + case RECORD: + for (Schema.Field f : schema.getFields()) + result &= isCustomCodable(f.schema(), seen); + break; + case MAP: + result = isCustomCodable(schema.getValueType(), seen); + break; + case ARRAY: + result = isCustomCodable(schema.getElementType(), seen); + break; + case UNION: + List<Schema> types = schema.getTypes(); + // Only know how to handle "nulling" unions for now + if (types.size() != 2 || ! types.contains(NULL_SCHEMA)) return false; + for (Schema s : types) result &= isCustomCodable(s, seen); + break; + default: + }

            + return result;
            + }
            +
            public boolean hasLogicalTypeField(Schema schema) {
            for (Schema.Field field : schema.getFields()) {
            if (field.schema().getLogicalType() != null) {
            diff --git a/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm b/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm
            index 045c7e1ef..5a59a6ee9 100644
            — a/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm
            +++ b/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm
            @@ -19,7 +19,9 @@
            package $schema.getNamespace();
            #end

            +import org.apache.avro.generic.GenericArray;
            import org.apache.avro.specific.SpecificData;
            +import org.apache.avro.util.Utf8;
            #if (!$schema.isError())
            import org.apache.avro.message.BinaryMessageEncoder;
            import org.apache.avro.message.BinaryMessageDecoder;
            @@ -496,4 +498,294 @@ public class ${this.mangle($schema.getName())}#if ($schema.isError()) extends or
            READER$.read(this, SpecificData.getDecoder(in));
            }

            +#if ($this.isCustomCodable($schema))
            + @Override protected boolean hasCustomCoders()

            { return true; }
            +
            + @Override protected void customEncode(org.apache.avro.io.Encoder out)
            + throws java.io.IOException
            + {
            +#set ($nv = 0)## Counter to ensure unique var-names
            +#set ($maxnv = 0)## Holds high-water mark during recursion
            +#foreach ($field in $schema.getFields())
            +#set ($n = $this.mangle($field.name(), $schema.isError()))
            +#set ($s = $field.schema())
            +#encodeVar(0 "this.${n}" $s)
            +
            +#set ($nv = $maxnv)
            +#end
            + }
            +
            + @Override protected void customDecode(org.apache.avro.io.ResolvingDecoder in)
            + throws java.io.IOException
            + {
            + org.apache.avro.Schema.Field[] fieldOrder = in.readFieldOrder();
            + for (int i = 0; i < $schema.getFields().size(); i++) {
            + switch (fieldOrder[i].pos()) {
            +#set ($fieldno = 0)
            +#set ($nv = 0)## Counter to ensure unique var-names
            +#set ($maxnv = 0)## Holds high-water mark during recursion
            +#foreach ($field in $schema.getFields())
            + case $fieldno:
            +#set ($n = $this.mangle($field.name(), $schema.isError()))
            +#set ($s = $field.schema())
            +#set ($rs = "SCHEMA$.getField(""${n}"").schema()")
            +#decodeVar(4 "this.${n}" $s $rs)
            + break;
            +
            +#set ($nv = $maxnv)
            +#set ($fieldno = $fieldno + 1)
            +#end
            + default:
            + throw new java.io.IOException("Corrupt ResolvingDecoder.");
            + }
            + }
            + }
            +#end
            }
            +
            +#macro( encodeVar $indent $var $s )
            +#set ($I = $this.indent($indent))
            +##### Compound types (array, map, and union) require calls
            +##### that will recurse back into this encodeVar macro:
            +#if ($s.Type.Name.equals("array"))
            +#encodeArray($indent $var $s)
            +#elseif ($s.Type.Name.equals("map"))
            +#encodeMap($indent $var $s)
            +#elseif ($s.Type.Name.equals("union"))
            +#encodeUnion($indent $var $s)
            +##### Use the generated "encode" method as fast way to write
            +##### (specific) record types:
            +#elseif ($s.Type.Name.equals("record"))
            +$I ${var}.customEncode(out);
            +##### For rest of cases, generate calls out.writeXYZ:
            +#elseif ($s.Type.Name.equals("null"))
            +$I out.writeNull();
            +#elseif ($s.Type.Name.equals("boolean"))
            +$I out.writeBoolean(${var});
            +#elseif ($s.Type.Name.equals("int"))
            +$I out.writeInt(${var});
            +#elseif ($s.Type.Name.equals("long"))
            +$I out.writeLong(${var});
            +#elseif ($s.Type.Name.equals("float"))
            +$I out.writeFloat(${var});
            +#elseif ($s.Type.Name.equals("double"))
            +$I out.writeDouble(${var});
            +#elseif ($s.Type.Name.equals("string"))
            +#if ($this.isStringable($s))
            +$I out.writeString(${var}.toString());
            +#else
            +$I out.writeString(${var});
            +#end
            +#elseif ($s.Type.Name.equals("bytes"))
            +$I out.writeBytes(${var});
            +#elseif ($s.Type.Name.equals("fixed"))
            +$I out.writeFixed(${var}.bytes(), 0, ${s.FixedSize});
            +#elseif ($s.Type.Name.equals("enum"))
            +$I out.writeEnum(${var}.ordinal());
            +#else
            +## TODO – singal a code-gen-time error
            +#end
            +#end
            +
            +#macro( encodeArray $indent $var $s )
            +#set ($I = $this.indent($indent))
            +#set ($et = $this.javaType($s.ElementType))
            +$I long size${nv} = ${var}.size();
            +$I out.writeArrayStart();
            +$I out.setItemCount(size${nv});
            +$I long actualSize${nv} = 0;
            +$I for ($et e${nv}: ${var}) {
            $I actualSize${nv}+;
            +$I out.startItem();
            +#set ($var = "e${nv}")
            +#set ($nv = $nv + 1)
            +#set ($maxnv = $nv)
            +#set ($indent = $indent + 2)
            +#encodeVar($indent $var $s.ElementType)
            +#set ($nv = $nv - 1)
            +#set ($indent = $indent - 2)
            +#set ($I = $this.indent($indent))
            +$I }
            +$I out.writeArrayEnd();
            +$I if (actualSize${nv} != size${nv})
            +$I throw new java.util.ConcurrentModificationException("Array-size written was " + size${nv} + ", but element count was " + actualSize${nv} + ".");
            +#end
            +
            +#macro( encodeMap $indent $var $s )
            +#set ($I = $this.indent($indent))
            +#set ($kt = $this.getStringType($s))
            +#set ($vt = $this.javaType($s.ValueType))
            +$I long size${nv} = ${var}.size();
            +$I out.writeMapStart();
            +$I out.setItemCount(size${nv});
            +$I long actualSize${nv} = 0;
            +$I for (java.util.Map.Entry<$kt, $vt> e${nv}: ${var}.entrySet()) {
            $I actualSize${nv}+;
            +$I out.startItem();
            +#if ($this.isStringable($s))
            +$I out.writeString(e${nv}.getKey().toString());
            +#else
            +$I out.writeString(e${nv}.getKey());
            +#end
            +$I $vt v${nv} = e${nv}.getValue();
            +#set ($var = "v${nv}")
            +#set ($nv = $nv + 1)
            +#set ($maxnv = $nv)
            +#set ($indent = $indent + 2)
            +#encodeVar($indent $var $s.ValueType)
            +#set ($nv = $nv - 1)
            +#set ($indent = $indent - 2)
            +#set ($I = $this.indent($indent))
            +$I }
            +$I out.writeMapEnd();
            +$I if (actualSize${nv} != size${nv})
            + throw new java.util.ConcurrentModificationException("Map-size written was " + size${nv} + ", but element count was " + actualSize${nv} + ".");
            +#end
            +
            +#macro( encodeUnion $indent $var $s )
            +#set ($I = $this.indent($indent))
            +#set ($et = $this.javaType($s.Types.get($this.getNonNullIndex($s))))
            +$I if (${var} == null) {
            +$I out.writeIndex(#if($this.getNonNullIndex($s)==0)1#{else}0#end);
            +$I out.writeNull();
            +$I } else {
            +$I out.writeIndex(${this.getNonNullIndex($s)});
            +#set ($indent = $indent + 2)
            +#encodeVar($indent $var $s.Types.get($this.getNonNullIndex($s)))
            +#set ($indent = $indent - 2)
            +#set ($I = $this.indent($indent))
            +$I }
            +#end
            +
            +
            +#macro( decodeVar $indent $var $s $rs )
            +#set ($I = $this.indent($indent))
            +##### Compound types (array, map, and union) require calls
            +##### that will recurse back into this decodeVar macro:
            +#if ($s.Type.Name.equals("array"))
            +#decodeArray($indent $var $s $rs)
            +#elseif ($s.Type.Name.equals("map"))
            +#decodeMap($indent $var $s $rs)
            +#elseif ($s.Type.Name.equals("union"))
            +#decodeUnion($indent $var $s $rs)
            +##### Use the generated "decode" method as fast way to write
            +##### (specific) record types:
            +#elseif ($s.Type.Name.equals("record"))
            +$I if (${var} == null) {
            +$I ${var} = new ${this.javaType($s)}();
            +$I }
            +$I ${var}.customDecode(in);
            +##### For rest of cases, generate calls in.readXYZ:
            +#elseif ($s.Type.Name.equals("null"))
            +$I in.readNull();
            +#elseif ($s.Type.Name.equals("boolean"))
            +$I $var = in.readBoolean();
            +#elseif ($s.Type.Name.equals("int"))
            +$I $var = in.readInt();
            +#elseif ($s.Type.Name.equals("long"))
            +$I $var = in.readLong();
            +#elseif ($s.Type.Name.equals("float"))
            +$I $var = in.readFloat();
            +#elseif ($s.Type.Name.equals("double"))
            +$I $var = in.readDouble();
            +#elseif ($s.Type.Name.equals("string"))
            +#decodeString( "$I" $var $s )
            +#elseif ($s.Type.Name.equals("bytes"))
            +$I $var = in.readBytes(${var});
            +#elseif ($s.Type.Name.equals("fixed"))
            +$I if (${var} == null) {
            +$I ${var} = new ${this.javaType($s)}();
            +$I }
            +$I in.readFixed(${var}.bytes(), 0, ${s.FixedSize});
            +#elseif ($s.Type.Name.equals("enum"))
            +$I $var = ${this.javaType($s)}.values()[in.readEnum()];
            +#else
            +## TODO – singal a code-gen-time error
            +#end
            +#end
            +
            +#macro( decodeString $II $var $s )
            +#set ($st = ${this.getStringType($s)})
            +#if ($this.isStringable($s))
            +$II ${var} = new ${st}(in.readString());
            +#elseif ($st.equals("java.lang.String"))
            +$II $var = in.readString();
            +#elseif ($st.equals("org.apache.avro.util.Utf8"))
            +$II $var = in.readString(${var});
            +#else
            +$II $var = in.readString(${var} instanceof Utf8 ? (Utf8)${var} : null);
            +#end
            +#end
            +
            +#macro( decodeArray $indent $var $s $rs )
            +#set ($I = $this.indent($indent))
            +#set ($t = $this.javaType($s))
            +#set ($et = $this.javaType($s.ElementType))
            +#set ($gat = "SpecificData.Array<${et}>")
            +$I long size${nv} = in.readArrayStart();
            +## Need fresh variable name due to limitation of macro system
            +$I $t a${nv} = ${var};
            +$I if (a${nv} == null) {
            +$I a${nv} = new ${gat}((int)size${nv}, ${rs});
            +$I $var = a${nv};
            +$I } else a${nv}.clear();
            +$I $gat ga${nv} = (a${nv} instanceof SpecificData.Array ? (${gat})a${nv} : null);
            +$I for ( ; 0 < size${nv}; size${nv} = in.arrayNext()) {
            +$I for ( ; size${nv} != 0; size${nv}--) {
            +$I $et e${nv} = (ga${nv} != null ? ga${nv}.peek() : null);
            +#set ($var = "e${nv}")
            +#set ($nv = $nv + 1)
            +#set ($maxnv = $nv)
            +#set ($indent = $indent + 4)
            +#decodeVar($indent $var $s.ElementType "${rs}.getElementType()")
            +#set ($nv = $nv - 1)
            +#set ($indent = $indent - 4)
            +#set ($I = $this.indent($indent))
            +$I a${nv}.add(e${nv});
            +$I }
            +$I }
            +#end
            +
            +#macro( decodeMap $indent $var $s $rs )
            +#set ($I = $this.indent($indent))
            +#set ($t = $this.javaType($s))
            +#set ($kt = $this.getStringType($s))
            +#set ($vt = $this.javaType($s.ValueType))
            +$I long size${nv} = in.readMapStart();
            +$I $t m${nv} = ${var}; // Need fresh name due to limitation of macro system
            +$I if (m${nv} == null) {
            +$I m${nv} = new java.util.HashMap<${kt},${vt}>((int)size${nv});
            +$I $var = m${nv};
            +$I } else m${nv}.clear();
            +$I for ( ; 0 < size${nv}; size${nv} = in.mapNext()) {
            +$I for ( ; size${nv} != 0; size${nv}--) {
            +$I $kt k${nv} = null;
            +#decodeString( "$I " "k${nv}" $s )
            +$I $vt v${nv} = null;
            +#set ($var = "v${nv}")
            +#set ($nv = $nv + 1)
            +#set ($maxnv = $nv)
            +#set ($indent = $indent + 4)
            +#decodeVar($indent $var $s.ValueType "${rs}.getValueType()")
            +#set ($nv = $nv - 1)
            +#set ($indent = $indent - 4)
            +#set ($I = $this.indent($indent))
            +$I m${nv}.put(k${nv}, v${nv});
            +$I }
            +$I }
            +#end
            +
            +#macro( decodeUnion $indent $var $s $rs )
            +#set ($I = $this.indent($indent))
            +#set ($et = $this.javaType($s.Types.get($this.getNonNullIndex($s))))
            +#set ($si = $this.getNonNullIndex($s))
            +$I if (in.readIndex() != ${si}) {
            +$I in.readNull();
            +$I ${var} = null;
            +$I } else {
            +#set ($indent = $indent + 2)
            +#decodeVar($indent $var $s.Types.get($si) "${rs}.getTypes().get(${si})")
            +#set ($indent = $indent - 2)
            +#set ($I = $this.indent($indent))
            +$I }
            +#end
            diff --git a/lang/java/compiler/src/test/java/org/apache/avro/compiler/specific/TestSpecificCompiler.java b/lang/java/compiler/src/test/java/org/apache/avro/compiler/specific/TestSpecificCompiler.java
            index b5b3a0ab2..ceae52c12 100644
            — a/lang/java/compiler/src/test/java/org/apache/avro/compiler/specific/TestSpecificCompiler.java
            +++ b/lang/java/compiler/src/test/java/org/apache/avro/compiler/specific/TestSpecificCompiler.java
            @@ -81,7 +81,7 @@ public void setUp() {
            }

            @After
            - public void tearDow() {
            + public void tearDown() {
            if (this.outputFile != null) { this.outputFile.delete(); }
            @@ -622,8 +622,4 @@ public void testConversionInstanceWithDecimalLogicalTypeEnabled() throws Excepti
            Assert.assertEquals("Should use null for decimal if the flag is off",
            "null", compiler.conversionInstance(uuidSchema));
            }
            -
            - public void testToFromByteBuffer() { - - }
            }
            diff --git a/lang/java/compiler/src/test/java/org/apache/avro/specific/TestGeneratedCode.java b/lang/java/compiler/src/test/java/org/apache/avro/specific/TestGeneratedCode.java
            new file mode 100644
            index 000000000..394a5900c
            — /dev/null
            +++ b/lang/java/compiler/src/test/java/org/apache/avro/specific/TestGeneratedCode.java
            @@ -0,0 +1,93 @@
            +/*
            + * Copyright 2017 The Apache Software Foundation.
            + *
            + * Licensed under the Apache License, Version 2.0 (the "License");
            + * you may not use this file except in compliance with the License.
            + * You may obtain a copy of the License at
            + *
            + * http://www.apache.org/licenses/LICENSE-2.0
            + *
            + * Unless required by applicable law or agreed to in writing, software
            + * distributed under the License is distributed on an "AS IS" BASIS,
            + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
            + * See the License for the specific language governing permissions and
            + * limitations under the License.
            + */
            +
            +package org.apache.avro.specific;
            +
            +import java.io.ByteArrayInputStream;
            +import java.io.ByteArrayOutputStream;
            +import java.io.IOException;
            +import java.nio.ByteBuffer;
            +
            +import org.apache.avro.Schema;
            +import org.apache.avro.io.Encoder;
            +import org.apache.avro.io.EncoderFactory;
            +import org.apache.avro.io.Decoder;
            +import org.apache.avro.io.DecoderFactory;
            +import org.apache.avro.io.DatumReader;
            +import org.apache.avro.io.DatumWriter;
            +import org.apache.avro.util.Utf8;
            +
            +import org.junit.Assert;
            +import org.junit.Before;
            +import org.junit.Test;
            +
            +import org.apache.avro.specific.test.FullRecordV1;
            +import org.apache.avro.specific.test.FullRecordV2;
            +
            +public class TestGeneratedCode {
            +
            + private final static SpecificData MODEL = new SpecificData();
            + private final static Schema V1S = FullRecordV1.getClassSchema();
            + private final static Schema V2S = FullRecordV2.getClassSchema();
            +
            + @Before
            + public void setUp() { + MODEL.setCustomCoders(true); + }
            +
            + @Test
            + public void withoutSchemaMigration() throws IOException { + FullRecordV1 src = new FullRecordV1(true, 87231, 731L, 54.2832F, 38.321, "Hi there", null); + Assert.assertTrue("Test schema must allow for custom coders.", + ((SpecificRecordBase)src).hasCustomCoders()); + + ByteArrayOutputStream out = new ByteArrayOutputStream(1024); + Encoder e = EncoderFactory.get().directBinaryEncoder(out, null); + DatumWriter<FullRecordV1> w = (DatumWriter<FullRecordV1>)MODEL.createDatumWriter(V1S); + w.write(src, e); + e.flush(); + + ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray()); + Decoder d = DecoderFactory.get().directBinaryDecoder(in, null); + DatumReader<FullRecordV1> r = (DatumReader<FullRecordV1>)MODEL.createDatumReader(V1S); + FullRecordV1 dst = r.read(null, d); + + Assert.assertEquals(src, dst); + }
            +
            + @Test
            + public void withSchemaMigration() throws IOException { + FullRecordV2 src = new FullRecordV2(true, 731, 87231, 38L, 54.2832F, "Hi there", + ByteBuffer.wrap(Utf8.getBytesFor("Hello, world!"))); + Assert.assertTrue("Test schema must allow for custom coders.", + ((SpecificRecordBase)src).hasCustomCoders()); + + ByteArrayOutputStream out = new ByteArrayOutputStream(1024); + Encoder e = EncoderFactory.get().directBinaryEncoder(out, null); + DatumWriter<FullRecordV2> w = (DatumWriter<FullRecordV2>)MODEL.createDatumWriter(V2S); + w.write(src, e); + e.flush(); + + ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray()); + Decoder d = DecoderFactory.get().directBinaryDecoder(in, null); + DatumReader<FullRecordV1> r = (DatumReader<FullRecordV1>)MODEL.createDatumReader(V2S, V1S); + FullRecordV1 dst = r.read(null, d); + + FullRecordV1 expected = new FullRecordV1(true, 87231, 731L, 54.2832F, 38.0, null, + "Hello, world!"); + Assert.assertEquals(expected, dst); + }
            +}
            diff --git a/lang/java/compiler/src/test/resources/full_record_v1.avsc b/lang/java/compiler/src/test/resources/full_record_v1.avsc
            new file mode 100644
            index 000000000..4e2218875
            — /dev/null
            +++ b/lang/java/compiler/src/test/resources/full_record_v1.avsc
            @@ -0,0 +1,30 @@
            +{
            + "type" : "record",
            + "name" : "FullRecordV1",
            + "doc" : "Test schema changes: this is the 'old' schema the SpecificRecord expects to see",
            + "namespace" : "org.apache.avro.specific.test",
            + "fields" : [ { + "name" : "b", + "type" : "boolean" + }, { + "name" : "i32", + "type" : "int" + }, { + "name" : "i64", + "type" : "long" + }, { + "name" : "f32", + "type" : "float" + }, { + "name" : "f64", + "type" : "double" + }, { + "name" : "s", + "type" : [ "null", "string" ], + "default" : null + }, { + "name" : "h", + "type" : [ "null", "string" ] + } ]
            +}
            +
            diff --git a/lang/java/compiler/src/test/resources/full_record_v2.avsc b/lang/java/compiler/src/test/resources/full_record_v2.avsc
            new file mode 100644
            index 000000000..b80b9b4ae
            — /dev/null
            +++ b/lang/java/compiler/src/test/resources/full_record_v2.avsc
            @@ -0,0 +1,29 @@
            +{
            + "type" : "record",
            + "name" : "FullRecordV2",
            + "doc" : "Test schema changes: this is the 'new' schema actually used to write data",
            + "namespace" : "org.apache.avro.specific.test",
            + "fields" : [ { + "name" : "b", + "type" : "boolean" + }, { + "name" : "i64", + "type" : "int" + }, { + "name" : "i32", + "type" : "int" + }, { + "name" : "f64", + "type" : "long" + }, { + "name" : "f32", + "type" : [ "float", "null" ] + }, { + "name" : "newfield", + "type" : "string" + }, { + "name" : "h", + "type" : "bytes" + } ]
            +}
            +
            diff --git a/lang/java/pom.xml b/lang/java/pom.xml
            index 471c34044..dd2c28533 100644
            — a/lang/java/pom.xml
            +++ b/lang/java/pom.xml
            @@ -195,6 +195,21 @@
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <version>${surefire-plugin.version}</version>
            + <executions>
            + <execution>
            + <id>test-with-custom-coders</id>
            + <phase>test</phase>
            + <goals>
            + <goal>test</goal>
            + </goals>
            + <configuration>
            + <systemPropertyVariables>
            + <org.apache.avro.specific.use_custom_coders>true</org.apache.avro.specific.use_custom_coders>
            + <test.dir>${project.basedir}/target/</test.dir>
            + </systemPropertyVariables>
            + </configuration>
            + </execution>
            + </executions>
            <configuration>
            <includes>
            <!-- Avro naming convention for JUnit tests -->
            diff --git a/lang/java/tools/src/test/compiler/output-string/avro/examples/baseball/Player.java b/lang/java/tools/src/test/compiler/output-string/avro/examples/baseball/Player.java
            index 569f42711..26cc31fc0 100644
            — a/lang/java/tools/src/test/compiler/output-string/avro/examples/baseball/Player.java
            +++ b/lang/java/tools/src/test/compiler/output-string/avro/examples/baseball/Player.java
            @@ -5,7 +5,9 @@
            */
            package avro.examples.baseball;

            +import org.apache.avro.generic.GenericArray;
            import org.apache.avro.specific.SpecificData;
            +import org.apache.avro.util.Utf8;
            import org.apache.avro.message.BinaryMessageEncoder;
            import org.apache.avro.message.BinaryMessageDecoder;
            import org.apache.avro.message.SchemaStore;
            @@ -472,4 +474,80 @@ public Player build() { READER$.read(this, SpecificData.getDecoder(in)); }

            + @Override protected boolean hasCustomCoders() { return true; }

            +
            + @Override protected void customEncode(org.apache.avro.io.Encoder out)
            + throws java.io.IOException
            + {
            + out.writeInt(this.number);
            +
            + out.writeString(this.first_name);
            +
            + out.writeString(this.last_name);
            +
            + long size0 = this.position.size();
            + out.writeArrayStart();
            + out.setItemCount(size0);
            + long actualSize0 = 0;
            + for (avro.examples.baseball.Position e0: this.position)

            { + actualSize0++; + out.startItem(); + out.writeEnum(e0.ordinal()); + }
            + out.writeArrayEnd();
            + if (actualSize0 != size0)
            + throw new java.util.ConcurrentModificationException("Array-size written was " + size0 + ", but element count was " + actualSize0 + ".");
            +
            + }
            +
            + @Override protected void customDecode(org.apache.avro.io.ResolvingDecoder in)
            + throws java.io.IOException
            + {
            + org.apache.avro.Schema.Field[] fieldOrder = in.readFieldOrder();
            + for (int i = 0; i < 4; i++) {
            + switch (fieldOrder[i].pos()) {
            + case 0:
            + this.number = in.readInt();
            + break;
            +
            + case 1:
            + this.first_name = in.readString();
            + break;
            +
            + case 2:
            + this.last_name = in.readString();
            + break;
            +
            + case 3:
            + long size0 = in.readArrayStart();
            + java.util.List<avro.examples.baseball.Position> a0 = this.position;
            + if (a0 == null) { + a0 = new SpecificData.Array<avro.examples.baseball.Position>((int)size0, SCHEMA$.getField("position").schema()); + this.position = a0; + } else a0.clear();
            + SpecificData.Array<avro.examples.baseball.Position> ga0 = (a0 instanceof SpecificData.Array ? (SpecificData.Array<avro.examples.baseball.Position>)a0 : null);
            + for ( ; 0 < size0; size0 = in.arrayNext()) {
            + for ( ; size0 != 0; size0--) { + avro.examples.baseball.Position e0 = (ga0 != null ? ga0.peek() : null); + e0 = avro.examples.baseball.Position.values()[in.readEnum()]; + a0.add(e0); + }
            + }
            + break;
            +
            + default:
            + throw new java.io.IOException("Corrupt ResolvingDecoder.");
            + }
            + }
            + }
            }
            +
            +
            +
            +
            +
            +
            +
            +
            +
            +
            diff --git a/lang/java/tools/src/test/compiler/output/Player.java b/lang/java/tools/src/test/compiler/output/Player.java
            index 5bbb3b018..8eaf5d7ad 100644
            — a/lang/java/tools/src/test/compiler/output/Player.java
            +++ b/lang/java/tools/src/test/compiler/output/Player.java
            @@ -5,7 +5,9 @@
            */
            package avro.examples.baseball;

            +import org.apache.avro.generic.GenericArray;
            import org.apache.avro.specific.SpecificData;
            +import org.apache.avro.util.Utf8;
            import org.apache.avro.message.BinaryMessageEncoder;
            import org.apache.avro.message.BinaryMessageDecoder;
            import org.apache.avro.message.SchemaStore;
            @@ -472,4 +474,80 @@ public Player build() { READER$.read(this, SpecificData.getDecoder(in)); }

            + @Override protected boolean hasCustomCoders() { return true; }
            +
            + @Override protected void customEncode(org.apache.avro.io.Encoder out)
            + throws java.io.IOException
            + {
            + out.writeInt(this.number);
            +
            + out.writeString(this.first_name);
            +
            + out.writeString(this.last_name);
            +
            + long size0 = this.position.size();
            + out.writeArrayStart();
            + out.setItemCount(size0);
            + long actualSize0 = 0;
            + for (avro.examples.baseball.Position e0: this.position) {+ actualSize0++;+ out.startItem();+ out.writeEnum(e0.ordinal());+ }

            + out.writeArrayEnd();
            + if (actualSize0 != size0)
            + throw new java.util.ConcurrentModificationException("Array-size written was " + size0 + ", but element count was " + actualSize0 + ".");
            +
            + }
            +
            + @Override protected void customDecode(org.apache.avro.io.ResolvingDecoder in)
            + throws java.io.IOException
            + {
            + org.apache.avro.Schema.Field[] fieldOrder = in.readFieldOrder();
            + for (int i = 0; i < 4; i++) {
            + switch (fieldOrder[i].pos()) {
            + case 0:
            + this.number = in.readInt();
            + break;
            +
            + case 1:
            + this.first_name = in.readString(this.first_name instanceof Utf8 ? (Utf8)this.first_name : null);
            + break;
            +
            + case 2:
            + this.last_name = in.readString(this.last_name instanceof Utf8 ? (Utf8)this.last_name : null);
            + break;
            +
            + case 3:
            + long size0 = in.readArrayStart();
            + java.util.List<avro.examples.baseball.Position> a0 = this.position;
            + if (a0 == null)

            { + a0 = new SpecificData.Array<avro.examples.baseball.Position>((int)size0, SCHEMA$.getField("position").schema()); + this.position = a0; + }

            else a0.clear();
            + SpecificData.Array<avro.examples.baseball.Position> ga0 = (a0 instanceof SpecificData.Array ? (SpecificData.Array<avro.examples.baseball.Position>)a0 : null);
            + for ( ; 0 < size0; size0 = in.arrayNext()) {
            + for ( ; size0 != 0; size0--)

            { + avro.examples.baseball.Position e0 = (ga0 != null ? ga0.peek() : null); + e0 = avro.examples.baseball.Position.values()[in.readEnum()]; + a0.add(e0); + }

            + }
            + break;
            +
            + default:
            + throw new java.io.IOException("Corrupt ResolvingDecoder.");
            + }
            + }
            + }
            }
            +
            +
            +
            +
            +
            +
            +
            +
            +
            +

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - cutting closed pull request #350: AVRO-2090 second try URL: https://github.com/apache/avro/pull/350 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/doc/src/content/xdocs/gettingstartedjava.xml b/doc/src/content/xdocs/gettingstartedjava.xml index fe6c7d284..7f331e347 100644 — a/doc/src/content/xdocs/gettingstartedjava.xml +++ b/doc/src/content/xdocs/gettingstartedjava.xml @@ -319,6 +319,43 @@ $ mvn compile # includes code generation via Avro Maven plugin $ mvn -q exec:java -Dexec.mainClass=example.SpecificMain </source> </section> + <section> + <title>Beta feature: Generating faster code</title> + <p> + In this release we have introduced a new approach to + generating code that speeds up decoding of objects by more + than 10% and encoding by more than 30% (future performance + enhancements are underway). To ensure a smooth introduction + of this change into production systems, this feature is + controlled by a feature flag, the system + property <code>org.apache.avro.specific.use_custom_coders</code>. + In this first release, this feature is off by default. To + turn it on, set the system flag to <code>true</code> at + runtime. In the sample above, for example, you could enable + the fater coders as follows: + </p> + <source> +$ mvn -q exec:java -Dexec.mainClass=example.SpecificMain \ + -Dorg.apache.avro.specific.use_custom_coders=true + </source> + <p> + Note that you do <em>not</em> have to recompile your Avro + schema to have access to this feature. The feature is + compiled and built into your code, and you turn it on and + off at runtime using the feature flag. As a result, you can + turn it on during testing, for example, and then off in + production. Or you can turn it on in production, and + quickly turn it off if something breaks. + </p> + <p> + We encourage the Avro community to exercise this new feature + early to help build confidence. (For those paying + one-demand for compute resources in the cloud, it can lead + to meaningful cost savings.) As confidence builds, we will + turn this feature on by default, and eventually eliminate + the feature flag (and the old code). + </p> + </section> </section> <section> diff --git a/lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java b/lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java index cb9a82b48..073ca27b0 100644 — a/lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java +++ b/lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java @@ -116,9 +116,7 @@ public static Object resolve(Schema writer, Schema reader) the above loop will always be correct. * Throws a runtime exception if we're not just about to read the * field of a record. Also, this method will consume the field * information, and thus may only be called <em>once</em> before * reading the field value. (However, if the client knows the + * first field of a record. (If the client knows the order of incoming fields, then the client does <em>not</em> need to call this method but rather can just start reading the field values.) diff --git a/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java b/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java index 1c179007a..79558ba47 100644 a/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java +++ b/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java @@ -66,6 +66,9 @@ /** Utilities to use existing Java classes and interfaces via reflection. */ public class ReflectData extends SpecificData { + @Override + public boolean useCustomCoders() { return false; } + /** {@link ReflectData} implementation that permits null field values. The * schema generated for each field is a union of its declared type and * null. */ diff --git a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java index 44de5c434..60d43dcf0 100644 — a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java +++ b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java @@ -122,6 +122,22 @@ public DatumWriter createDatumWriter(Schema schema) { /** Return the singleton instance. */ public static SpecificData get() { return INSTANCE; } + private boolean useCustomCoderFlag + = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false")); + + /** Retrieve the current value of the custom-coders feature flag. + * Defaults to <code>true</code>, but this default can be overriden + * using the system property + * <code>org.apache.avro.specific.use_custom_coders</code>, and can + * be set dynamically by {@link useCustomCoders}. See <a + * href="https://avro.apache.org/docs/current/gettingstartedjava.html#Beta+feature:+Generating+faster+code"Getting started with Java</a> for more about this + * feature flag. */ + public boolean useCustomCoders() { return useCustomCoderFlag; } + + /** Dynamically set the value of the custom-coder feature flag. + * See {@link useCustomCoders}. */ + public void setCustomCoders(boolean flag) { useCustomCoderFlag = flag; } + @Override protected boolean isEnum(Object datum) { return datum instanceof Enum || super.isEnum(datum); diff --git a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java index 29c989b78..ccf8107ac 100644 — a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java +++ b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) { } } + @Override + protected Object readRecord(Object old, Schema expected, ResolvingDecoder in) + throws IOException { + SpecificData data = getSpecificData(); + if (data.useCustomCoders()) { + old = data.newRecord(old, expected); + if (old instanceof SpecificRecordBase) { + SpecificRecordBase d = (SpecificRecordBase) old; + if (d.hasCustomCoders()) { + d.customDecode(in); + return d; + } + } + } + return super.readRecord(old, expected, in); + } + @Override protected void readField(Object r, Schema.Field f, Object oldDatum, ResolvingDecoder in, Object state) diff --git a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java index 1204f4955..3d5e7ff4f 100644 — a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java +++ b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java @@ -71,6 +71,19 @@ protected void writeString(Schema schema, Object datum, Encoder out) writeString(datum, out); } + @Override + protected void writeRecord(Schema schema, Object datum, Encoder out) + throws IOException { + if (datum instanceof SpecificRecordBase && this.getSpecificData().useCustomCoders()) { + SpecificRecordBase d = (SpecificRecordBase) datum; + if (d.hasCustomCoders()) { + d.customEncode(out); + return; + } + } + super.writeRecord(schema, datum, out); + } + @Override protected void writeField(Object datum, Schema.Field f, Encoder out, Object state) throws IOException { diff --git a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java index 1902cbc52..eed41b514 100644 --- a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java +++ b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java @@ -25,6 +25,8 @@ import org.apache.avro.Conversion; import org.apache.avro.Schema; import org.apache.avro.generic.GenericRecord; +import org.apache.avro.io.ResolvingDecoder; +import org.apache.avro.io.Encoder; /** Base class for generated record classes. */ public abstract class SpecificRecordBase @@ -90,4 +92,19 @@ public void readExternal(ObjectInput in) new SpecificDatumReader(getSchema()) .read(this, SpecificData.getDecoder(in)); } + + /** Returns true iff an instance supports the {@link #encode} and + * {@link #decode} operations. Should only be used by + * <code>SpecificDatumReader/Writer</code> to selectively use + * {@link #customEncode} and {@link #customDecode} to optimize the (de)serialization of + * values. */ + protected boolean hasCustomCoders() { return false; } + + protected void customEncode(Encoder out) throws IOException { + throw new UnsupportedOperationException(); + } + + protected void customDecode(ResolvingDecoder in) throws IOException {+ throw new UnsupportedOperationException();+ } } diff --git a/lang/java/compiler/pom.xml b/lang/java/compiler/pom.xml index ee260c7f1..c7cef915f 100644 — a/lang/java/compiler/pom.xml +++ b/lang/java/compiler/pom.xml @@ -113,7 +113,57 @@ </execution> </executions> </plugin> - + <plugin> + <groupId>org.codehaus.mojo</groupId> + <artifactId>exec-maven-plugin</artifactId> + <version>1.6.0</version> + <executions> + <execution> + <phase>generate-test-sources</phase> + <goals> + <goal>exec</goal> + </goals> + <configuration> + <executable>java</executable> + <workingDirectory>/tmp</workingDirectory> + <classpathScope>test</classpathScope> + <arguments> + <argument>-classpath</argument> + <classpath></classpath> + <argument>org.apache.avro.compiler.specific.SchemaTask</argument> + <argument>${project.basedir}/src/test/resources/full_record_v1.avsc</argument> + <argument>${project.basedir}/src/test/resources/full_record_v2.avsc</argument> + <argument>${project.basedir}/target/generated-test-sources</argument> + </arguments> + </configuration> + </execution> + </executions> + </plugin> + <plugin> + <groupId>org.codehaus.mojo</groupId> + <artifactId>build-helper-maven-plugin</artifactId> + <version>3.0.0</version> + <executions> + <execution> + <!-- + Usually code is generated using a special-purpose maven plugin and the plugin + automatically adds the generated sources into project. + Here since general-purpose exec plugin is used for generating code, we need to manually + add the sources. + --> + <id>add-source</id> + <phase>generate-test-sources</phase> + <goals> + <goal>add-test-source</goal> + </goals> + <configuration> + <sources> + <source>${project.basedir}/target/generated-test-sources</source> + </sources> + </configuration> + </execution> + </executions> + </plugin> </plugins> </build> diff --git a/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SchemaTask.java b/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SchemaTask.java index 9d2c244d3..89e2f882c 100644 — a/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SchemaTask.java +++ b/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SchemaTask.java @@ -32,5 +32,15 @@ protected void doCompile(File src, File dest) throws IOException { compiler.setStringType(getStringType()); compiler.compileToDestination(src, dest); } + + public static void main(String[] args) throws IOException { + if (args.length < 2) { + System.err.println("Usage: SchemaTask <schema.avsc>... <output-folder>"); + System.exit(1); + } + File dst = new File(args [args.length-1] ); + for (int i = 0; i < args.length-1; i++) + new SchemaTask().doCompile(new File(args [i] ), dst); + } } diff --git a/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SpecificCompiler.java b/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SpecificCompiler.java index 575462e73..58c43d094 100644 — a/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SpecificCompiler.java +++ b/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SpecificCompiler.java @@ -624,9 +624,24 @@ private Schema addStringType(Schema s, Map<Schema,Schema> seen) { return result; } private String getStringType(JsonNode overrideClassProperty) { if (overrideClassProperty != null) return overrideClassProperty.getTextValue(); + /** Utility for template use (and also internal use). Returns + * a string giving the FQN of the Java type to be used for a string + * schema or for the key of a map schema. (It's an error to call + * this on a schema other than a string or map.) */ + public String getStringType(Schema s) { + String prop; + switch (s.getType()) { + case MAP: + prop = SpecificData.KEY_CLASS_PROP; + break; + case STRING: + prop = SpecificData.CLASS_PROP; + break; + default: + throw new IllegalArgumentException("Can't check string-type of non-string/map type: " + s); + } + JsonNode override = s.getJsonProp(prop); + if (override != null) return override.getTextValue(); switch (stringType) { case String: return "java.lang.String"; case Utf8: return "org.apache.avro.util.Utf8"; @@ -635,6 +650,17 @@ private String getStringType(JsonNode overrideClassProperty) { } } + /** Utility for template use. Returns true iff a STRING-schema or + * the key of a MAP-schema is what SpecificData defines as + * "stringable" (which means we need to call toString on it before + * before writing it). */ + public boolean isStringable(Schema schema) { + String t = getStringType(schema); + return ! (t.equals("java.lang.String") + || t.equals("java.lang.CharSequence") + || t.equals("org.apache.avro.util.Utf8")); + } + private static final Schema NULL_SCHEMA = Schema.create(Schema.Type.NULL); /** Utility for template use. Returns the java type for a Schema. */ @@ -659,15 +685,14 @@ private String javaType(Schema schema, boolean checkConvertedLogicalType) { return "java.util.List<" + javaType(schema.getElementType()) + ">"; case MAP: return "java.util.Map<" + getStringType(schema.getJsonProp(SpecificData.KEY_CLASS_PROP))+"," + javaType(schema.getValueType()) + ">"; + + getStringType(schema)+ "," + javaType(schema.getValueType()) + ">"; case UNION: List<Schema> types = schema.getTypes(); // elide unions with null if ((types.size() == 2) && types.contains(NULL_SCHEMA)) return javaType(types.get(types.get(0).equals(NULL_SCHEMA) ? 1 : 0)); return "java.lang.Object"; case STRING: return getStringType(schema.getJsonProp(SpecificData.CLASS_PROP)); + return getStringType(schema); case BYTES: return "java.nio.ByteBuffer"; case INT: return "java.lang.Integer"; case LONG: return "java.lang.Long"; @@ -708,6 +733,58 @@ public String javaUnbox(Schema schema) { } } + + /** Utility for template use. Return a string with a given number + * of spaces to be used for indentation purposes. */ + public String indent(int n) { + return new String(new char[n]).replace('\0', ' '); + } + + /** Utility for template use. For a two-branch union type with + * one null branch, returns the index of the null branch. It's an + * error to use on anything other than a two-branch union with on + * null branch. */ + public int getNonNullIndex(Schema s) { + if (s.getType() != Schema.Type.UNION + || s.getTypes().size() != 2 + || ! s.getTypes().contains(NULL_SCHEMA)) + throw new IllegalArgumentException("Can only be used on 2-branch union with a null branch: " + s); + return (s.getTypes().get(0).equals(NULL_SCHEMA) ? 1 : 0); + } + + /** Utility for template use. Returns true if the encode/decode + * logic in record.vm can handle the schema being presented. */ + public boolean isCustomCodable(Schema schema) { + if (schema.isError()) return false; + return isCustomCodable(schema, new HashSet<Schema>()); + } + + private boolean isCustomCodable(Schema schema, Set<Schema> seen) { + if (! seen.add(schema)) return true; + if (schema.getLogicalType() != null) return false; + boolean result = true; + switch (schema.getType()) { + case RECORD: + for (Schema.Field f : schema.getFields()) + result &= isCustomCodable(f.schema(), seen); + break; + case MAP: + result = isCustomCodable(schema.getValueType(), seen); + break; + case ARRAY: + result = isCustomCodable(schema.getElementType(), seen); + break; + case UNION: + List<Schema> types = schema.getTypes(); + // Only know how to handle "nulling" unions for now + if (types.size() != 2 || ! types.contains(NULL_SCHEMA)) return false; + for (Schema s : types) result &= isCustomCodable(s, seen); + break; + default: + } + return result; + } + public boolean hasLogicalTypeField(Schema schema) { for (Schema.Field field : schema.getFields()) { if (field.schema().getLogicalType() != null) { diff --git a/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm b/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm index 045c7e1ef..5a59a6ee9 100644 — a/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm +++ b/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm @@ -19,7 +19,9 @@ package $schema.getNamespace(); #end +import org.apache.avro.generic.GenericArray; import org.apache.avro.specific.SpecificData; +import org.apache.avro.util.Utf8; #if (!$schema.isError()) import org.apache.avro.message.BinaryMessageEncoder; import org.apache.avro.message.BinaryMessageDecoder; @@ -496,4 +498,294 @@ public class ${this.mangle($schema.getName())}#if ($schema.isError()) extends or READER$.read(this, SpecificData.getDecoder(in)); } +#if ($this.isCustomCodable($schema)) + @Override protected boolean hasCustomCoders() { return true; } + + @Override protected void customEncode(org.apache.avro.io.Encoder out) + throws java.io.IOException + { +#set ($nv = 0)## Counter to ensure unique var-names +#set ($maxnv = 0)## Holds high-water mark during recursion +#foreach ($field in $schema.getFields()) +#set ($n = $this.mangle($field.name(), $schema.isError())) +#set ($s = $field.schema()) +#encodeVar(0 "this.${n}" $s) + +#set ($nv = $maxnv) +#end + } + + @Override protected void customDecode(org.apache.avro.io.ResolvingDecoder in) + throws java.io.IOException + { + org.apache.avro.Schema.Field[] fieldOrder = in.readFieldOrder(); + for (int i = 0; i < $schema.getFields().size(); i++) { + switch (fieldOrder [i] .pos()) { +#set ($fieldno = 0) +#set ($nv = 0)## Counter to ensure unique var-names +#set ($maxnv = 0)## Holds high-water mark during recursion +#foreach ($field in $schema.getFields()) + case $fieldno: +#set ($n = $this.mangle($field.name(), $schema.isError())) +#set ($s = $field.schema()) +#set ($rs = "SCHEMA$.getField(""${n}"").schema()") +#decodeVar(4 "this.${n}" $s $rs) + break; + +#set ($nv = $maxnv) +#set ($fieldno = $fieldno + 1) +#end + default: + throw new java.io.IOException("Corrupt ResolvingDecoder."); + } + } + } +#end } + +#macro( encodeVar $indent $var $s ) +#set ($I = $this.indent($indent)) +##### Compound types (array, map, and union) require calls +##### that will recurse back into this encodeVar macro: +#if ($s.Type.Name.equals("array")) +#encodeArray($indent $var $s) +#elseif ($s.Type.Name.equals("map")) +#encodeMap($indent $var $s) +#elseif ($s.Type.Name.equals("union")) +#encodeUnion($indent $var $s) +##### Use the generated "encode" method as fast way to write +##### (specific) record types: +#elseif ($s.Type.Name.equals("record")) +$I ${var}.customEncode(out); +##### For rest of cases, generate calls out.writeXYZ: +#elseif ($s.Type.Name.equals("null")) +$I out.writeNull(); +#elseif ($s.Type.Name.equals("boolean")) +$I out.writeBoolean(${var}); +#elseif ($s.Type.Name.equals("int")) +$I out.writeInt(${var}); +#elseif ($s.Type.Name.equals("long")) +$I out.writeLong(${var}); +#elseif ($s.Type.Name.equals("float")) +$I out.writeFloat(${var}); +#elseif ($s.Type.Name.equals("double")) +$I out.writeDouble(${var}); +#elseif ($s.Type.Name.equals("string")) +#if ($this.isStringable($s)) +$I out.writeString(${var}.toString()); +#else +$I out.writeString(${var}); +#end +#elseif ($s.Type.Name.equals("bytes")) +$I out.writeBytes(${var}); +#elseif ($s.Type.Name.equals("fixed")) +$I out.writeFixed(${var}.bytes(), 0, ${s.FixedSize}); +#elseif ($s.Type.Name.equals("enum")) +$I out.writeEnum(${var}.ordinal()); +#else +## TODO – singal a code-gen-time error +#end +#end + +#macro( encodeArray $indent $var $s ) +#set ($I = $this.indent($indent)) +#set ($et = $this.javaType($s.ElementType)) +$I long size${nv} = ${var}.size(); +$I out.writeArrayStart(); +$I out.setItemCount(size${nv}); +$I long actualSize${nv} = 0; +$I for ($et e${nv}: ${var}) { $I actualSize${nv} +; +$I out.startItem(); +#set ($var = "e${nv}") +#set ($nv = $nv + 1) +#set ($maxnv = $nv) +#set ($indent = $indent + 2) +#encodeVar($indent $var $s.ElementType) +#set ($nv = $nv - 1) +#set ($indent = $indent - 2) +#set ($I = $this.indent($indent)) +$I } +$I out.writeArrayEnd(); +$I if (actualSize${nv} != size${nv}) +$I throw new java.util.ConcurrentModificationException("Array-size written was " + size${nv} + ", but element count was " + actualSize${nv} + "."); +#end + +#macro( encodeMap $indent $var $s ) +#set ($I = $this.indent($indent)) +#set ($kt = $this.getStringType($s)) +#set ($vt = $this.javaType($s.ValueType)) +$I long size${nv} = ${var}.size(); +$I out.writeMapStart(); +$I out.setItemCount(size${nv}); +$I long actualSize${nv} = 0; +$I for (java.util.Map.Entry<$kt, $vt> e${nv}: ${var}.entrySet()) { $I actualSize${nv} +; +$I out.startItem(); +#if ($this.isStringable($s)) +$I out.writeString(e${nv}.getKey().toString()); +#else +$I out.writeString(e${nv}.getKey()); +#end +$I $vt v${nv} = e${nv}.getValue(); +#set ($var = "v${nv}") +#set ($nv = $nv + 1) +#set ($maxnv = $nv) +#set ($indent = $indent + 2) +#encodeVar($indent $var $s.ValueType) +#set ($nv = $nv - 1) +#set ($indent = $indent - 2) +#set ($I = $this.indent($indent)) +$I } +$I out.writeMapEnd(); +$I if (actualSize${nv} != size${nv}) + throw new java.util.ConcurrentModificationException("Map-size written was " + size${nv} + ", but element count was " + actualSize${nv} + "."); +#end + +#macro( encodeUnion $indent $var $s ) +#set ($I = $this.indent($indent)) +#set ($et = $this.javaType($s.Types.get($this.getNonNullIndex($s)))) +$I if (${var} == null) { +$I out.writeIndex(#if($this.getNonNullIndex($s)==0)1#{else}0#end); +$I out.writeNull(); +$I } else { +$I out.writeIndex(${this.getNonNullIndex($s)}); +#set ($indent = $indent + 2) +#encodeVar($indent $var $s.Types.get($this.getNonNullIndex($s))) +#set ($indent = $indent - 2) +#set ($I = $this.indent($indent)) +$I } +#end + + +#macro( decodeVar $indent $var $s $rs ) +#set ($I = $this.indent($indent)) +##### Compound types (array, map, and union) require calls +##### that will recurse back into this decodeVar macro: +#if ($s.Type.Name.equals("array")) +#decodeArray($indent $var $s $rs) +#elseif ($s.Type.Name.equals("map")) +#decodeMap($indent $var $s $rs) +#elseif ($s.Type.Name.equals("union")) +#decodeUnion($indent $var $s $rs) +##### Use the generated "decode" method as fast way to write +##### (specific) record types: +#elseif ($s.Type.Name.equals("record")) +$I if (${var} == null) { +$I ${var} = new ${this.javaType($s)}(); +$I } +$I ${var}.customDecode(in); +##### For rest of cases, generate calls in.readXYZ: +#elseif ($s.Type.Name.equals("null")) +$I in.readNull(); +#elseif ($s.Type.Name.equals("boolean")) +$I $var = in.readBoolean(); +#elseif ($s.Type.Name.equals("int")) +$I $var = in.readInt(); +#elseif ($s.Type.Name.equals("long")) +$I $var = in.readLong(); +#elseif ($s.Type.Name.equals("float")) +$I $var = in.readFloat(); +#elseif ($s.Type.Name.equals("double")) +$I $var = in.readDouble(); +#elseif ($s.Type.Name.equals("string")) +#decodeString( "$I" $var $s ) +#elseif ($s.Type.Name.equals("bytes")) +$I $var = in.readBytes(${var}); +#elseif ($s.Type.Name.equals("fixed")) +$I if (${var} == null) { +$I ${var} = new ${this.javaType($s)}(); +$I } +$I in.readFixed(${var}.bytes(), 0, ${s.FixedSize}); +#elseif ($s.Type.Name.equals("enum")) +$I $var = ${this.javaType($s)}.values() [in.readEnum()] ; +#else +## TODO – singal a code-gen-time error +#end +#end + +#macro( decodeString $II $var $s ) +#set ($st = ${this.getStringType($s)}) +#if ($this.isStringable($s)) +$II ${var} = new ${st}(in.readString()); +#elseif ($st.equals("java.lang.String")) +$II $var = in.readString(); +#elseif ($st.equals("org.apache.avro.util.Utf8")) +$II $var = in.readString(${var}); +#else +$II $var = in.readString(${var} instanceof Utf8 ? (Utf8)${var} : null); +#end +#end + +#macro( decodeArray $indent $var $s $rs ) +#set ($I = $this.indent($indent)) +#set ($t = $this.javaType($s)) +#set ($et = $this.javaType($s.ElementType)) +#set ($gat = "SpecificData.Array<${et}>") +$I long size${nv} = in.readArrayStart(); +## Need fresh variable name due to limitation of macro system +$I $t a${nv} = ${var}; +$I if (a${nv} == null) { +$I a${nv} = new ${gat}((int)size${nv}, ${rs}); +$I $var = a${nv}; +$I } else a${nv}.clear(); +$I $gat ga${nv} = (a${nv} instanceof SpecificData.Array ? (${gat})a${nv} : null); +$I for ( ; 0 < size${nv}; size${nv} = in.arrayNext()) { +$I for ( ; size${nv} != 0; size${nv}--) { +$I $et e${nv} = (ga${nv} != null ? ga${nv}.peek() : null); +#set ($var = "e${nv}") +#set ($nv = $nv + 1) +#set ($maxnv = $nv) +#set ($indent = $indent + 4) +#decodeVar($indent $var $s.ElementType "${rs}.getElementType()") +#set ($nv = $nv - 1) +#set ($indent = $indent - 4) +#set ($I = $this.indent($indent)) +$I a${nv}.add(e${nv}); +$I } +$I } +#end + +#macro( decodeMap $indent $var $s $rs ) +#set ($I = $this.indent($indent)) +#set ($t = $this.javaType($s)) +#set ($kt = $this.getStringType($s)) +#set ($vt = $this.javaType($s.ValueType)) +$I long size${nv} = in.readMapStart(); +$I $t m${nv} = ${var}; // Need fresh name due to limitation of macro system +$I if (m${nv} == null) { +$I m${nv} = new java.util.HashMap<${kt},${vt}>((int)size${nv}); +$I $var = m${nv}; +$I } else m${nv}.clear(); +$I for ( ; 0 < size${nv}; size${nv} = in.mapNext()) { +$I for ( ; size${nv} != 0; size${nv}--) { +$I $kt k${nv} = null; +#decodeString( "$I " "k${nv}" $s ) +$I $vt v${nv} = null; +#set ($var = "v${nv}") +#set ($nv = $nv + 1) +#set ($maxnv = $nv) +#set ($indent = $indent + 4) +#decodeVar($indent $var $s.ValueType "${rs}.getValueType()") +#set ($nv = $nv - 1) +#set ($indent = $indent - 4) +#set ($I = $this.indent($indent)) +$I m${nv}.put(k${nv}, v${nv}); +$I } +$I } +#end + +#macro( decodeUnion $indent $var $s $rs ) +#set ($I = $this.indent($indent)) +#set ($et = $this.javaType($s.Types.get($this.getNonNullIndex($s)))) +#set ($si = $this.getNonNullIndex($s)) +$I if (in.readIndex() != ${si}) { +$I in.readNull(); +$I ${var} = null; +$I } else { +#set ($indent = $indent + 2) +#decodeVar($indent $var $s.Types.get($si) "${rs}.getTypes().get(${si})") +#set ($indent = $indent - 2) +#set ($I = $this.indent($indent)) +$I } +#end diff --git a/lang/java/compiler/src/test/java/org/apache/avro/compiler/specific/TestSpecificCompiler.java b/lang/java/compiler/src/test/java/org/apache/avro/compiler/specific/TestSpecificCompiler.java index b5b3a0ab2..ceae52c12 100644 — a/lang/java/compiler/src/test/java/org/apache/avro/compiler/specific/TestSpecificCompiler.java +++ b/lang/java/compiler/src/test/java/org/apache/avro/compiler/specific/TestSpecificCompiler.java @@ -81,7 +81,7 @@ public void setUp() { } @After - public void tearDow() { + public void tearDown() { if (this.outputFile != null) { this.outputFile.delete(); } @@ -622,8 +622,4 @@ public void testConversionInstanceWithDecimalLogicalTypeEnabled() throws Excepti Assert.assertEquals("Should use null for decimal if the flag is off", "null", compiler.conversionInstance(uuidSchema)); } - - public void testToFromByteBuffer() { - - } } diff --git a/lang/java/compiler/src/test/java/org/apache/avro/specific/TestGeneratedCode.java b/lang/java/compiler/src/test/java/org/apache/avro/specific/TestGeneratedCode.java new file mode 100644 index 000000000..394a5900c — /dev/null +++ b/lang/java/compiler/src/test/java/org/apache/avro/specific/TestGeneratedCode.java @@ -0,0 +1,93 @@ +/* + * Copyright 2017 The Apache Software Foundation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.avro.specific; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; + +import org.apache.avro.Schema; +import org.apache.avro.io.Encoder; +import org.apache.avro.io.EncoderFactory; +import org.apache.avro.io.Decoder; +import org.apache.avro.io.DecoderFactory; +import org.apache.avro.io.DatumReader; +import org.apache.avro.io.DatumWriter; +import org.apache.avro.util.Utf8; + +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +import org.apache.avro.specific.test.FullRecordV1; +import org.apache.avro.specific.test.FullRecordV2; + +public class TestGeneratedCode { + + private final static SpecificData MODEL = new SpecificData(); + private final static Schema V1S = FullRecordV1.getClassSchema(); + private final static Schema V2S = FullRecordV2.getClassSchema(); + + @Before + public void setUp() { + MODEL.setCustomCoders(true); + } + + @Test + public void withoutSchemaMigration() throws IOException { + FullRecordV1 src = new FullRecordV1(true, 87231, 731L, 54.2832F, 38.321, "Hi there", null); + Assert.assertTrue("Test schema must allow for custom coders.", + ((SpecificRecordBase)src).hasCustomCoders()); + + ByteArrayOutputStream out = new ByteArrayOutputStream(1024); + Encoder e = EncoderFactory.get().directBinaryEncoder(out, null); + DatumWriter<FullRecordV1> w = (DatumWriter<FullRecordV1>)MODEL.createDatumWriter(V1S); + w.write(src, e); + e.flush(); + + ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray()); + Decoder d = DecoderFactory.get().directBinaryDecoder(in, null); + DatumReader<FullRecordV1> r = (DatumReader<FullRecordV1>)MODEL.createDatumReader(V1S); + FullRecordV1 dst = r.read(null, d); + + Assert.assertEquals(src, dst); + } + + @Test + public void withSchemaMigration() throws IOException { + FullRecordV2 src = new FullRecordV2(true, 731, 87231, 38L, 54.2832F, "Hi there", + ByteBuffer.wrap(Utf8.getBytesFor("Hello, world!"))); + Assert.assertTrue("Test schema must allow for custom coders.", + ((SpecificRecordBase)src).hasCustomCoders()); + + ByteArrayOutputStream out = new ByteArrayOutputStream(1024); + Encoder e = EncoderFactory.get().directBinaryEncoder(out, null); + DatumWriter<FullRecordV2> w = (DatumWriter<FullRecordV2>)MODEL.createDatumWriter(V2S); + w.write(src, e); + e.flush(); + + ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray()); + Decoder d = DecoderFactory.get().directBinaryDecoder(in, null); + DatumReader<FullRecordV1> r = (DatumReader<FullRecordV1>)MODEL.createDatumReader(V2S, V1S); + FullRecordV1 dst = r.read(null, d); + + FullRecordV1 expected = new FullRecordV1(true, 87231, 731L, 54.2832F, 38.0, null, + "Hello, world!"); + Assert.assertEquals(expected, dst); + } +} diff --git a/lang/java/compiler/src/test/resources/full_record_v1.avsc b/lang/java/compiler/src/test/resources/full_record_v1.avsc new file mode 100644 index 000000000..4e2218875 — /dev/null +++ b/lang/java/compiler/src/test/resources/full_record_v1.avsc @@ -0,0 +1,30 @@ +{ + "type" : "record", + "name" : "FullRecordV1", + "doc" : "Test schema changes: this is the 'old' schema the SpecificRecord expects to see", + "namespace" : "org.apache.avro.specific.test", + "fields" : [ { + "name" : "b", + "type" : "boolean" + }, { + "name" : "i32", + "type" : "int" + }, { + "name" : "i64", + "type" : "long" + }, { + "name" : "f32", + "type" : "float" + }, { + "name" : "f64", + "type" : "double" + }, { + "name" : "s", + "type" : [ "null", "string" ], + "default" : null + }, { + "name" : "h", + "type" : [ "null", "string" ] + } ] +} + diff --git a/lang/java/compiler/src/test/resources/full_record_v2.avsc b/lang/java/compiler/src/test/resources/full_record_v2.avsc new file mode 100644 index 000000000..b80b9b4ae — /dev/null +++ b/lang/java/compiler/src/test/resources/full_record_v2.avsc @@ -0,0 +1,29 @@ +{ + "type" : "record", + "name" : "FullRecordV2", + "doc" : "Test schema changes: this is the 'new' schema actually used to write data", + "namespace" : "org.apache.avro.specific.test", + "fields" : [ { + "name" : "b", + "type" : "boolean" + }, { + "name" : "i64", + "type" : "int" + }, { + "name" : "i32", + "type" : "int" + }, { + "name" : "f64", + "type" : "long" + }, { + "name" : "f32", + "type" : [ "float", "null" ] + }, { + "name" : "newfield", + "type" : "string" + }, { + "name" : "h", + "type" : "bytes" + } ] +} + diff --git a/lang/java/pom.xml b/lang/java/pom.xml index 471c34044..dd2c28533 100644 — a/lang/java/pom.xml +++ b/lang/java/pom.xml @@ -195,6 +195,21 @@ <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-surefire-plugin</artifactId> <version>${surefire-plugin.version}</version> + <executions> + <execution> + <id>test-with-custom-coders</id> + <phase>test</phase> + <goals> + <goal>test</goal> + </goals> + <configuration> + <systemPropertyVariables> + <org.apache.avro.specific.use_custom_coders>true</org.apache.avro.specific.use_custom_coders> + <test.dir>${project.basedir}/target/</test.dir> + </systemPropertyVariables> + </configuration> + </execution> + </executions> <configuration> <includes> <!-- Avro naming convention for JUnit tests --> diff --git a/lang/java/tools/src/test/compiler/output-string/avro/examples/baseball/Player.java b/lang/java/tools/src/test/compiler/output-string/avro/examples/baseball/Player.java index 569f42711..26cc31fc0 100644 — a/lang/java/tools/src/test/compiler/output-string/avro/examples/baseball/Player.java +++ b/lang/java/tools/src/test/compiler/output-string/avro/examples/baseball/Player.java @@ -5,7 +5,9 @@ */ package avro.examples.baseball; +import org.apache.avro.generic.GenericArray; import org.apache.avro.specific.SpecificData; +import org.apache.avro.util.Utf8; import org.apache.avro.message.BinaryMessageEncoder; import org.apache.avro.message.BinaryMessageDecoder; import org.apache.avro.message.SchemaStore; @@ -472,4 +474,80 @@ public Player build() { READER$.read(this, SpecificData.getDecoder(in)); } + @Override protected boolean hasCustomCoders() { return true; } + + @Override protected void customEncode(org.apache.avro.io.Encoder out) + throws java.io.IOException + { + out.writeInt(this.number); + + out.writeString(this.first_name); + + out.writeString(this.last_name); + + long size0 = this.position.size(); + out.writeArrayStart(); + out.setItemCount(size0); + long actualSize0 = 0; + for (avro.examples.baseball.Position e0: this.position) { + actualSize0++; + out.startItem(); + out.writeEnum(e0.ordinal()); + } + out.writeArrayEnd(); + if (actualSize0 != size0) + throw new java.util.ConcurrentModificationException("Array-size written was " + size0 + ", but element count was " + actualSize0 + "."); + + } + + @Override protected void customDecode(org.apache.avro.io.ResolvingDecoder in) + throws java.io.IOException + { + org.apache.avro.Schema.Field[] fieldOrder = in.readFieldOrder(); + for (int i = 0; i < 4; i++) { + switch (fieldOrder [i] .pos()) { + case 0: + this.number = in.readInt(); + break; + + case 1: + this.first_name = in.readString(); + break; + + case 2: + this.last_name = in.readString(); + break; + + case 3: + long size0 = in.readArrayStart(); + java.util.List<avro.examples.baseball.Position> a0 = this.position; + if (a0 == null) { + a0 = new SpecificData.Array<avro.examples.baseball.Position>((int)size0, SCHEMA$.getField("position").schema()); + this.position = a0; + } else a0.clear(); + SpecificData.Array<avro.examples.baseball.Position> ga0 = (a0 instanceof SpecificData.Array ? (SpecificData.Array<avro.examples.baseball.Position>)a0 : null); + for ( ; 0 < size0; size0 = in.arrayNext()) { + for ( ; size0 != 0; size0--) { + avro.examples.baseball.Position e0 = (ga0 != null ? ga0.peek() : null); + e0 = avro.examples.baseball.Position.values()[in.readEnum()]; + a0.add(e0); + } + } + break; + + default: + throw new java.io.IOException("Corrupt ResolvingDecoder."); + } + } + } } + + + + + + + + + + diff --git a/lang/java/tools/src/test/compiler/output/Player.java b/lang/java/tools/src/test/compiler/output/Player.java index 5bbb3b018..8eaf5d7ad 100644 — a/lang/java/tools/src/test/compiler/output/Player.java +++ b/lang/java/tools/src/test/compiler/output/Player.java @@ -5,7 +5,9 @@ */ package avro.examples.baseball; +import org.apache.avro.generic.GenericArray; import org.apache.avro.specific.SpecificData; +import org.apache.avro.util.Utf8; import org.apache.avro.message.BinaryMessageEncoder; import org.apache.avro.message.BinaryMessageDecoder; import org.apache.avro.message.SchemaStore; @@ -472,4 +474,80 @@ public Player build() { READER$.read(this, SpecificData.getDecoder(in)); } + @Override protected boolean hasCustomCoders() { return true; } + + @Override protected void customEncode(org.apache.avro.io.Encoder out) + throws java.io.IOException + { + out.writeInt(this.number); + + out.writeString(this.first_name); + + out.writeString(this.last_name); + + long size0 = this.position.size(); + out.writeArrayStart(); + out.setItemCount(size0); + long actualSize0 = 0; + for (avro.examples.baseball.Position e0: this.position) {+ actualSize0++;+ out.startItem();+ out.writeEnum(e0.ordinal());+ } + out.writeArrayEnd(); + if (actualSize0 != size0) + throw new java.util.ConcurrentModificationException("Array-size written was " + size0 + ", but element count was " + actualSize0 + "."); + + } + + @Override protected void customDecode(org.apache.avro.io.ResolvingDecoder in) + throws java.io.IOException + { + org.apache.avro.Schema.Field[] fieldOrder = in.readFieldOrder(); + for (int i = 0; i < 4; i++) { + switch (fieldOrder [i] .pos()) { + case 0: + this.number = in.readInt(); + break; + + case 1: + this.first_name = in.readString(this.first_name instanceof Utf8 ? (Utf8)this.first_name : null); + break; + + case 2: + this.last_name = in.readString(this.last_name instanceof Utf8 ? (Utf8)this.last_name : null); + break; + + case 3: + long size0 = in.readArrayStart(); + java.util.List<avro.examples.baseball.Position> a0 = this.position; + if (a0 == null) { + a0 = new SpecificData.Array<avro.examples.baseball.Position>((int)size0, SCHEMA$.getField("position").schema()); + this.position = a0; + } else a0.clear(); + SpecificData.Array<avro.examples.baseball.Position> ga0 = (a0 instanceof SpecificData.Array ? (SpecificData.Array<avro.examples.baseball.Position>)a0 : null); + for ( ; 0 < size0; size0 = in.arrayNext()) { + for ( ; size0 != 0; size0--) { + avro.examples.baseball.Position e0 = (ga0 != null ? ga0.peek() : null); + e0 = avro.examples.baseball.Position.values()[in.readEnum()]; + a0.add(e0); + } + } + break; + + default: + throw new java.io.IOException("Corrupt ResolvingDecoder."); + } + } + } } + + + + + + + + + + ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            cutting Doug Cutting added a comment -

            Looks reasonable to me.  Anyone object to me committing this soon?

            cutting Doug Cutting added a comment - Looks reasonable to me.  Anyone object to me committing this soon?
            raymie Raymie Stata added a comment -

            Any more feedback on this patch?

            raymie Raymie Stata added a comment - Any more feedback on this patch?
            githubbot ASF GitHub Bot added a comment -

            rstata commented on a change in pull request #350: AVRO-2090 second try
            URL: https://github.com/apache/avro/pull/350#discussion_r226786418

            ##########
            File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
            ##########
            @@ -122,6 +122,11 @@ public DatumWriter createDatumWriter(Schema schema) {
            /** Return the singleton instance. */
            public static SpecificData get()

            { return INSTANCE; }

            + private boolean useCustomCoderFlag
            + = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false"));
            + public boolean useCustomCoders()

            { return useCustomCoderFlag; }

            + public void setCustomCoders(boolean flag)

            { useCustomCoderFlag = flag; }

            +

            Review comment:
            I just pushed a change that added the missing Javadoc, and also updated the Getting Started Guide for Java to include a subsection on the new feature flag and how to use it. Finally, since we point to JIRA tickets as our release notes, I updated the description of AVRO-2090 to be more informative to users looking to see what's new in Avro, and also included there a pointer to the more detailed documentation in the Getting Started Guide.

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata commented on a change in pull request #350: AVRO-2090 second try URL: https://github.com/apache/avro/pull/350#discussion_r226786418 ########## File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java ########## @@ -122,6 +122,11 @@ public DatumWriter createDatumWriter(Schema schema) { /** Return the singleton instance. */ public static SpecificData get() { return INSTANCE; } + private boolean useCustomCoderFlag + = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false")); + public boolean useCustomCoders() { return useCustomCoderFlag; } + public void setCustomCoders(boolean flag) { useCustomCoderFlag = flag; } + Review comment: I just pushed a change that added the missing Javadoc, and also updated the Getting Started Guide for Java to include a subsection on the new feature flag and how to use it. Finally, since we point to JIRA tickets as our release notes, I updated the description of AVRO-2090 to be more informative to users looking to see what's new in Avro, and also included there a pointer to the more detailed documentation in the Getting Started Guide. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            rstata commented on a change in pull request #350: AVRO-2090 second try
            URL: https://github.com/apache/avro/pull/350#discussion_r226734720

            ##########
            File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
            ##########
            @@ -122,6 +122,11 @@ public DatumWriter createDatumWriter(Schema schema) {
            /** Return the singleton instance. */
            public static SpecificData get()

            { return INSTANCE; }

            + private boolean useCustomCoderFlag
            + = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false"));
            + public boolean useCustomCoders()

            { return useCustomCoderFlag; }

            + public void setCustomCoders(boolean flag)

            { useCustomCoderFlag = flag; }

            +

            Review comment:
            Regarding piecemeal: yes, it would definitely be valuable to have a finite list of todo's here rather than an indefinitely growing one

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata commented on a change in pull request #350: AVRO-2090 second try URL: https://github.com/apache/avro/pull/350#discussion_r226734720 ########## File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java ########## @@ -122,6 +122,11 @@ public DatumWriter createDatumWriter(Schema schema) { /** Return the singleton instance. */ public static SpecificData get() { return INSTANCE; } + private boolean useCustomCoderFlag + = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false")); + public boolean useCustomCoders() { return useCustomCoderFlag; } + public void setCustomCoders(boolean flag) { useCustomCoderFlag = flag; } + Review comment: Regarding piecemeal: yes, it would definitely be valuable to have a finite list of todo's here rather than an indefinitely growing one ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            rstata commented on a change in pull request #350: AVRO-2090 second try
            URL: https://github.com/apache/avro/pull/350#discussion_r226734142

            ##########
            File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
            ##########
            @@ -122,6 +122,11 @@ public DatumWriter createDatumWriter(Schema schema) {
            /** Return the singleton instance. */
            public static SpecificData get()

            { return INSTANCE; }

            + private boolean useCustomCoderFlag
            + = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false"));
            + public boolean useCustomCoders()

            { return useCustomCoderFlag; }

            + public void setCustomCoders(boolean flag)

            { useCustomCoderFlag = flag; }

            +

            Review comment:
            Regarding documentation: there is a larger issue here that this feature flag isn't documented. I'll add the suggested Javadoc but also look for other places to document this new feature and how Avro users can enable it.

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata commented on a change in pull request #350: AVRO-2090 second try URL: https://github.com/apache/avro/pull/350#discussion_r226734142 ########## File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java ########## @@ -122,6 +122,11 @@ public DatumWriter createDatumWriter(Schema schema) { /** Return the singleton instance. */ public static SpecificData get() { return INSTANCE; } + private boolean useCustomCoderFlag + = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false")); + public boolean useCustomCoders() { return useCustomCoderFlag; } + public void setCustomCoders(boolean flag) { useCustomCoderFlag = flag; } + Review comment: Regarding documentation: there is a larger issue here that this feature flag isn't documented. I'll add the suggested Javadoc but also look for other places to document this new feature and how Avro users can enable it. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            rstata commented on a change in pull request #350: AVRO-2090 second try
            URL: https://github.com/apache/avro/pull/350#discussion_r226733860

            ##########
            File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
            ##########
            @@ -122,6 +122,11 @@ public DatumWriter createDatumWriter(Schema schema) {
            /** Return the singleton instance. */
            public static SpecificData get()

            { return INSTANCE; }

            + private boolean useCustomCoderFlag
            + = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false"));
            + public boolean useCustomCoders()

            { return useCustomCoderFlag; }

            + public void setCustomCoders(boolean flag)

            { useCustomCoderFlag = flag; }

            +

            Review comment:
            Regarding compiler options: the controlability here isn't intended to be a long-term option, it's intended to be a "feature flag" that allows more risk-seeking users to use the feature early and help find corner-case bugs. After a release or two, the flag should go away and this should be the only way to generate code. As a result, this feature flag is more appropriate as a runtime option (as it is now), vs. a compile-time option. For example, imagine the scenario where the custom coders look really good during testing, they get shipped to production, and a bug is tickled. In this scenario, the ops team will want to turn off the custom coders with a runtime flag, rather than wait for the system to be recompiled with a change to the compile-time flag.

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata commented on a change in pull request #350: AVRO-2090 second try URL: https://github.com/apache/avro/pull/350#discussion_r226733860 ########## File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java ########## @@ -122,6 +122,11 @@ public DatumWriter createDatumWriter(Schema schema) { /** Return the singleton instance. */ public static SpecificData get() { return INSTANCE; } + private boolean useCustomCoderFlag + = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false")); + public boolean useCustomCoders() { return useCustomCoderFlag; } + public void setCustomCoders(boolean flag) { useCustomCoderFlag = flag; } + Review comment: Regarding compiler options: the controlability here isn't intended to be a long-term option, it's intended to be a "feature flag" that allows more risk-seeking users to use the feature early and help find corner-case bugs. After a release or two, the flag should go away and this should be the only way to generate code. As a result, this feature flag is more appropriate as a runtime option (as it is now), vs. a compile-time option. For example, imagine the scenario where the custom coders look really good during testing, they get shipped to production, and a bug is tickled. In this scenario, the ops team will want to turn off the custom coders with a runtime flag, rather than wait for the system to be recompiled with a change to the compile-time flag. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            cutting commented on a change in pull request #350: AVRO-2090 second try
            URL: https://github.com/apache/avro/pull/350#discussion_r226716217

            ##########
            File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
            ##########
            @@ -122,6 +122,11 @@ public DatumWriter createDatumWriter(Schema schema) {
            /** Return the singleton instance. */
            public static SpecificData get()

            { return INSTANCE; }

            + private boolean useCustomCoderFlag
            + = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false"));
            + public boolean useCustomCoders()

            { return useCustomCoderFlag; }

            + public void setCustomCoders(boolean flag)

            { useCustomCoderFlag = flag; }

            +

            Review comment:
            These new public methods need javadoc.

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - cutting commented on a change in pull request #350: AVRO-2090 second try URL: https://github.com/apache/avro/pull/350#discussion_r226716217 ########## File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java ########## @@ -122,6 +122,11 @@ public DatumWriter createDatumWriter(Schema schema) { /** Return the singleton instance. */ public static SpecificData get() { return INSTANCE; } + private boolean useCustomCoderFlag + = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false")); + public boolean useCustomCoders() { return useCustomCoderFlag; } + public void setCustomCoders(boolean flag) { useCustomCoderFlag = flag; } + Review comment: These new public methods need javadoc. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            rstata commented on issue #350: AVRO-2090 second try
            URL: https://github.com/apache/avro/pull/350#issuecomment-431334712

            I pushed a patch to fix the problem Doug found. I think this is ready to go...

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata commented on issue #350: AVRO-2090 second try URL: https://github.com/apache/avro/pull/350#issuecomment-431334712 I pushed a patch to fix the problem Doug found. I think this is ready to go... ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            rstata commented on issue #350: AVRO-2090 second try
            URL: https://github.com/apache/avro/pull/350#issuecomment-430901185

            I will make the change to SpecificDatumReader as discussed with Doug. I'd like to batch this change with a larger set of changes based on further feedback on this work.

            More feedback?

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata commented on issue #350: AVRO-2090 second try URL: https://github.com/apache/avro/pull/350#issuecomment-430901185 I will make the change to SpecificDatumReader as discussed with Doug. I'd like to batch this change with a larger set of changes based on further feedback on this work. More feedback? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            raymie Raymie Stata added a comment - - edited

            I've attached my two runs of Perf.java combined into a single file (perf-data.txt).  The first four columns of numbers in this file are the results with custom-encoders turned off; the next four columns are the results with custom-encoders on.

            For the two SpecificRecord cases: On my machine, FooBarSpecificRecordTestWrite improved 36% (from 3577 ms to 2296 ms), while FooBarSpecificRecordTestRead improved 12% (4728 ms to 4130 ms).  It's not surprising that the read case improved less: the overhead of accommodating schema migration is high.  I have some ideas on how improve performance even more, esp. for the read case.  That said, a >10% improvement is not bad, and 36% improvement is quite good, so I suggest we commit this change as-is and save further improvements to future patches.

            (Thiru points out that FooBarSpecificRecord a very small class that probably understates the performance-improvements of this patch.  In our work at Aqfer, we've seen larger improvements.)

            raymie Raymie Stata added a comment - - edited I've attached my two runs of Perf.java combined into a single file ( perf-data.txt ).  The first four columns of numbers in this file are the results with custom-encoders turned off; the next four columns are the results with custom-encoders on. For the two SpecificRecord cases: On my machine, FooBarSpecificRecordTestWrite improved 36% (from 3577 ms to 2296 ms), while FooBarSpecificRecordTestRead improved 12% (4728 ms to 4130 ms).  It's not surprising that the read case improved less: the overhead of accommodating schema migration is high.  I have some ideas on how improve performance even more, esp. for the read case.  That said, a >10% improvement is not bad, and 36% improvement is quite good, so I suggest we commit this change as-is and save further improvements to future patches. (Thiru points out that FooBarSpecificRecord a very small class that probably understates the performance-improvements of this patch.  In our work at Aqfer, we've seen larger improvements.)
            githubbot ASF GitHub Bot added a comment -

            cutting commented on a change in pull request #350: AVRO-2090 second try
            URL: https://github.com/apache/avro/pull/350#discussion_r226108653

            ##########
            File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java
            ##########
            @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) {
            }
            }

            + @Override
            + protected Object readRecord(Object old, Schema expected, ResolvingDecoder in)
            + throws IOException {
            + SpecificData data = getSpecificData();
            + if (data.useCustomCoders()) {
            + Object r = data.newRecord(old, expected);

            Review comment:
            Yes, I think that works. The punt case at the bottom will then reuse the instance already created.

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - cutting commented on a change in pull request #350: AVRO-2090 second try URL: https://github.com/apache/avro/pull/350#discussion_r226108653 ########## File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java ########## @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) { } } + @Override + protected Object readRecord(Object old, Schema expected, ResolvingDecoder in) + throws IOException { + SpecificData data = getSpecificData(); + if (data.useCustomCoders()) { + Object r = data.newRecord(old, expected); Review comment: Yes, I think that works. The punt case at the bottom will then reuse the instance already created. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            rstata commented on a change in pull request #350: AVRO-2090 second try
            URL: https://github.com/apache/avro/pull/350#discussion_r226101432

            ##########
            File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java
            ##########
            @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) {
            }
            }

            + @Override
            + protected Object readRecord(Object old, Schema expected, ResolvingDecoder in)
            + throws IOException {
            + SpecificData data = getSpecificData();
            + if (data.useCustomCoders()) {
            + Object r = data.newRecord(old, expected);

            Review comment:
            What about the following:
            ```
            old = data.newRecord(old, expected);
            if (old instanceof SpecificRecordBase) {
            ...
            ```
            That is to say, if I get a record back from newRecord and it doesn't have custom coders, then I pass the old-or-newly-allocated object to `super.readRecord`, which won't then need to allocate another object.

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata commented on a change in pull request #350: AVRO-2090 second try URL: https://github.com/apache/avro/pull/350#discussion_r226101432 ########## File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java ########## @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) { } } + @Override + protected Object readRecord(Object old, Schema expected, ResolvingDecoder in) + throws IOException { + SpecificData data = getSpecificData(); + if (data.useCustomCoders()) { + Object r = data.newRecord(old, expected); Review comment: What about the following: ``` old = data.newRecord(old, expected); if (old instanceof SpecificRecordBase) { ... ``` That is to say, if I get a record back from newRecord and it doesn't have custom coders, then I pass the old-or-newly-allocated object to `super.readRecord`, which won't then need to allocate another object. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            cutting commented on a change in pull request #350: AVRO-2090 second try
            URL: https://github.com/apache/avro/pull/350#discussion_r226033686

            ##########
            File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java
            ##########
            @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) {
            }
            }

            + @Override
            + protected Object readRecord(Object old, Schema expected, ResolvingDecoder in)
            + throws IOException {
            + SpecificData data = getSpecificData();
            + if (data.useCustomCoders()) {
            + Object r = data.newRecord(old, expected);

            Review comment:
            If you enable custom coders and use classes that don't have them, you'll silently create spurious instances. That could be a significant performance hit. Should it instead throw an exception? Or, if that's not practical, might there be a faster way to test if a schema has a custom coder?

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - cutting commented on a change in pull request #350: AVRO-2090 second try URL: https://github.com/apache/avro/pull/350#discussion_r226033686 ########## File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java ########## @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) { } } + @Override + protected Object readRecord(Object old, Schema expected, ResolvingDecoder in) + throws IOException { + SpecificData data = getSpecificData(); + if (data.useCustomCoders()) { + Object r = data.newRecord(old, expected); Review comment: If you enable custom coders and use classes that don't have them, you'll silently create spurious instances. That could be a significant performance hit. Should it instead throw an exception? Or, if that's not practical, might there be a faster way to test if a schema has a custom coder? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            rstata closed pull request #256: AVRO-2090: Improve encode/decode time for SpecificRecord using code generation
            URL: https://github.com/apache/avro/pull/256

            This is a PR merged from a forked repository.
            As GitHub hides the original diff on merge, it is displayed below for
            the sake of provenance:

            As this is a foreign pull request (from a fork), the diff is supplied
            below (as it won't show otherwise due to GitHub magic):

            diff --git a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
            index be6bde8d2..bb918e41d 100644
            — a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
            +++ b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
            @@ -122,6 +122,10 @@ public DatumWriter createDatumWriter(Schema schema) {
            /** Return the singleton instance. */
            public static SpecificData get()

            { return INSTANCE; }

            + private static final boolean USE_CUSTOM_CODERS
            + = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false"));
            + public boolean useCustomCoders()

            { return USE_CUSTOM_CODERS; }

            +
            @Override
            protected boolean isEnum(Object datum) {
            return datum instanceof Enum || super.isEnum(datum);
            diff --git a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java
            index 774ca0944..0a9c97014 100644
            — a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java
            +++ b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java
            @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) {
            }
            }

            + @Override
            + protected Object readRecord(Object old, Schema expected, ResolvingDecoder in)
            + throws IOException {
            + SpecificData data = getSpecificData();
            + Object r = data.newRecord(old, expected);
            + if (SpecificData.get().useCustomCoders()
            + && r instanceof SpecificRecordBase) // TODO: Is this needed?
            + {
            + SpecificRecordBase d = (SpecificRecordBase) r;
            + if (d.hasCustomCoders())

            { + d.decode(in); + return d; + }

            + }
            + return super.readRecord(old, expected, in);
            + }
            +
            @Override
            protected void readField(Object r, Schema.Field f, Object oldDatum,
            ResolvingDecoder in, Object state)
            diff --git a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java
            index 7bee02a65..ee1d850a7 100644
            — a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java
            +++ b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java
            @@ -71,6 +71,21 @@ protected void writeString(Schema schema, Object datum, Encoder out)
            writeString(datum, out);
            }

            + @Override
            + protected void writeRecord(Schema schema, Object datum, Encoder out)
            + throws IOException {
            + if (SpecificData.get().useCustomCoders()
            + && datum instanceof SpecificRecordBase) // TODO: Is this needed?
            + {
            + SpecificRecordBase d = (SpecificRecordBase) datum;
            + if (d.hasCustomCoders())

            { + d.encode(out); + return; + }

            + }
            + super.writeRecord(schema, datum, out);
            + }
            +
            @Override
            protected void writeField(Object datum, Schema.Field f, Encoder out,
            Object state) throws IOException

            { diff --git a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java index 20d3dc331..2c26d0282 100644 --- a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java +++ b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java @@ -25,6 +25,8 @@ import org.apache.avro.Conversion; import org.apache.avro.Schema; import org.apache.avro.generic.GenericRecord; +import org.apache.avro.io.Decoder; +import org.apache.avro.io.Encoder; /** Base class for generated record classes. */ public abstract class SpecificRecordBase @@ -90,4 +92,19 @@ public void readExternal(ObjectInput in) new SpecificDatumReader(getSchema()) .read(this, SpecificData.getDecoder(in)); }

            +
            + /** Returns true iff an instance supports the

            {@link #encode} and
            + * {@link #decode} operations. Should only be used by
            + * <code>SpecificDatumReader/Writer</code> to selectively use
            + * {@link #encode}

            and

            {@link #decode}

            to optimize the (de)serialization of
            + * values. */
            + public boolean hasCustomCoders()

            { return false; }

            +
            + public void encode(Encoder out) throws IOException

            { + throw new UnsupportedOperationException(); + }
            +
            + public void decode(Decoder in) throws IOException {+ throw new UnsupportedOperationException();+ }

            }
            diff --git a/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SpecificCompiler.java b/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SpecificCompiler.java
            index 7a6f5f1cc..8d8106538 100644
            — a/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SpecificCompiler.java
            +++ b/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SpecificCompiler.java
            @@ -572,9 +572,24 @@ private Schema addStringType(Schema s, Map<Schema,Schema> seen)

            { return result; }
            • private String getStringType(JsonNode overrideClassProperty) {
            • if (overrideClassProperty != null)
            • return overrideClassProperty.getTextValue();
              + /** Utility for template use (and also internal use). Returns
              + * a string giving the FQN of the Java type to be used for a string
              + * schema or for the key of a map schema. (It's an error to call
              + * this on a schema other than a string or map.) */
              + public String getStringType(Schema s) {
              + String prop;
              + switch (s.getType()) { + case MAP: + prop = SpecificData.KEY_CLASS_PROP; + break; + case STRING: + prop = SpecificData.CLASS_PROP; + break; + default: + throw new IllegalArgumentException("Can't check string-type of non-string/map type: " + s); + }

              + JsonNode override = s.getJsonProp(prop);
              + if (override != null) return override.getTextValue();
              switch (stringType) {
              case String: return "java.lang.String";
              case Utf8: return "org.apache.avro.util.Utf8";
              @@ -583,6 +598,17 @@ private String getStringType(JsonNode overrideClassProperty) {
              }
              }

            + /** Utility for template use. Returns true iff a STRING-schema or
            + * the key of a MAP-schema is what SpecificData defines as
            + * "stringable" (which means we need to call toString on it before
            + * before writing it). */
            + public boolean isStringable(Schema schema)

            { + String t = getStringType(schema); + return ! (t.equals("java.lang.String") + || t.equals("java.lang.CharSequence") + || t.equals("org.apache.avro.util.Utf8")); + }

            +
            private static final Schema NULL_SCHEMA = Schema.create(Schema.Type.NULL);

            /** Utility for template use. Returns the java type for a Schema. */
            @@ -607,15 +633,14 @@ private String javaType(Schema schema, boolean checkConvertedLogicalType) {
            return "java.util.List<" + javaType(schema.getElementType()) + ">";
            case MAP:
            return "java.util.Map<"

            • + getStringType(schema.getJsonProp(SpecificData.KEY_CLASS_PROP))+","
            • + javaType(schema.getValueType()) + ">";
              + + getStringType(schema)+ "," + javaType(schema.getValueType()) + ">";
              case UNION:
              List<Schema> types = schema.getTypes(); // elide unions with null
              if ((types.size() == 2) && types.contains(NULL_SCHEMA))
              return javaType(types.get(types.get(0).equals(NULL_SCHEMA) ? 1 : 0));
              return "java.lang.Object";
              case STRING:
            • return getStringType(schema.getJsonProp(SpecificData.CLASS_PROP));
              + return getStringType(schema);
              case BYTES: return "java.nio.ByteBuffer";
              case INT: return "java.lang.Integer";
              case LONG: return "java.lang.Long";
              @@ -656,6 +681,58 @@ public String javaUnbox(Schema schema) {
              }
              }

            +
            + /** Utility for template use. Return a string with a given number
            + * of spaces to be used for indentation purposes. */
            + public String indent(int n)

            { + return new String(new char[n]).replace('\0', ' '); + }

            +
            + /** Utility for template use. For a two-branch union type with
            + * one null branch, returns the index of the null branch. It's an
            + * error to use on anything other than a two-branch union with on
            + * null branch. */
            + public int getNonNullIndex(Schema s)

            { + if (s.getType() != Schema.Type.UNION + || s.getTypes().size() != 2 + || ! s.getTypes().contains(NULL_SCHEMA)) + throw new IllegalArgumentException("Can only be used on 2-branch union with a null branch: " + s); + return (s.getTypes().get(0).equals(NULL_SCHEMA) ? 1 : 0); + }

            +
            + /** Utility for template use. Returns true if the encode/decode
            + * logic in record.vm can handle the schema being presented. */
            + public boolean isCustomCodable(Schema schema)

            { + if (schema.isError()) return false; + return isCustomCodable(schema, new HashSet<Schema>()); + }

            +
            + private boolean isCustomCodable(Schema schema, Set<Schema> seen) {
            + if (! seen.add(schema)) return true;
            + if (schema.getLogicalType() != null) return false;
            + boolean result = true;
            + switch (schema.getType())

            { + case RECORD: + for (Schema.Field f : schema.getFields()) + result &= isCustomCodable(f.schema(), seen); + break; + case MAP: + result = isCustomCodable(schema.getValueType(), seen); + break; + case ARRAY: + result = isCustomCodable(schema.getElementType(), seen); + break; + case UNION: + List<Schema> types = schema.getTypes(); + // Only know how to handle "nulling" unions for now + if (types.size() != 2 || ! types.contains(NULL_SCHEMA)) return false; + for (Schema s : types) result &= isCustomCodable(s, seen); + break; + default: + }

            + return result;
            + }
            +
            public boolean hasLogicalTypeField(Schema schema) {
            for (Schema.Field field : schema.getFields()) {
            if (field.schema().getLogicalType() != null) {
            diff --git a/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm b/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm
            index ccec4b60c..61259591f 100644
            — a/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm
            +++ b/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm
            @@ -19,7 +19,9 @@
            package $schema.getNamespace();
            #end

            +import org.apache.avro.generic.GenericArray;
            import org.apache.avro.specific.SpecificData;
            +import org.apache.avro.util.Utf8;
            #if (!$schema.isError())
            import org.apache.avro.message.BinaryMessageEncoder;
            import org.apache.avro.message.BinaryMessageDecoder;
            @@ -473,4 +475,282 @@ public class ${this.mangle($schema.getName())}#if ($schema.isError()) extends or
            READER$.read(this, SpecificData.getDecoder(in));
            }

            +#if ($this.isCustomCodable($schema))
            + @Override public boolean hasCustomCoders()

            { return true; }
            +
            + @Override public void encode(org.apache.avro.io.Encoder out)
            + throws java.io.IOException
            + {
            +#set ($nv = 0)## Counter to ensure unique var-names
            +#set ($maxnv = 0)## Holds high-water mark during recursion
            +#foreach ($field in $schema.getFields())
            +#set ($n = $this.mangle($field.name(), $schema.isError()))
            +#set ($s = $field.schema())
            +#encodeVar(0 "this.${n}" $s)
            +
            +#set ($nv = $maxnv)
            +#end
            + }
            +
            + @Override public void decode(org.apache.avro.io.Decoder in)
            + throws java.io.IOException
            + {
            +#set ($nv = 0)## Counter to ensure unique var-names
            +#set ($maxnv = 0)## Holds high-water mark during recursion
            +#foreach ($field in $schema.getFields())
            +#set ($n = $this.mangle($field.name(), $schema.isError()))
            +#set ($s = $field.schema())
            +#set ($rs = "SCHEMA$.getField(""${n}"").schema()")
            +#decodeVar(0 "this.${n}" $s $rs)
            +
            +#set ($nv = $maxnv)
            +#end
            + }
            +#end
            }
            +
            +#macro( encodeVar $indent $var $s )
            +#set ($I = $this.indent($indent))
            +##### Compound types (array, map, and union) require calls
            +##### that will recurse back into this encodeVar macro:
            +#if ($s.Type.Name.equals("array"))
            +#encodeArray($indent $var $s)
            +#elseif ($s.Type.Name.equals("map"))
            +#encodeMap($indent $var $s)
            +#elseif ($s.Type.Name.equals("union"))
            +#encodeUnion($indent $var $s)
            +##### Use the generated "encode" method as fast way to write
            +##### (specific) record types:
            +#elseif ($s.Type.Name.equals("record"))
            +$I ${var}.encode(out);
            +##### For rest of cases, generate calls out.writeXYZ:
            +#elseif ($s.Type.Name.equals("null"))
            +$I out.writeNull();
            +#elseif ($s.Type.Name.equals("boolean"))
            +$I out.writeBoolean(${var});
            +#elseif ($s.Type.Name.equals("int"))
            +$I out.writeInt(${var});
            +#elseif ($s.Type.Name.equals("long"))
            +$I out.writeLong(${var});
            +#elseif ($s.Type.Name.equals("float"))
            +$I out.writeFloat(${var});
            +#elseif ($s.Type.Name.equals("double"))
            +$I out.writeDouble(${var});
            +#elseif ($s.Type.Name.equals("string"))
            +#if ($this.isStringable($s))
            +$I out.writeString(${var}.toString());
            +#else
            +$I out.writeString(${var});
            +#end
            +#elseif ($s.Type.Name.equals("bytes"))
            +$I out.writeBytes(${var});
            +#elseif ($s.Type.Name.equals("fixed"))
            +$I out.writeFixed(${var}.bytes(), 0, ${s.FixedSize});
            +#elseif ($s.Type.Name.equals("enum"))
            +$I out.writeEnum(${var}.ordinal());
            +#else
            +## TODO – singal a code-gen-time error
            +#end
            +#end
            +
            +#macro( encodeArray $indent $var $s )
            +#set ($I = $this.indent($indent))
            +#set ($et = $this.javaType($s.ElementType))
            +$I long size${nv} = ${var}.size();
            +$I out.writeArrayStart();
            +$I out.setItemCount(size${nv});
            +$I long actualSize${nv} = 0;
            +$I for ($et e${nv}: ${var}) {
            $I actualSize${nv}+;
            +$I out.startItem();
            +#set ($var = "e${nv}")
            +#set ($nv = $nv + 1)
            +#set ($maxnv = $nv)
            +#set ($indent = $indent + 2)
            +#encodeVar($indent $var $s.ElementType)
            +#set ($nv = $nv - 1)
            +#set ($indent = $indent - 2)
            +#set ($I = $this.indent($indent))
            +$I }
            +$I out.writeArrayEnd();
            +$I if (actualSize${nv} != size${nv})
            +$I throw new java.util.ConcurrentModificationException("Array-size written was " + size${nv} + ", but element count was " + actualSize${nv} + ".");
            +#end
            +
            +#macro( encodeMap $indent $var $s )
            +#set ($I = $this.indent($indent))
            +#set ($kt = $this.getStringType($s))
            +#set ($vt = $this.javaType($s.ValueType))
            +$I long size${nv} = ${var}.size();
            +$I out.writeMapStart();
            +$I out.setItemCount(size${nv});
            +$I long actualSize${nv} = 0;
            +$I for (java.util.Map.Entry<$kt, $vt> e${nv}: ${var}.entrySet()) {
            $I actualSize${nv}+;
            +$I out.startItem();
            +#if ($this.isStringable($s))
            +$I out.writeString(e${nv}.getKey().toString());
            +#else
            +$I out.writeString(e${nv}.getKey());
            +#end
            +$I $vt v${nv} = e${nv}.getValue();
            +#set ($var = "v${nv}")
            +#set ($nv = $nv + 1)
            +#set ($maxnv = $nv)
            +#set ($indent = $indent + 2)
            +#encodeVar($indent $var $s.ValueType)
            +#set ($nv = $nv - 1)
            +#set ($indent = $indent - 2)
            +#set ($I = $this.indent($indent))
            +$I }
            +$I out.writeMapEnd();
            +$I if (actualSize${nv} != size${nv})
            + throw new java.util.ConcurrentModificationException("Map-size written was " + size${nv} + ", but element count was " + actualSize${nv} + ".");
            +#end
            +
            +#macro( encodeUnion $indent $var $s )
            +#set ($I = $this.indent($indent))
            +#set ($et = $this.javaType($s.Types.get($this.getNonNullIndex($s))))
            +$I if (${var} == null) {
            +$I out.writeIndex(#if($this.getNonNullIndex($s)==0)1#{else}0#end);
            +$I out.writeNull();
            +$I } else {
            +$I out.writeIndex(${this.getNonNullIndex($s)});
            +#set ($indent = $indent + 2)
            +#encodeVar($indent $var $s.Types.get($this.getNonNullIndex($s)))
            +#set ($indent = $indent - 2)
            +#set ($I = $this.indent($indent))
            +$I }
            +#end
            +
            +
            +#macro( decodeVar $indent $var $s $rs )
            +#set ($I = $this.indent($indent))
            +##### Compound types (array, map, and union) require calls
            +##### that will recurse back into this decodeVar macro:
            +#if ($s.Type.Name.equals("array"))
            +#decodeArray($indent $var $s $rs)
            +#elseif ($s.Type.Name.equals("map"))
            +#decodeMap($indent $var $s $rs)
            +#elseif ($s.Type.Name.equals("union"))
            +#decodeUnion($indent $var $s $rs)
            +##### Use the generated "decode" method as fast way to write
            +##### (specific) record types:
            +#elseif ($s.Type.Name.equals("record"))
            +$I if (${var} == null) {
            +$I ${var} = new ${this.javaType($s)}();
            +$I }
            +$I ${var}.decode(in);
            +##### For rest of cases, generate calls in.readXYZ:
            +#elseif ($s.Type.Name.equals("null"))
            +$I in.readNull();
            +#elseif ($s.Type.Name.equals("boolean"))
            +$I $var = in.readBoolean();
            +#elseif ($s.Type.Name.equals("int"))
            +$I $var = in.readInt();
            +#elseif ($s.Type.Name.equals("long"))
            +$I $var = in.readLong();
            +#elseif ($s.Type.Name.equals("float"))
            +$I $var = in.readFloat();
            +#elseif ($s.Type.Name.equals("double"))
            +$I $var = in.readDouble();
            +#elseif ($s.Type.Name.equals("string"))
            +#decodeString( "$I" $var $s )
            +#elseif ($s.Type.Name.equals("bytes"))
            +$I $var = in.readBytes(${var});
            +#elseif ($s.Type.Name.equals("fixed"))
            +$I if (${var} == null) {
            +$I ${var} = new ${this.javaType($s)}();
            +$I }
            +$I in.readFixed(${var}.bytes(), 0, ${s.FixedSize});
            +#elseif ($s.Type.Name.equals("enum"))
            +$I $var = ${this.javaType($s)}.values()[in.readEnum()];
            +#else
            +## TODO – singal a code-gen-time error
            +#end
            +#end
            +
            +#macro( decodeString $II $var $s )
            +#set ($st = ${this.getStringType($s)})
            +#if ($this.isStringable($s))
            +$II ${var} = new ${st}(in.readString());
            +#elseif ($st.equals("java.lang.String"))
            +$II $var = in.readString();
            +#elseif ($st.equals("org.apache.avro.util.Utf8"))
            +$II $var = in.readString(${var});
            +#else
            +$II $var = in.readString(${var} instanceof Utf8 ? (Utf8)${var} : null);
            +#end
            +#end
            +
            +#macro( decodeArray $indent $var $s $rs )
            +#set ($I = $this.indent($indent))
            +#set ($t = $this.javaType($s))
            +#set ($et = $this.javaType($s.ElementType))
            +#set ($gat = "SpecificData.Array<${et}>")
            +$I long size${nv} = in.readArrayStart();
            +$I $t a${nv} = ${var}; // Need fresh name due to limitation of macro system
            +$I if (a${nv} == null) {
            +$I a${nv} = new ${gat}((int)size${nv}, ${rs});
            +$I $var = a${nv};
            +$I } else a${nv}.clear();
            +$I $gat ga${nv} = (a${nv} instanceof SpecificData.Array ? (${gat})a${nv} : null);
            +$I for ( ; 0 < size${nv}; size${nv} = in.arrayNext()) {
            +$I for ( ; size${nv} != 0; size${nv}--) {
            +$I $et e${nv} = (ga${nv} != null ? ga${nv}.peek() : null);
            +#set ($var = "e${nv}")
            +#set ($nv = $nv + 1)
            +#set ($maxnv = $nv)
            +#set ($indent = $indent + 4)
            +#decodeVar($indent $var $s.ElementType "${rs}.getElementType()")
            +#set ($nv = $nv - 1)
            +#set ($indent = $indent - 4)
            +#set ($I = $this.indent($indent))
            +$I a${nv}.add(e${nv});
            +$I }
            +$I }
            +#end
            +
            +#macro( decodeMap $indent $var $s $rs )
            +#set ($I = $this.indent($indent))
            +#set ($t = $this.javaType($s))
            +#set ($kt = $this.getStringType($s))
            +#set ($vt = $this.javaType($s.ValueType))
            +$I long size${nv} = in.readMapStart();
            +$I $t m${nv} = ${var}; // Need fresh name due to limitation of macro system
            +$I if (m${nv} == null) {
            +$I m${nv} = new java.util.HashMap<${kt},${vt}>((int)size${nv});
            +$I $var = m${nv};
            +$I } else m${nv}.clear();
            +$I for ( ; 0 < size${nv}; size${nv} = in.mapNext()) {
            +$I for ( ; size${nv} != 0; size${nv}--) {
            +$I $kt k${nv} = null;
            +#decodeString( "$I " "k${nv}" $s )
            +$I $vt v${nv} = null;
            +#set ($var = "v${nv}")
            +#set ($nv = $nv + 1)
            +#set ($maxnv = $nv)
            +#set ($indent = $indent + 4)
            +#decodeVar($indent $var $s.ValueType "${rs}.getValueType()")
            +#set ($nv = $nv - 1)
            +#set ($indent = $indent - 4)
            +#set ($I = $this.indent($indent))
            +$I m${nv}.put(k${nv}, v${nv});
            +$I }
            +$I }
            +#end
            +
            +#macro( decodeUnion $indent $var $s $rs )
            +#set ($I = $this.indent($indent))
            +#set ($et = $this.javaType($s.Types.get($this.getNonNullIndex($s))))
            +#set ($si = $this.getNonNullIndex($s))
            +$I if (in.readIndex() != ${si}) {
            +$I in.readNull();
            +$I ${var} = null;
            +$I } else {
            +#set ($indent = $indent + 2)
            +#decodeVar($indent $var $s.Types.get($si) "${rs}.getTypes().get(${si})")
            +#set ($indent = $indent - 2)
            +#set ($I = $this.indent($indent))
            +$I }
            +#end
            diff --git a/lang/java/tools/src/test/compiler/output-string/avro/examples/baseball/Player.java b/lang/java/tools/src/test/compiler/output-string/avro/examples/baseball/Player.java
            index 4dff5ef50..264e8e222 100644
            — a/lang/java/tools/src/test/compiler/output-string/avro/examples/baseball/Player.java
            +++ b/lang/java/tools/src/test/compiler/output-string/avro/examples/baseball/Player.java
            @@ -5,7 +5,9 @@
            */
            package avro.examples.baseball;

            +import org.apache.avro.generic.GenericArray;
            import org.apache.avro.specific.SpecificData;
            +import org.apache.avro.util.Utf8;
            import org.apache.avro.message.BinaryMessageEncoder;
            import org.apache.avro.message.BinaryMessageDecoder;
            import org.apache.avro.message.SchemaStore;
            @@ -460,4 +462,65 @@ public Player build() { READER$.read(this, SpecificData.getDecoder(in)); }

            + @Override public boolean hasCustomCoders() { return true; }

            +
            + @Override public void encode(org.apache.avro.io.Encoder out)
            + throws java.io.IOException
            + {
            + out.writeInt(this.number);
            +
            + out.writeString(this.first_name);
            +
            + out.writeString(this.last_name);
            +
            + long size0 = this.position.size();
            + out.writeArrayStart();
            + out.setItemCount(size0);
            + long actualSize0 = 0;
            + for (avro.examples.baseball.Position e0: this.position)

            { + actualSize0++; + out.startItem(); + out.writeEnum(e0.ordinal()); + }
            + out.writeArrayEnd();
            + if (actualSize0 != size0)
            + throw new java.util.ConcurrentModificationException("Array-size written was " + size0 + ", but element count was " + actualSize0 + ".");
            +
            + }
            +
            + @Override public void decode(org.apache.avro.io.Decoder in)
            + throws java.io.IOException
            + {
            + this.number = in.readInt();
            +
            + this.first_name = in.readString();
            +
            + this.last_name = in.readString();
            +
            + long size0 = in.readArrayStart();
            + java.util.List<avro.examples.baseball.Position> a0 = this.position; // Need fresh name due to limitation of macro system
            + if (a0 == null) { + a0 = new SpecificData.Array<avro.examples.baseball.Position>((int)size0, SCHEMA$.getField("position").schema()); + this.position = a0; + } else a0.clear();
            + SpecificData.Array<avro.examples.baseball.Position> ga0 = (a0 instanceof SpecificData.Array ? (SpecificData.Array<avro.examples.baseball.Position>)a0 : null);
            + for ( ; 0 < size0; size0 = in.arrayNext()) {
            + for ( ; size0 != 0; size0--) { + avro.examples.baseball.Position e0 = (ga0 != null ? ga0.peek() : null); + e0 = avro.examples.baseball.Position.values()[in.readEnum()]; + a0.add(e0); + }
            + }
            +
            + }
            }
            +
            +
            +
            +
            +
            +
            +
            +
            +
            +
            diff --git a/lang/java/tools/src/test/compiler/output/Player.java b/lang/java/tools/src/test/compiler/output/Player.java
            index 26fcbc0d5..c9bf1687a 100644
            — a/lang/java/tools/src/test/compiler/output/Player.java
            +++ b/lang/java/tools/src/test/compiler/output/Player.java
            @@ -5,7 +5,9 @@
            */
            package avro.examples.baseball;

            +import org.apache.avro.generic.GenericArray;
            import org.apache.avro.specific.SpecificData;
            +import org.apache.avro.util.Utf8;
            import org.apache.avro.message.BinaryMessageEncoder;
            import org.apache.avro.message.BinaryMessageDecoder;
            import org.apache.avro.message.SchemaStore;
            @@ -460,4 +462,65 @@ public Player build() { READER$.read(this, SpecificData.getDecoder(in)); }

            + @Override public boolean hasCustomCoders() { return true; }
            +
            + @Override public void encode(org.apache.avro.io.Encoder out)
            + throws java.io.IOException
            + {
            + out.writeInt(this.number);
            +
            + out.writeString(this.first_name);
            +
            + out.writeString(this.last_name);
            +
            + long size0 = this.position.size();
            + out.writeArrayStart();
            + out.setItemCount(size0);
            + long actualSize0 = 0;
            + for (avro.examples.baseball.Position e0: this.position) {+ actualSize0++;+ out.startItem();+ out.writeEnum(e0.ordinal());+ }

            + out.writeArrayEnd();
            + if (actualSize0 != size0)
            + throw new java.util.ConcurrentModificationException("Array-size written was " + size0 + ", but element count was " + actualSize0 + ".");
            +
            + }
            +
            + @Override public void decode(org.apache.avro.io.Decoder in)
            + throws java.io.IOException
            + {
            + this.number = in.readInt();
            +
            + this.first_name = in.readString(this.first_name instanceof Utf8 ? (Utf8)this.first_name : null);
            +
            + this.last_name = in.readString(this.last_name instanceof Utf8 ? (Utf8)this.last_name : null);
            +
            + long size0 = in.readArrayStart();
            + java.util.List<avro.examples.baseball.Position> a0 = this.position; // Need fresh name due to limitation of macro system
            + if (a0 == null)

            { + a0 = new SpecificData.Array<avro.examples.baseball.Position>((int)size0, SCHEMA$.getField("position").schema()); + this.position = a0; + }

            else a0.clear();
            + SpecificData.Array<avro.examples.baseball.Position> ga0 = (a0 instanceof SpecificData.Array ? (SpecificData.Array<avro.examples.baseball.Position>)a0 : null);
            + for ( ; 0 < size0; size0 = in.arrayNext()) {
            + for ( ; size0 != 0; size0--)

            { + avro.examples.baseball.Position e0 = (ga0 != null ? ga0.peek() : null); + e0 = avro.examples.baseball.Position.values()[in.readEnum()]; + a0.add(e0); + }

            + }
            +
            + }
            }
            +
            +
            +
            +
            +
            +
            +
            +
            +
            +

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata closed pull request #256: AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation URL: https://github.com/apache/avro/pull/256 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java index be6bde8d2..bb918e41d 100644 — a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java +++ b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java @@ -122,6 +122,10 @@ public DatumWriter createDatumWriter(Schema schema) { /** Return the singleton instance. */ public static SpecificData get() { return INSTANCE; } + private static final boolean USE_CUSTOM_CODERS + = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false")); + public boolean useCustomCoders() { return USE_CUSTOM_CODERS; } + @Override protected boolean isEnum(Object datum) { return datum instanceof Enum || super.isEnum(datum); diff --git a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java index 774ca0944..0a9c97014 100644 — a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java +++ b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) { } } + @Override + protected Object readRecord(Object old, Schema expected, ResolvingDecoder in) + throws IOException { + SpecificData data = getSpecificData(); + Object r = data.newRecord(old, expected); + if (SpecificData.get().useCustomCoders() + && r instanceof SpecificRecordBase) // TODO: Is this needed? + { + SpecificRecordBase d = (SpecificRecordBase) r; + if (d.hasCustomCoders()) { + d.decode(in); + return d; + } + } + return super.readRecord(old, expected, in); + } + @Override protected void readField(Object r, Schema.Field f, Object oldDatum, ResolvingDecoder in, Object state) diff --git a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java index 7bee02a65..ee1d850a7 100644 — a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java +++ b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java @@ -71,6 +71,21 @@ protected void writeString(Schema schema, Object datum, Encoder out) writeString(datum, out); } + @Override + protected void writeRecord(Schema schema, Object datum, Encoder out) + throws IOException { + if (SpecificData.get().useCustomCoders() + && datum instanceof SpecificRecordBase) // TODO: Is this needed? + { + SpecificRecordBase d = (SpecificRecordBase) datum; + if (d.hasCustomCoders()) { + d.encode(out); + return; + } + } + super.writeRecord(schema, datum, out); + } + @Override protected void writeField(Object datum, Schema.Field f, Encoder out, Object state) throws IOException { diff --git a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java index 20d3dc331..2c26d0282 100644 --- a/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java +++ b/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java @@ -25,6 +25,8 @@ import org.apache.avro.Conversion; import org.apache.avro.Schema; import org.apache.avro.generic.GenericRecord; +import org.apache.avro.io.Decoder; +import org.apache.avro.io.Encoder; /** Base class for generated record classes. */ public abstract class SpecificRecordBase @@ -90,4 +92,19 @@ public void readExternal(ObjectInput in) new SpecificDatumReader(getSchema()) .read(this, SpecificData.getDecoder(in)); } + + /** Returns true iff an instance supports the {@link #encode} and + * {@link #decode} operations. Should only be used by + * <code>SpecificDatumReader/Writer</code> to selectively use + * {@link #encode} and {@link #decode} to optimize the (de)serialization of + * values. */ + public boolean hasCustomCoders() { return false; } + + public void encode(Encoder out) throws IOException { + throw new UnsupportedOperationException(); + } + + public void decode(Decoder in) throws IOException {+ throw new UnsupportedOperationException();+ } } diff --git a/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SpecificCompiler.java b/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SpecificCompiler.java index 7a6f5f1cc..8d8106538 100644 — a/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SpecificCompiler.java +++ b/lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SpecificCompiler.java @@ -572,9 +572,24 @@ private Schema addStringType(Schema s, Map<Schema,Schema> seen) { return result; } private String getStringType(JsonNode overrideClassProperty) { if (overrideClassProperty != null) return overrideClassProperty.getTextValue(); + /** Utility for template use (and also internal use). Returns + * a string giving the FQN of the Java type to be used for a string + * schema or for the key of a map schema. (It's an error to call + * this on a schema other than a string or map.) */ + public String getStringType(Schema s) { + String prop; + switch (s.getType()) { + case MAP: + prop = SpecificData.KEY_CLASS_PROP; + break; + case STRING: + prop = SpecificData.CLASS_PROP; + break; + default: + throw new IllegalArgumentException("Can't check string-type of non-string/map type: " + s); + } + JsonNode override = s.getJsonProp(prop); + if (override != null) return override.getTextValue(); switch (stringType) { case String: return "java.lang.String"; case Utf8: return "org.apache.avro.util.Utf8"; @@ -583,6 +598,17 @@ private String getStringType(JsonNode overrideClassProperty) { } } + /** Utility for template use. Returns true iff a STRING-schema or + * the key of a MAP-schema is what SpecificData defines as + * "stringable" (which means we need to call toString on it before + * before writing it). */ + public boolean isStringable(Schema schema) { + String t = getStringType(schema); + return ! (t.equals("java.lang.String") + || t.equals("java.lang.CharSequence") + || t.equals("org.apache.avro.util.Utf8")); + } + private static final Schema NULL_SCHEMA = Schema.create(Schema.Type.NULL); /** Utility for template use. Returns the java type for a Schema. */ @@ -607,15 +633,14 @@ private String javaType(Schema schema, boolean checkConvertedLogicalType) { return "java.util.List<" + javaType(schema.getElementType()) + ">"; case MAP: return "java.util.Map<" + getStringType(schema.getJsonProp(SpecificData.KEY_CLASS_PROP))+"," + javaType(schema.getValueType()) + ">"; + + getStringType(schema)+ "," + javaType(schema.getValueType()) + ">"; case UNION: List<Schema> types = schema.getTypes(); // elide unions with null if ((types.size() == 2) && types.contains(NULL_SCHEMA)) return javaType(types.get(types.get(0).equals(NULL_SCHEMA) ? 1 : 0)); return "java.lang.Object"; case STRING: return getStringType(schema.getJsonProp(SpecificData.CLASS_PROP)); + return getStringType(schema); case BYTES: return "java.nio.ByteBuffer"; case INT: return "java.lang.Integer"; case LONG: return "java.lang.Long"; @@ -656,6 +681,58 @@ public String javaUnbox(Schema schema) { } } + + /** Utility for template use. Return a string with a given number + * of spaces to be used for indentation purposes. */ + public String indent(int n) { + return new String(new char[n]).replace('\0', ' '); + } + + /** Utility for template use. For a two-branch union type with + * one null branch, returns the index of the null branch. It's an + * error to use on anything other than a two-branch union with on + * null branch. */ + public int getNonNullIndex(Schema s) { + if (s.getType() != Schema.Type.UNION + || s.getTypes().size() != 2 + || ! s.getTypes().contains(NULL_SCHEMA)) + throw new IllegalArgumentException("Can only be used on 2-branch union with a null branch: " + s); + return (s.getTypes().get(0).equals(NULL_SCHEMA) ? 1 : 0); + } + + /** Utility for template use. Returns true if the encode/decode + * logic in record.vm can handle the schema being presented. */ + public boolean isCustomCodable(Schema schema) { + if (schema.isError()) return false; + return isCustomCodable(schema, new HashSet<Schema>()); + } + + private boolean isCustomCodable(Schema schema, Set<Schema> seen) { + if (! seen.add(schema)) return true; + if (schema.getLogicalType() != null) return false; + boolean result = true; + switch (schema.getType()) { + case RECORD: + for (Schema.Field f : schema.getFields()) + result &= isCustomCodable(f.schema(), seen); + break; + case MAP: + result = isCustomCodable(schema.getValueType(), seen); + break; + case ARRAY: + result = isCustomCodable(schema.getElementType(), seen); + break; + case UNION: + List<Schema> types = schema.getTypes(); + // Only know how to handle "nulling" unions for now + if (types.size() != 2 || ! types.contains(NULL_SCHEMA)) return false; + for (Schema s : types) result &= isCustomCodable(s, seen); + break; + default: + } + return result; + } + public boolean hasLogicalTypeField(Schema schema) { for (Schema.Field field : schema.getFields()) { if (field.schema().getLogicalType() != null) { diff --git a/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm b/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm index ccec4b60c..61259591f 100644 — a/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm +++ b/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm @@ -19,7 +19,9 @@ package $schema.getNamespace(); #end +import org.apache.avro.generic.GenericArray; import org.apache.avro.specific.SpecificData; +import org.apache.avro.util.Utf8; #if (!$schema.isError()) import org.apache.avro.message.BinaryMessageEncoder; import org.apache.avro.message.BinaryMessageDecoder; @@ -473,4 +475,282 @@ public class ${this.mangle($schema.getName())}#if ($schema.isError()) extends or READER$.read(this, SpecificData.getDecoder(in)); } +#if ($this.isCustomCodable($schema)) + @Override public boolean hasCustomCoders() { return true; } + + @Override public void encode(org.apache.avro.io.Encoder out) + throws java.io.IOException + { +#set ($nv = 0)## Counter to ensure unique var-names +#set ($maxnv = 0)## Holds high-water mark during recursion +#foreach ($field in $schema.getFields()) +#set ($n = $this.mangle($field.name(), $schema.isError())) +#set ($s = $field.schema()) +#encodeVar(0 "this.${n}" $s) + +#set ($nv = $maxnv) +#end + } + + @Override public void decode(org.apache.avro.io.Decoder in) + throws java.io.IOException + { +#set ($nv = 0)## Counter to ensure unique var-names +#set ($maxnv = 0)## Holds high-water mark during recursion +#foreach ($field in $schema.getFields()) +#set ($n = $this.mangle($field.name(), $schema.isError())) +#set ($s = $field.schema()) +#set ($rs = "SCHEMA$.getField(""${n}"").schema()") +#decodeVar(0 "this.${n}" $s $rs) + +#set ($nv = $maxnv) +#end + } +#end } + +#macro( encodeVar $indent $var $s ) +#set ($I = $this.indent($indent)) +##### Compound types (array, map, and union) require calls +##### that will recurse back into this encodeVar macro: +#if ($s.Type.Name.equals("array")) +#encodeArray($indent $var $s) +#elseif ($s.Type.Name.equals("map")) +#encodeMap($indent $var $s) +#elseif ($s.Type.Name.equals("union")) +#encodeUnion($indent $var $s) +##### Use the generated "encode" method as fast way to write +##### (specific) record types: +#elseif ($s.Type.Name.equals("record")) +$I ${var}.encode(out); +##### For rest of cases, generate calls out.writeXYZ: +#elseif ($s.Type.Name.equals("null")) +$I out.writeNull(); +#elseif ($s.Type.Name.equals("boolean")) +$I out.writeBoolean(${var}); +#elseif ($s.Type.Name.equals("int")) +$I out.writeInt(${var}); +#elseif ($s.Type.Name.equals("long")) +$I out.writeLong(${var}); +#elseif ($s.Type.Name.equals("float")) +$I out.writeFloat(${var}); +#elseif ($s.Type.Name.equals("double")) +$I out.writeDouble(${var}); +#elseif ($s.Type.Name.equals("string")) +#if ($this.isStringable($s)) +$I out.writeString(${var}.toString()); +#else +$I out.writeString(${var}); +#end +#elseif ($s.Type.Name.equals("bytes")) +$I out.writeBytes(${var}); +#elseif ($s.Type.Name.equals("fixed")) +$I out.writeFixed(${var}.bytes(), 0, ${s.FixedSize}); +#elseif ($s.Type.Name.equals("enum")) +$I out.writeEnum(${var}.ordinal()); +#else +## TODO – singal a code-gen-time error +#end +#end + +#macro( encodeArray $indent $var $s ) +#set ($I = $this.indent($indent)) +#set ($et = $this.javaType($s.ElementType)) +$I long size${nv} = ${var}.size(); +$I out.writeArrayStart(); +$I out.setItemCount(size${nv}); +$I long actualSize${nv} = 0; +$I for ($et e${nv}: ${var}) { $I actualSize${nv} +; +$I out.startItem(); +#set ($var = "e${nv}") +#set ($nv = $nv + 1) +#set ($maxnv = $nv) +#set ($indent = $indent + 2) +#encodeVar($indent $var $s.ElementType) +#set ($nv = $nv - 1) +#set ($indent = $indent - 2) +#set ($I = $this.indent($indent)) +$I } +$I out.writeArrayEnd(); +$I if (actualSize${nv} != size${nv}) +$I throw new java.util.ConcurrentModificationException("Array-size written was " + size${nv} + ", but element count was " + actualSize${nv} + "."); +#end + +#macro( encodeMap $indent $var $s ) +#set ($I = $this.indent($indent)) +#set ($kt = $this.getStringType($s)) +#set ($vt = $this.javaType($s.ValueType)) +$I long size${nv} = ${var}.size(); +$I out.writeMapStart(); +$I out.setItemCount(size${nv}); +$I long actualSize${nv} = 0; +$I for (java.util.Map.Entry<$kt, $vt> e${nv}: ${var}.entrySet()) { $I actualSize${nv} +; +$I out.startItem(); +#if ($this.isStringable($s)) +$I out.writeString(e${nv}.getKey().toString()); +#else +$I out.writeString(e${nv}.getKey()); +#end +$I $vt v${nv} = e${nv}.getValue(); +#set ($var = "v${nv}") +#set ($nv = $nv + 1) +#set ($maxnv = $nv) +#set ($indent = $indent + 2) +#encodeVar($indent $var $s.ValueType) +#set ($nv = $nv - 1) +#set ($indent = $indent - 2) +#set ($I = $this.indent($indent)) +$I } +$I out.writeMapEnd(); +$I if (actualSize${nv} != size${nv}) + throw new java.util.ConcurrentModificationException("Map-size written was " + size${nv} + ", but element count was " + actualSize${nv} + "."); +#end + +#macro( encodeUnion $indent $var $s ) +#set ($I = $this.indent($indent)) +#set ($et = $this.javaType($s.Types.get($this.getNonNullIndex($s)))) +$I if (${var} == null) { +$I out.writeIndex(#if($this.getNonNullIndex($s)==0)1#{else}0#end); +$I out.writeNull(); +$I } else { +$I out.writeIndex(${this.getNonNullIndex($s)}); +#set ($indent = $indent + 2) +#encodeVar($indent $var $s.Types.get($this.getNonNullIndex($s))) +#set ($indent = $indent - 2) +#set ($I = $this.indent($indent)) +$I } +#end + + +#macro( decodeVar $indent $var $s $rs ) +#set ($I = $this.indent($indent)) +##### Compound types (array, map, and union) require calls +##### that will recurse back into this decodeVar macro: +#if ($s.Type.Name.equals("array")) +#decodeArray($indent $var $s $rs) +#elseif ($s.Type.Name.equals("map")) +#decodeMap($indent $var $s $rs) +#elseif ($s.Type.Name.equals("union")) +#decodeUnion($indent $var $s $rs) +##### Use the generated "decode" method as fast way to write +##### (specific) record types: +#elseif ($s.Type.Name.equals("record")) +$I if (${var} == null) { +$I ${var} = new ${this.javaType($s)}(); +$I } +$I ${var}.decode(in); +##### For rest of cases, generate calls in.readXYZ: +#elseif ($s.Type.Name.equals("null")) +$I in.readNull(); +#elseif ($s.Type.Name.equals("boolean")) +$I $var = in.readBoolean(); +#elseif ($s.Type.Name.equals("int")) +$I $var = in.readInt(); +#elseif ($s.Type.Name.equals("long")) +$I $var = in.readLong(); +#elseif ($s.Type.Name.equals("float")) +$I $var = in.readFloat(); +#elseif ($s.Type.Name.equals("double")) +$I $var = in.readDouble(); +#elseif ($s.Type.Name.equals("string")) +#decodeString( "$I" $var $s ) +#elseif ($s.Type.Name.equals("bytes")) +$I $var = in.readBytes(${var}); +#elseif ($s.Type.Name.equals("fixed")) +$I if (${var} == null) { +$I ${var} = new ${this.javaType($s)}(); +$I } +$I in.readFixed(${var}.bytes(), 0, ${s.FixedSize}); +#elseif ($s.Type.Name.equals("enum")) +$I $var = ${this.javaType($s)}.values() [in.readEnum()] ; +#else +## TODO – singal a code-gen-time error +#end +#end + +#macro( decodeString $II $var $s ) +#set ($st = ${this.getStringType($s)}) +#if ($this.isStringable($s)) +$II ${var} = new ${st}(in.readString()); +#elseif ($st.equals("java.lang.String")) +$II $var = in.readString(); +#elseif ($st.equals("org.apache.avro.util.Utf8")) +$II $var = in.readString(${var}); +#else +$II $var = in.readString(${var} instanceof Utf8 ? (Utf8)${var} : null); +#end +#end + +#macro( decodeArray $indent $var $s $rs ) +#set ($I = $this.indent($indent)) +#set ($t = $this.javaType($s)) +#set ($et = $this.javaType($s.ElementType)) +#set ($gat = "SpecificData.Array<${et}>") +$I long size${nv} = in.readArrayStart(); +$I $t a${nv} = ${var}; // Need fresh name due to limitation of macro system +$I if (a${nv} == null) { +$I a${nv} = new ${gat}((int)size${nv}, ${rs}); +$I $var = a${nv}; +$I } else a${nv}.clear(); +$I $gat ga${nv} = (a${nv} instanceof SpecificData.Array ? (${gat})a${nv} : null); +$I for ( ; 0 < size${nv}; size${nv} = in.arrayNext()) { +$I for ( ; size${nv} != 0; size${nv}--) { +$I $et e${nv} = (ga${nv} != null ? ga${nv}.peek() : null); +#set ($var = "e${nv}") +#set ($nv = $nv + 1) +#set ($maxnv = $nv) +#set ($indent = $indent + 4) +#decodeVar($indent $var $s.ElementType "${rs}.getElementType()") +#set ($nv = $nv - 1) +#set ($indent = $indent - 4) +#set ($I = $this.indent($indent)) +$I a${nv}.add(e${nv}); +$I } +$I } +#end + +#macro( decodeMap $indent $var $s $rs ) +#set ($I = $this.indent($indent)) +#set ($t = $this.javaType($s)) +#set ($kt = $this.getStringType($s)) +#set ($vt = $this.javaType($s.ValueType)) +$I long size${nv} = in.readMapStart(); +$I $t m${nv} = ${var}; // Need fresh name due to limitation of macro system +$I if (m${nv} == null) { +$I m${nv} = new java.util.HashMap<${kt},${vt}>((int)size${nv}); +$I $var = m${nv}; +$I } else m${nv}.clear(); +$I for ( ; 0 < size${nv}; size${nv} = in.mapNext()) { +$I for ( ; size${nv} != 0; size${nv}--) { +$I $kt k${nv} = null; +#decodeString( "$I " "k${nv}" $s ) +$I $vt v${nv} = null; +#set ($var = "v${nv}") +#set ($nv = $nv + 1) +#set ($maxnv = $nv) +#set ($indent = $indent + 4) +#decodeVar($indent $var $s.ValueType "${rs}.getValueType()") +#set ($nv = $nv - 1) +#set ($indent = $indent - 4) +#set ($I = $this.indent($indent)) +$I m${nv}.put(k${nv}, v${nv}); +$I } +$I } +#end + +#macro( decodeUnion $indent $var $s $rs ) +#set ($I = $this.indent($indent)) +#set ($et = $this.javaType($s.Types.get($this.getNonNullIndex($s)))) +#set ($si = $this.getNonNullIndex($s)) +$I if (in.readIndex() != ${si}) { +$I in.readNull(); +$I ${var} = null; +$I } else { +#set ($indent = $indent + 2) +#decodeVar($indent $var $s.Types.get($si) "${rs}.getTypes().get(${si})") +#set ($indent = $indent - 2) +#set ($I = $this.indent($indent)) +$I } +#end diff --git a/lang/java/tools/src/test/compiler/output-string/avro/examples/baseball/Player.java b/lang/java/tools/src/test/compiler/output-string/avro/examples/baseball/Player.java index 4dff5ef50..264e8e222 100644 — a/lang/java/tools/src/test/compiler/output-string/avro/examples/baseball/Player.java +++ b/lang/java/tools/src/test/compiler/output-string/avro/examples/baseball/Player.java @@ -5,7 +5,9 @@ */ package avro.examples.baseball; +import org.apache.avro.generic.GenericArray; import org.apache.avro.specific.SpecificData; +import org.apache.avro.util.Utf8; import org.apache.avro.message.BinaryMessageEncoder; import org.apache.avro.message.BinaryMessageDecoder; import org.apache.avro.message.SchemaStore; @@ -460,4 +462,65 @@ public Player build() { READER$.read(this, SpecificData.getDecoder(in)); } + @Override public boolean hasCustomCoders() { return true; } + + @Override public void encode(org.apache.avro.io.Encoder out) + throws java.io.IOException + { + out.writeInt(this.number); + + out.writeString(this.first_name); + + out.writeString(this.last_name); + + long size0 = this.position.size(); + out.writeArrayStart(); + out.setItemCount(size0); + long actualSize0 = 0; + for (avro.examples.baseball.Position e0: this.position) { + actualSize0++; + out.startItem(); + out.writeEnum(e0.ordinal()); + } + out.writeArrayEnd(); + if (actualSize0 != size0) + throw new java.util.ConcurrentModificationException("Array-size written was " + size0 + ", but element count was " + actualSize0 + "."); + + } + + @Override public void decode(org.apache.avro.io.Decoder in) + throws java.io.IOException + { + this.number = in.readInt(); + + this.first_name = in.readString(); + + this.last_name = in.readString(); + + long size0 = in.readArrayStart(); + java.util.List<avro.examples.baseball.Position> a0 = this.position; // Need fresh name due to limitation of macro system + if (a0 == null) { + a0 = new SpecificData.Array<avro.examples.baseball.Position>((int)size0, SCHEMA$.getField("position").schema()); + this.position = a0; + } else a0.clear(); + SpecificData.Array<avro.examples.baseball.Position> ga0 = (a0 instanceof SpecificData.Array ? (SpecificData.Array<avro.examples.baseball.Position>)a0 : null); + for ( ; 0 < size0; size0 = in.arrayNext()) { + for ( ; size0 != 0; size0--) { + avro.examples.baseball.Position e0 = (ga0 != null ? ga0.peek() : null); + e0 = avro.examples.baseball.Position.values()[in.readEnum()]; + a0.add(e0); + } + } + + } } + + + + + + + + + + diff --git a/lang/java/tools/src/test/compiler/output/Player.java b/lang/java/tools/src/test/compiler/output/Player.java index 26fcbc0d5..c9bf1687a 100644 — a/lang/java/tools/src/test/compiler/output/Player.java +++ b/lang/java/tools/src/test/compiler/output/Player.java @@ -5,7 +5,9 @@ */ package avro.examples.baseball; +import org.apache.avro.generic.GenericArray; import org.apache.avro.specific.SpecificData; +import org.apache.avro.util.Utf8; import org.apache.avro.message.BinaryMessageEncoder; import org.apache.avro.message.BinaryMessageDecoder; import org.apache.avro.message.SchemaStore; @@ -460,4 +462,65 @@ public Player build() { READER$.read(this, SpecificData.getDecoder(in)); } + @Override public boolean hasCustomCoders() { return true; } + + @Override public void encode(org.apache.avro.io.Encoder out) + throws java.io.IOException + { + out.writeInt(this.number); + + out.writeString(this.first_name); + + out.writeString(this.last_name); + + long size0 = this.position.size(); + out.writeArrayStart(); + out.setItemCount(size0); + long actualSize0 = 0; + for (avro.examples.baseball.Position e0: this.position) {+ actualSize0++;+ out.startItem();+ out.writeEnum(e0.ordinal());+ } + out.writeArrayEnd(); + if (actualSize0 != size0) + throw new java.util.ConcurrentModificationException("Array-size written was " + size0 + ", but element count was " + actualSize0 + "."); + + } + + @Override public void decode(org.apache.avro.io.Decoder in) + throws java.io.IOException + { + this.number = in.readInt(); + + this.first_name = in.readString(this.first_name instanceof Utf8 ? (Utf8)this.first_name : null); + + this.last_name = in.readString(this.last_name instanceof Utf8 ? (Utf8)this.last_name : null); + + long size0 = in.readArrayStart(); + java.util.List<avro.examples.baseball.Position> a0 = this.position; // Need fresh name due to limitation of macro system + if (a0 == null) { + a0 = new SpecificData.Array<avro.examples.baseball.Position>((int)size0, SCHEMA$.getField("position").schema()); + this.position = a0; + } else a0.clear(); + SpecificData.Array<avro.examples.baseball.Position> ga0 = (a0 instanceof SpecificData.Array ? (SpecificData.Array<avro.examples.baseball.Position>)a0 : null); + for ( ; 0 < size0; size0 = in.arrayNext()) { + for ( ; size0 != 0; size0--) { + avro.examples.baseball.Position e0 = (ga0 != null ? ga0.peek() : null); + e0 = avro.examples.baseball.Position.values()[in.readEnum()]; + a0.add(e0); + } + } + + } } + + + + + + + + + + ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            rstata commented on issue #256: AVRO-2090: Improve encode/decode time for SpecificRecord using code generation
            URL: https://github.com/apache/avro/pull/256#issuecomment-430501383

            I've opened a new pull request (#350) to replace this one.

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata commented on issue #256: AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation URL: https://github.com/apache/avro/pull/256#issuecomment-430501383 I've opened a new pull request (#350) to replace this one. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            rstata opened a new pull request #350: AVRO-2090 second try
            URL: https://github.com/apache/avro/pull/350

            I'm resubmitting my patch for AVRO-2090. I have made most the changes suggested by @dcutting.

            I created a test case for the schema-migration case. As predicted by @juwex, the old code failed in this case. I updated the generated code so it now works in this case.

            Finally, I modified the top-level pom.xml so that it runs tests twice, once with the old code enabled and once with the new code enabled. It's a good thing I did this, because it uncovered some backward-compat problems with ReflectData and friends, which subclass SepcificData and friends and were breaking when use_custom_coders was enabled.

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata opened a new pull request #350: AVRO-2090 second try URL: https://github.com/apache/avro/pull/350 I'm resubmitting my patch for AVRO-2090 . I have made most the changes suggested by @dcutting. I created a test case for the schema-migration case. As predicted by @juwex, the old code failed in this case. I updated the generated code so it now works in this case. Finally, I modified the top-level pom.xml so that it runs tests twice, once with the old code enabled and once with the new code enabled. It's a good thing I did this, because it uncovered some backward-compat problems with ReflectData and friends, which subclass SepcificData and friends and were breaking when use_custom_coders was enabled. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            rstata commented on a change in pull request #256: AVRO-2090: Improve encode/decode time for SpecificRecord using code generation
            URL: https://github.com/apache/avro/pull/256#discussion_r221484459

            ##########
            File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
            ##########
            @@ -122,6 +122,10 @@ public DatumWriter createDatumWriter(Schema schema) {
            /** Return the singleton instance. */
            public static SpecificData get()

            { return INSTANCE; }

            + private static final boolean USE_CUSTOM_CODERS
            + = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false"));
            + public boolean useCustomCoders()

            { return USE_CUSTOM_CODERS; }

            Review comment:
            In https://github.com/rstata-projects/avro/commit/e81a5880be69aef5bffdcf38f7ae3491957a6a68, I changed this flag to be dynamically changable.

            For the record, here's the roll-out plan I imagined for this feature (See AVRO-2091(https://issues.apache.org/jira/browse/AVRO-2091)). The initial release of this feature sets the `use_custom_coders` flag to `false` by default, to allow for some road testing. The release after that would set this flag to `true` by default, to allow for some more serious road testing. The next release of Avro after that would then eliminate this flag altogether, under the assumption that the feature has been sufficiently tested and there's no longer a use for the old way of doing things.

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata commented on a change in pull request #256: AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation URL: https://github.com/apache/avro/pull/256#discussion_r221484459 ########## File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java ########## @@ -122,6 +122,10 @@ public DatumWriter createDatumWriter(Schema schema) { /** Return the singleton instance. */ public static SpecificData get() { return INSTANCE; } + private static final boolean USE_CUSTOM_CODERS + = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false")); + public boolean useCustomCoders() { return USE_CUSTOM_CODERS; } Review comment: In https://github.com/rstata-projects/avro/commit/e81a5880be69aef5bffdcf38f7ae3491957a6a68 , I changed this flag to be dynamically changable. For the record, here's the roll-out plan I imagined for this feature (See AVRO-2091 ( https://issues.apache.org/jira/browse/AVRO-2091 )). The initial release of this feature sets the `use_custom_coders` flag to `false` by default, to allow for some road testing. The release after that would set this flag to `true` by default, to allow for some more serious road testing. The next release of Avro after that would then eliminate this flag altogether, under the assumption that the feature has been sufficiently tested and there's no longer a use for the old way of doing things. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            rstata commented on issue #256: AVRO-2090: Improve encode/decode time for SpecificRecord using code generation
            URL: https://github.com/apache/avro/pull/256#issuecomment-425691627

            @juwex, would love your help if you're still interested. Good catch on the field-order problem. It'd be great if you could develop a test case that demonstrates the bug you've found – it'd be a fantastic regression test. Regarding a fix for this problem, we should check to see if the Decoder provided is an instance of ResolvingDecoder, and if it is code should use readFieldOrder to read fields in the correct order.

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata commented on issue #256: AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation URL: https://github.com/apache/avro/pull/256#issuecomment-425691627 @juwex, would love your help if you're still interested. Good catch on the field-order problem. It'd be great if you could develop a test case that demonstrates the bug you've found – it'd be a fantastic regression test. Regarding a fix for this problem, we should check to see if the Decoder provided is an instance of ResolvingDecoder, and if it is code should use readFieldOrder to read fields in the correct order. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            rstata commented on issue #256: AVRO-2090: Improve encode/decode time for SpecificRecord using code generation
            URL: https://github.com/apache/avro/pull/256#issuecomment-425690934

            I will update this patch. The old changes were easily rebased onto Avro's current master. You can see that result here:

            https://github.com/rstata-projects/avro/tree/specific-new-rebased

            I then started addressing the comments above. I just closed a few of the change requests from above – you can find the resulting changes here:

            https://github.com/rstata-projects/avro/tree/AVRO-2090-again

            I'll keep working on this over the next few days. In the meantime, if folks have applications that use SpecificRecord, it'd be great if you could enable this feature and run it through your test cases.

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata commented on issue #256: AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation URL: https://github.com/apache/avro/pull/256#issuecomment-425690934 I will update this patch. The old changes were easily rebased onto Avro's current master. You can see that result here: https://github.com/rstata-projects/avro/tree/specific-new-rebased I then started addressing the comments above. I just closed a few of the change requests from above – you can find the resulting changes here: https://github.com/rstata-projects/avro/tree/AVRO-2090-again I'll keep working on this over the next few days. In the meantime, if folks have applications that use SpecificRecord, it'd be great if you could enable this feature and run it through your test cases. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            rstata commented on a change in pull request #256: AVRO-2090: Improve encode/decode time for SpecificRecord using code generation
            URL: https://github.com/apache/avro/pull/256#discussion_r221445814

            ##########
            File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java
            ##########
            @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) {
            }
            }

            + @Override
            + protected Object readRecord(Object old, Schema expected, ResolvingDecoder in)
            + throws IOException {
            + SpecificData data = getSpecificData();
            + Object r = data.newRecord(old, expected);
            + if (SpecificData.get().useCustomCoders()

            Review comment:
            ok

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata commented on a change in pull request #256: AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation URL: https://github.com/apache/avro/pull/256#discussion_r221445814 ########## File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java ########## @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) { } } + @Override + protected Object readRecord(Object old, Schema expected, ResolvingDecoder in) + throws IOException { + SpecificData data = getSpecificData(); + Object r = data.newRecord(old, expected); + if (SpecificData.get().useCustomCoders() Review comment: ok ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            rstata commented on a change in pull request #256: AVRO-2090: Improve encode/decode time for SpecificRecord using code generation
            URL: https://github.com/apache/avro/pull/256#discussion_r221445811

            ##########
            File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java
            ##########
            @@ -71,6 +71,21 @@ protected void writeString(Schema schema, Object datum, Encoder out)
            writeString(datum, out);
            }

            + @Override
            + protected void writeRecord(Schema schema, Object datum, Encoder out)
            + throws IOException {
            + if (SpecificData.get().useCustomCoders()

            Review comment:
            ok

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata commented on a change in pull request #256: AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation URL: https://github.com/apache/avro/pull/256#discussion_r221445811 ########## File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java ########## @@ -71,6 +71,21 @@ protected void writeString(Schema schema, Object datum, Encoder out) writeString(datum, out); } + @Override + protected void writeRecord(Schema schema, Object datum, Encoder out) + throws IOException { + if (SpecificData.get().useCustomCoders() Review comment: ok ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            rstata commented on a change in pull request #256: AVRO-2090: Improve encode/decode time for SpecificRecord using code generation
            URL: https://github.com/apache/avro/pull/256#discussion_r221445803

            ##########
            File path: lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm
            ##########
            @@ -473,4 +475,282 @@ public class ${this.mangle($schema.getName())}#if ($schema.isError()) extends or
            READER$.read(this, SpecificData.getDecoder(in));
            }

            +#if ($this.isCustomCodable($schema))
            + @Override public boolean hasCustomCoders()

            { return true; }

            +
            + @Override public void encode(org.apache.avro.io.Encoder out)
            + throws java.io.IOException
            + {
            +#set ($nv = 0)## Counter to ensure unique var-names
            +#set ($maxnv = 0)## Holds high-water mark during recursion
            +#foreach ($field in $schema.getFields())
            +#set ($n = $this.mangle($field.name(), $schema.isError()))
            +#set ($s = $field.schema())
            +#encodeVar(0 "this.${n}" $s)
            +
            +#set ($nv = $maxnv)
            +#end
            + }
            +
            + @Override public void decode(org.apache.avro.io.Decoder in)
            + throws java.io.IOException
            + {
            +#set ($nv = 0)## Counter to ensure unique var-names
            +#set ($maxnv = 0)## Holds high-water mark during recursion
            +#foreach ($field in $schema.getFields())
            +#set ($n = $this.mangle($field.name(), $schema.isError()))
            +#set ($s = $field.schema())
            +#set ($rs = "SCHEMA$.getField(""${n}"").schema()")
            +#decodeVar(0 "this.${n}" $s $rs)
            +
            +#set ($nv = $maxnv)
            +#end
            + }
            +#end
            }
            +
            +#macro( encodeVar $indent $var $s )
            +#set ($I = $this.indent($indent))
            +##### Compound types (array, map, and union) require calls
            +##### that will recurse back into this encodeVar macro:
            +#if ($s.Type.Name.equals("array"))
            +#encodeArray($indent $var $s)
            +#elseif ($s.Type.Name.equals("map"))
            +#encodeMap($indent $var $s)
            +#elseif ($s.Type.Name.equals("union"))
            +#encodeUnion($indent $var $s)
            +##### Use the generated "encode" method as fast way to write
            +##### (specific) record types:
            +#elseif ($s.Type.Name.equals("record"))
            +$I ${var}.encode(out);
            +##### For rest of cases, generate calls out.writeXYZ:
            +#elseif ($s.Type.Name.equals("null"))
            +$I out.writeNull();
            +#elseif ($s.Type.Name.equals("boolean"))
            +$I out.writeBoolean(${var});
            +#elseif ($s.Type.Name.equals("int"))
            +$I out.writeInt(${var});
            +#elseif ($s.Type.Name.equals("long"))
            +$I out.writeLong(${var});
            +#elseif ($s.Type.Name.equals("float"))
            +$I out.writeFloat(${var});
            +#elseif ($s.Type.Name.equals("double"))
            +$I out.writeDouble(${var});
            +#elseif ($s.Type.Name.equals("string"))
            +#if ($this.isStringable($s))
            +$I out.writeString(${var}.toString());
            +#else
            +$I out.writeString(${var});
            +#end
            +#elseif ($s.Type.Name.equals("bytes"))
            +$I out.writeBytes(${var});
            +#elseif ($s.Type.Name.equals("fixed"))
            +$I out.writeFixed(${var}.bytes(), 0, ${s.FixedSize});
            +#elseif ($s.Type.Name.equals("enum"))
            +$I out.writeEnum(${var}.ordinal());
            +#else
            +## TODO – singal a code-gen-time error
            +#end
            +#end
            +
            +#macro( encodeArray $indent $var $s )
            +#set ($I = $this.indent($indent))
            +#set ($et = $this.javaType($s.ElementType))
            +$I long size${nv} = ${var}.size();
            +$I out.writeArrayStart();
            +$I out.setItemCount(size${nv});
            +$I long actualSize${nv} = 0;
            +$I for ($et e${nv}: ${var}) {
            $I actualSize${nv}+;
            +$I out.startItem();
            +#set ($var = "e${nv}")
            +#set ($nv = $nv + 1)
            +#set ($maxnv = $nv)
            +#set ($indent = $indent + 2)
            +#encodeVar($indent $var $s.ElementType)
            +#set ($nv = $nv - 1)
            +#set ($indent = $indent - 2)
            +#set ($I = $this.indent($indent))
            +$I }
            +$I out.writeArrayEnd();
            +$I if (actualSize${nv} != size${nv})
            +$I throw new java.util.ConcurrentModificationException("Array-size written was " + size${nv} + ", but element count was " + actualSize${nv} + ".");
            +#end
            +
            +#macro( encodeMap $indent $var $s )
            +#set ($I = $this.indent($indent))
            +#set ($kt = $this.getStringType($s))
            +#set ($vt = $this.javaType($s.ValueType))
            +$I long size${nv} = ${var}.size();
            +$I out.writeMapStart();
            +$I out.setItemCount(size${nv});
            +$I long actualSize${nv} = 0;
            +$I for (java.util.Map.Entry<$kt, $vt> e${nv}: ${var}.entrySet()) {
            $I actualSize${nv}+;
            +$I out.startItem();
            +#if ($this.isStringable($s))
            +$I out.writeString(e${nv}.getKey().toString());
            +#else
            +$I out.writeString(e${nv}.getKey());
            +#end
            +$I $vt v${nv} = e${nv}.getValue();
            +#set ($var = "v${nv}")
            +#set ($nv = $nv + 1)
            +#set ($maxnv = $nv)
            +#set ($indent = $indent + 2)
            +#encodeVar($indent $var $s.ValueType)
            +#set ($nv = $nv - 1)
            +#set ($indent = $indent - 2)
            +#set ($I = $this.indent($indent))
            +$I }
            +$I out.writeMapEnd();
            +$I if (actualSize${nv} != size${nv})
            + throw new java.util.ConcurrentModificationException("Map-size written was " + size${nv} + ", but element count was " + actualSize${nv} + ".");
            +#end
            +
            +#macro( encodeUnion $indent $var $s )
            +#set ($I = $this.indent($indent))
            +#set ($et = $this.javaType($s.Types.get($this.getNonNullIndex($s))))
            +$I if (${var} == null) {
            +$I out.writeIndex(#if($this.getNonNullIndex($s)==0)1#

            {else}

            0#end);
            +$I out.writeNull();
            +$I } else {
            +$I out.writeIndex(${this.getNonNullIndex($s)});
            +#set ($indent = $indent + 2)
            +#encodeVar($indent $var $s.Types.get($this.getNonNullIndex($s)))
            +#set ($indent = $indent - 2)
            +#set ($I = $this.indent($indent))
            +$I }
            +#end
            +
            +
            +#macro( decodeVar $indent $var $s $rs )
            +#set ($I = $this.indent($indent))
            +##### Compound types (array, map, and union) require calls
            +##### that will recurse back into this decodeVar macro:
            +#if ($s.Type.Name.equals("array"))
            +#decodeArray($indent $var $s $rs)
            +#elseif ($s.Type.Name.equals("map"))
            +#decodeMap($indent $var $s $rs)
            +#elseif ($s.Type.Name.equals("union"))
            +#decodeUnion($indent $var $s $rs)
            +##### Use the generated "decode" method as fast way to write
            +##### (specific) record types:
            +#elseif ($s.Type.Name.equals("record"))
            +$I if (${var} == null) {
            +$I ${var} = new ${this.javaType($s)}();
            +$I }
            +$I ${var}.decode(in);
            +##### For rest of cases, generate calls in.readXYZ:
            +#elseif ($s.Type.Name.equals("null"))
            +$I in.readNull();
            +#elseif ($s.Type.Name.equals("boolean"))
            +$I $var = in.readBoolean();
            +#elseif ($s.Type.Name.equals("int"))
            +$I $var = in.readInt();
            +#elseif ($s.Type.Name.equals("long"))
            +$I $var = in.readLong();
            +#elseif ($s.Type.Name.equals("float"))
            +$I $var = in.readFloat();
            +#elseif ($s.Type.Name.equals("double"))
            +$I $var = in.readDouble();
            +#elseif ($s.Type.Name.equals("string"))
            +#decodeString( "$I" $var $s )
            +#elseif ($s.Type.Name.equals("bytes"))
            +$I $var = in.readBytes(${var});
            +#elseif ($s.Type.Name.equals("fixed"))
            +$I if (${var} == null) {
            +$I ${var} = new ${this.javaType($s)}();
            +$I }
            +$I in.readFixed(${var}.bytes(), 0, ${s.FixedSize});
            +#elseif ($s.Type.Name.equals("enum"))
            +$I $var = ${this.javaType($s)}.values()[in.readEnum()];
            +#else
            +## TODO – singal a code-gen-time error
            +#end
            +#end
            +
            +#macro( decodeString $II $var $s )
            +#set ($st = ${this.getStringType($s)})
            +#if ($this.isStringable($s))
            +$II ${var} = new ${st}(in.readString());
            +#elseif ($st.equals("java.lang.String"))
            +$II $var = in.readString();
            +#elseif ($st.equals("org.apache.avro.util.Utf8"))
            +$II $var = in.readString(${var});
            +#else
            +$II $var = in.readString(${var} instanceof Utf8 ? (Utf8)${var} : null);
            +#end
            +#end
            +
            +#macro( decodeArray $indent $var $s $rs )
            +#set ($I = $this.indent($indent))
            +#set ($t = $this.javaType($s))
            +#set ($et = $this.javaType($s.ElementType))
            +#set ($gat = "SpecificData.Array<${et}>")
            +$I long size${nv} = in.readArrayStart();
            +$I $t a${nv} = ${var}; // Need fresh name due to limitation of macro system
            +$I if (a${nv} == null) {

            Review comment:
            changed to a velocity comment

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata commented on a change in pull request #256: AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation URL: https://github.com/apache/avro/pull/256#discussion_r221445803 ########## File path: lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm ########## @@ -473,4 +475,282 @@ public class ${this.mangle($schema.getName())}#if ($schema.isError()) extends or READER$.read(this, SpecificData.getDecoder(in)); } +#if ($this.isCustomCodable($schema)) + @Override public boolean hasCustomCoders() { return true; } + + @Override public void encode(org.apache.avro.io.Encoder out) + throws java.io.IOException + { +#set ($nv = 0)## Counter to ensure unique var-names +#set ($maxnv = 0)## Holds high-water mark during recursion +#foreach ($field in $schema.getFields()) +#set ($n = $this.mangle($field.name(), $schema.isError())) +#set ($s = $field.schema()) +#encodeVar(0 "this.${n}" $s) + +#set ($nv = $maxnv) +#end + } + + @Override public void decode(org.apache.avro.io.Decoder in) + throws java.io.IOException + { +#set ($nv = 0)## Counter to ensure unique var-names +#set ($maxnv = 0)## Holds high-water mark during recursion +#foreach ($field in $schema.getFields()) +#set ($n = $this.mangle($field.name(), $schema.isError())) +#set ($s = $field.schema()) +#set ($rs = "SCHEMA$.getField(""${n}"").schema()") +#decodeVar(0 "this.${n}" $s $rs) + +#set ($nv = $maxnv) +#end + } +#end } + +#macro( encodeVar $indent $var $s ) +#set ($I = $this.indent($indent)) +##### Compound types (array, map, and union) require calls +##### that will recurse back into this encodeVar macro: +#if ($s.Type.Name.equals("array")) +#encodeArray($indent $var $s) +#elseif ($s.Type.Name.equals("map")) +#encodeMap($indent $var $s) +#elseif ($s.Type.Name.equals("union")) +#encodeUnion($indent $var $s) +##### Use the generated "encode" method as fast way to write +##### (specific) record types: +#elseif ($s.Type.Name.equals("record")) +$I ${var}.encode(out); +##### For rest of cases, generate calls out.writeXYZ: +#elseif ($s.Type.Name.equals("null")) +$I out.writeNull(); +#elseif ($s.Type.Name.equals("boolean")) +$I out.writeBoolean(${var}); +#elseif ($s.Type.Name.equals("int")) +$I out.writeInt(${var}); +#elseif ($s.Type.Name.equals("long")) +$I out.writeLong(${var}); +#elseif ($s.Type.Name.equals("float")) +$I out.writeFloat(${var}); +#elseif ($s.Type.Name.equals("double")) +$I out.writeDouble(${var}); +#elseif ($s.Type.Name.equals("string")) +#if ($this.isStringable($s)) +$I out.writeString(${var}.toString()); +#else +$I out.writeString(${var}); +#end +#elseif ($s.Type.Name.equals("bytes")) +$I out.writeBytes(${var}); +#elseif ($s.Type.Name.equals("fixed")) +$I out.writeFixed(${var}.bytes(), 0, ${s.FixedSize}); +#elseif ($s.Type.Name.equals("enum")) +$I out.writeEnum(${var}.ordinal()); +#else +## TODO – singal a code-gen-time error +#end +#end + +#macro( encodeArray $indent $var $s ) +#set ($I = $this.indent($indent)) +#set ($et = $this.javaType($s.ElementType)) +$I long size${nv} = ${var}.size(); +$I out.writeArrayStart(); +$I out.setItemCount(size${nv}); +$I long actualSize${nv} = 0; +$I for ($et e${nv}: ${var}) { $I actualSize${nv} +; +$I out.startItem(); +#set ($var = "e${nv}") +#set ($nv = $nv + 1) +#set ($maxnv = $nv) +#set ($indent = $indent + 2) +#encodeVar($indent $var $s.ElementType) +#set ($nv = $nv - 1) +#set ($indent = $indent - 2) +#set ($I = $this.indent($indent)) +$I } +$I out.writeArrayEnd(); +$I if (actualSize${nv} != size${nv}) +$I throw new java.util.ConcurrentModificationException("Array-size written was " + size${nv} + ", but element count was " + actualSize${nv} + "."); +#end + +#macro( encodeMap $indent $var $s ) +#set ($I = $this.indent($indent)) +#set ($kt = $this.getStringType($s)) +#set ($vt = $this.javaType($s.ValueType)) +$I long size${nv} = ${var}.size(); +$I out.writeMapStart(); +$I out.setItemCount(size${nv}); +$I long actualSize${nv} = 0; +$I for (java.util.Map.Entry<$kt, $vt> e${nv}: ${var}.entrySet()) { $I actualSize${nv} +; +$I out.startItem(); +#if ($this.isStringable($s)) +$I out.writeString(e${nv}.getKey().toString()); +#else +$I out.writeString(e${nv}.getKey()); +#end +$I $vt v${nv} = e${nv}.getValue(); +#set ($var = "v${nv}") +#set ($nv = $nv + 1) +#set ($maxnv = $nv) +#set ($indent = $indent + 2) +#encodeVar($indent $var $s.ValueType) +#set ($nv = $nv - 1) +#set ($indent = $indent - 2) +#set ($I = $this.indent($indent)) +$I } +$I out.writeMapEnd(); +$I if (actualSize${nv} != size${nv}) + throw new java.util.ConcurrentModificationException("Map-size written was " + size${nv} + ", but element count was " + actualSize${nv} + "."); +#end + +#macro( encodeUnion $indent $var $s ) +#set ($I = $this.indent($indent)) +#set ($et = $this.javaType($s.Types.get($this.getNonNullIndex($s)))) +$I if (${var} == null) { +$I out.writeIndex(#if($this.getNonNullIndex($s)==0)1# {else} 0#end); +$I out.writeNull(); +$I } else { +$I out.writeIndex(${this.getNonNullIndex($s)}); +#set ($indent = $indent + 2) +#encodeVar($indent $var $s.Types.get($this.getNonNullIndex($s))) +#set ($indent = $indent - 2) +#set ($I = $this.indent($indent)) +$I } +#end + + +#macro( decodeVar $indent $var $s $rs ) +#set ($I = $this.indent($indent)) +##### Compound types (array, map, and union) require calls +##### that will recurse back into this decodeVar macro: +#if ($s.Type.Name.equals("array")) +#decodeArray($indent $var $s $rs) +#elseif ($s.Type.Name.equals("map")) +#decodeMap($indent $var $s $rs) +#elseif ($s.Type.Name.equals("union")) +#decodeUnion($indent $var $s $rs) +##### Use the generated "decode" method as fast way to write +##### (specific) record types: +#elseif ($s.Type.Name.equals("record")) +$I if (${var} == null) { +$I ${var} = new ${this.javaType($s)}(); +$I } +$I ${var}.decode(in); +##### For rest of cases, generate calls in.readXYZ: +#elseif ($s.Type.Name.equals("null")) +$I in.readNull(); +#elseif ($s.Type.Name.equals("boolean")) +$I $var = in.readBoolean(); +#elseif ($s.Type.Name.equals("int")) +$I $var = in.readInt(); +#elseif ($s.Type.Name.equals("long")) +$I $var = in.readLong(); +#elseif ($s.Type.Name.equals("float")) +$I $var = in.readFloat(); +#elseif ($s.Type.Name.equals("double")) +$I $var = in.readDouble(); +#elseif ($s.Type.Name.equals("string")) +#decodeString( "$I" $var $s ) +#elseif ($s.Type.Name.equals("bytes")) +$I $var = in.readBytes(${var}); +#elseif ($s.Type.Name.equals("fixed")) +$I if (${var} == null) { +$I ${var} = new ${this.javaType($s)}(); +$I } +$I in.readFixed(${var}.bytes(), 0, ${s.FixedSize}); +#elseif ($s.Type.Name.equals("enum")) +$I $var = ${this.javaType($s)}.values() [in.readEnum()] ; +#else +## TODO – singal a code-gen-time error +#end +#end + +#macro( decodeString $II $var $s ) +#set ($st = ${this.getStringType($s)}) +#if ($this.isStringable($s)) +$II ${var} = new ${st}(in.readString()); +#elseif ($st.equals("java.lang.String")) +$II $var = in.readString(); +#elseif ($st.equals("org.apache.avro.util.Utf8")) +$II $var = in.readString(${var}); +#else +$II $var = in.readString(${var} instanceof Utf8 ? (Utf8)${var} : null); +#end +#end + +#macro( decodeArray $indent $var $s $rs ) +#set ($I = $this.indent($indent)) +#set ($t = $this.javaType($s)) +#set ($et = $this.javaType($s.ElementType)) +#set ($gat = "SpecificData.Array<${et}>") +$I long size${nv} = in.readArrayStart(); +$I $t a${nv} = ${var}; // Need fresh name due to limitation of macro system +$I if (a${nv} == null) { Review comment: changed to a velocity comment ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            rstata commented on a change in pull request #256: AVRO-2090: Improve encode/decode time for SpecificRecord using code generation
            URL: https://github.com/apache/avro/pull/256#discussion_r221445798

            ##########
            File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java
            ##########
            @@ -90,4 +92,19 @@ public void readExternal(ObjectInput in)
            new SpecificDatumReader(getSchema())
            .read(this, SpecificData.getDecoder(in));
            }
            +
            + /** Returns true iff an instance supports the

            {@link #encode} and
            + * {@link #decode} operations. Should only be used by
            + * <code>SpecificDatumReader/Writer</code> to selectively use
            + * {@link #encode}

            and

            {@link #decode}

            to optimize the (de)serialization of
            + * values. */
            + public boolean hasCustomCoders()

            { return false; }

            +

            Review comment:
            protected is enough

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata commented on a change in pull request #256: AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation URL: https://github.com/apache/avro/pull/256#discussion_r221445798 ########## File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java ########## @@ -90,4 +92,19 @@ public void readExternal(ObjectInput in) new SpecificDatumReader(getSchema()) .read(this, SpecificData.getDecoder(in)); } + + /** Returns true iff an instance supports the {@link #encode} and + * {@link #decode} operations. Should only be used by + * <code>SpecificDatumReader/Writer</code> to selectively use + * {@link #encode} and {@link #decode} to optimize the (de)serialization of + * values. */ + public boolean hasCustomCoders() { return false; } + Review comment: protected is enough ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            rstata commented on a change in pull request #256: AVRO-2090: Improve encode/decode time for SpecificRecord using code generation
            URL: https://github.com/apache/avro/pull/256#discussion_r221445787

            ##########
            File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java
            ##########
            @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) {
            }
            }

            + @Override
            + protected Object readRecord(Object old, Schema expected, ResolvingDecoder in)
            + throws IOException {
            + SpecificData data = getSpecificData();
            + Object r = data.newRecord(old, expected);

            Review comment:
            yep

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata commented on a change in pull request #256: AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation URL: https://github.com/apache/avro/pull/256#discussion_r221445787 ########## File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java ########## @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) { } } + @Override + protected Object readRecord(Object old, Schema expected, ResolvingDecoder in) + throws IOException { + SpecificData data = getSpecificData(); + Object r = data.newRecord(old, expected); Review comment: yep ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            juwex commented on issue #256: AVRO-2090: Improve encode/decode time for SpecificRecord using code generation
            URL: https://github.com/apache/avro/pull/256#issuecomment-408317797

            I'm very excited about the performance improvements with this feature and would like to see this in a usable state sooner rather than later. If I can assist getting this branch into a mergeable state, I'd love to do so (with @rstata 's consent)

            There's a rather big issue with schema evolution currently, however. The generated reader relies on the field order within the `ResolvingDecoder` to stay the same. Hence, data written with a schema that only differs in regards to the order of fields will result in an error or (even worse) invalid data. If required, I can provide a code sample.

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - juwex commented on issue #256: AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation URL: https://github.com/apache/avro/pull/256#issuecomment-408317797 I'm very excited about the performance improvements with this feature and would like to see this in a usable state sooner rather than later. If I can assist getting this branch into a mergeable state, I'd love to do so (with @rstata 's consent) There's a rather big issue with schema evolution currently, however. The generated reader relies on the field order within the `ResolvingDecoder` to stay the same. Hence, data written with a schema that only differs in regards to the order of fields will result in an error or (even worse) invalid data. If required, I can provide a code sample. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            zolyfarkas Zoltan Farkas added a comment -

            This can improve parsing performance even for GenericRecords... I created a GenericRecordBuilder that will generate/load "Generic-SpecificRecords" on the fly:

            https://github.com/zolyfarkas/spf4j/blob/master/spf4j-avro/src/main/java/org/spf4j/avro/GenericRecordBuilder.java

             

            The generated GenericRecords can be significantly more memory efficient as well (primitives)...

            but I was musing about eventually eliminating the Generic/Specific implementation divide? would be interested in what others think... 

            zolyfarkas Zoltan Farkas added a comment - This can improve parsing performance even for GenericRecords... I created a GenericRecordBuilder that will generate/load "Generic-SpecificRecords" on the fly: https://github.com/zolyfarkas/spf4j/blob/master/spf4j-avro/src/main/java/org/spf4j/avro/GenericRecordBuilder.java   The generated GenericRecords can be significantly more memory efficient as well (primitives)... but I was musing about eventually eliminating the Generic/Specific implementation divide? would be interested in what others think... 
            githubbot ASF GitHub Bot added a comment -

            cutting commented on issue #256: AVRO-2090: Improve encode/decode time for SpecificRecord using code generation
            URL: https://github.com/apache/avro/pull/256#issuecomment-380896347

            If it demonstrates a big performance improvement under Perf.java then I think it would be a good change to have & am willing to help get it committed.

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - cutting commented on issue #256: AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation URL: https://github.com/apache/avro/pull/256#issuecomment-380896347 If it demonstrates a big performance improvement under Perf.java then I think it would be a good change to have & am willing to help get it committed. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            rstata commented on issue #256: AVRO-2090: Improve encode/decode time for SpecificRecord using code generation
            URL: https://github.com/apache/avro/pull/256#issuecomment-379423675

            BTW, I believe that Thiru presented a bit of this work at Strata last Fall:

            https://conferences.oreilly.com/strata/strata-ny-2017/public/schedule/detail/60729

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata commented on issue #256: AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation URL: https://github.com/apache/avro/pull/256#issuecomment-379423675 BTW, I believe that Thiru presented a bit of this work at Strata last Fall: https://conferences.oreilly.com/strata/strata-ny-2017/public/schedule/detail/60729 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            rstata commented on issue #256: AVRO-2090: Improve encode/decode time for SpecificRecord using code generation
            URL: https://github.com/apache/avro/pull/256#issuecomment-379423354

            This work has gotten a bit old in my head. I can dust it off and address the issues raised in this ticket, but want to make sure there is genuine interest before making the effort.

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - rstata commented on issue #256: AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation URL: https://github.com/apache/avro/pull/256#issuecomment-379423354 This work has gotten a bit old in my head. I can dust it off and address the issues raised in this ticket, but want to make sure there is genuine interest before making the effort. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            cutting commented on a change in pull request #256: AVRO-2090: Improve encode/decode time for SpecificRecord using code generation
            URL: https://github.com/apache/avro/pull/256#discussion_r178191480

            ##########
            File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java
            ##########
            @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) {
            }
            }

            + @Override
            + protected Object readRecord(Object old, Schema expected, ResolvingDecoder in)
            + throws IOException {
            + SpecificData data = getSpecificData();
            + Object r = data.newRecord(old, expected);
            + if (SpecificData.get().useCustomCoders()

            Review comment:
            should use this.getSpecificData() instead of using the static instance

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - cutting commented on a change in pull request #256: AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation URL: https://github.com/apache/avro/pull/256#discussion_r178191480 ########## File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java ########## @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) { } } + @Override + protected Object readRecord(Object old, Schema expected, ResolvingDecoder in) + throws IOException { + SpecificData data = getSpecificData(); + Object r = data.newRecord(old, expected); + if (SpecificData.get().useCustomCoders() Review comment: should use this.getSpecificData() instead of using the static instance ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            cutting commented on a change in pull request #256: AVRO-2090: Improve encode/decode time for SpecificRecord using code generation
            URL: https://github.com/apache/avro/pull/256#discussion_r178191398

            ##########
            File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java
            ##########
            @@ -71,6 +71,21 @@ protected void writeString(Schema schema, Object datum, Encoder out)
            writeString(datum, out);
            }

            + @Override
            + protected void writeRecord(Schema schema, Object datum, Encoder out)
            + throws IOException {
            + if (SpecificData.get().useCustomCoders()

            Review comment:
            should use this.getSpecificData() instead of the static instance

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - cutting commented on a change in pull request #256: AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation URL: https://github.com/apache/avro/pull/256#discussion_r178191398 ########## File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumWriter.java ########## @@ -71,6 +71,21 @@ protected void writeString(Schema schema, Object datum, Encoder out) writeString(datum, out); } + @Override + protected void writeRecord(Schema schema, Object datum, Encoder out) + throws IOException { + if (SpecificData.get().useCustomCoders() Review comment: should use this.getSpecificData() instead of the static instance ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            cutting commented on a change in pull request #256: AVRO-2090: Improve encode/decode time for SpecificRecord using code generation
            URL: https://github.com/apache/avro/pull/256#discussion_r178190902

            ##########
            File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java
            ##########
            @@ -122,6 +122,10 @@ public DatumWriter createDatumWriter(Schema schema) {
            /** Return the singleton instance. */
            public static SpecificData get()

            { return INSTANCE; }

            + private static final boolean USE_CUSTOM_CODERS
            + = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false"));
            + public boolean useCustomCoders()

            { return USE_CUSTOM_CODERS; }

            Review comment:
            useCustomCoders should probably be a settable member variable on each SpecificData instance, so folks can more easily switch it on and off without restarting the JVM.

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - cutting commented on a change in pull request #256: AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation URL: https://github.com/apache/avro/pull/256#discussion_r178190902 ########## File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java ########## @@ -122,6 +122,10 @@ public DatumWriter createDatumWriter(Schema schema) { /** Return the singleton instance. */ public static SpecificData get() { return INSTANCE; } + private static final boolean USE_CUSTOM_CODERS + = Boolean.parseBoolean(System.getProperty("org.apache.avro.specific.use_custom_coders","false")); + public boolean useCustomCoders() { return USE_CUSTOM_CODERS; } Review comment: useCustomCoders should probably be a settable member variable on each SpecificData instance, so folks can more easily switch it on and off without restarting the JVM. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            cutting commented on a change in pull request #256: AVRO-2090: Improve encode/decode time for SpecificRecord using code generation
            URL: https://github.com/apache/avro/pull/256#discussion_r178152178

            ##########
            File path: lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm
            ##########
            @@ -473,4 +475,282 @@ public class ${this.mangle($schema.getName())}#if ($schema.isError()) extends or
            READER$.read(this, SpecificData.getDecoder(in));
            }

            +#if ($this.isCustomCodable($schema))
            + @Override public boolean hasCustomCoders()

            { return true; }

            +
            + @Override public void encode(org.apache.avro.io.Encoder out)
            + throws java.io.IOException
            + {
            +#set ($nv = 0)## Counter to ensure unique var-names
            +#set ($maxnv = 0)## Holds high-water mark during recursion
            +#foreach ($field in $schema.getFields())
            +#set ($n = $this.mangle($field.name(), $schema.isError()))
            +#set ($s = $field.schema())
            +#encodeVar(0 "this.${n}" $s)
            +
            +#set ($nv = $maxnv)
            +#end
            + }
            +
            + @Override public void decode(org.apache.avro.io.Decoder in)
            + throws java.io.IOException
            + {
            +#set ($nv = 0)## Counter to ensure unique var-names
            +#set ($maxnv = 0)## Holds high-water mark during recursion
            +#foreach ($field in $schema.getFields())
            +#set ($n = $this.mangle($field.name(), $schema.isError()))
            +#set ($s = $field.schema())
            +#set ($rs = "SCHEMA$.getField(""${n}"").schema()")
            +#decodeVar(0 "this.${n}" $s $rs)
            +
            +#set ($nv = $maxnv)
            +#end
            + }
            +#end
            }
            +
            +#macro( encodeVar $indent $var $s )
            +#set ($I = $this.indent($indent))
            +##### Compound types (array, map, and union) require calls
            +##### that will recurse back into this encodeVar macro:
            +#if ($s.Type.Name.equals("array"))
            +#encodeArray($indent $var $s)
            +#elseif ($s.Type.Name.equals("map"))
            +#encodeMap($indent $var $s)
            +#elseif ($s.Type.Name.equals("union"))
            +#encodeUnion($indent $var $s)
            +##### Use the generated "encode" method as fast way to write
            +##### (specific) record types:
            +#elseif ($s.Type.Name.equals("record"))
            +$I ${var}.encode(out);
            +##### For rest of cases, generate calls out.writeXYZ:
            +#elseif ($s.Type.Name.equals("null"))
            +$I out.writeNull();
            +#elseif ($s.Type.Name.equals("boolean"))
            +$I out.writeBoolean(${var});
            +#elseif ($s.Type.Name.equals("int"))
            +$I out.writeInt(${var});
            +#elseif ($s.Type.Name.equals("long"))
            +$I out.writeLong(${var});
            +#elseif ($s.Type.Name.equals("float"))
            +$I out.writeFloat(${var});
            +#elseif ($s.Type.Name.equals("double"))
            +$I out.writeDouble(${var});
            +#elseif ($s.Type.Name.equals("string"))
            +#if ($this.isStringable($s))
            +$I out.writeString(${var}.toString());
            +#else
            +$I out.writeString(${var});
            +#end
            +#elseif ($s.Type.Name.equals("bytes"))
            +$I out.writeBytes(${var});
            +#elseif ($s.Type.Name.equals("fixed"))
            +$I out.writeFixed(${var}.bytes(), 0, ${s.FixedSize});
            +#elseif ($s.Type.Name.equals("enum"))
            +$I out.writeEnum(${var}.ordinal());
            +#else
            +## TODO – singal a code-gen-time error
            +#end
            +#end
            +
            +#macro( encodeArray $indent $var $s )
            +#set ($I = $this.indent($indent))
            +#set ($et = $this.javaType($s.ElementType))
            +$I long size${nv} = ${var}.size();
            +$I out.writeArrayStart();
            +$I out.setItemCount(size${nv});
            +$I long actualSize${nv} = 0;
            +$I for ($et e${nv}: ${var}) {
            $I actualSize${nv}+;
            +$I out.startItem();
            +#set ($var = "e${nv}")
            +#set ($nv = $nv + 1)
            +#set ($maxnv = $nv)
            +#set ($indent = $indent + 2)
            +#encodeVar($indent $var $s.ElementType)
            +#set ($nv = $nv - 1)
            +#set ($indent = $indent - 2)
            +#set ($I = $this.indent($indent))
            +$I }
            +$I out.writeArrayEnd();
            +$I if (actualSize${nv} != size${nv})
            +$I throw new java.util.ConcurrentModificationException("Array-size written was " + size${nv} + ", but element count was " + actualSize${nv} + ".");
            +#end
            +
            +#macro( encodeMap $indent $var $s )
            +#set ($I = $this.indent($indent))
            +#set ($kt = $this.getStringType($s))
            +#set ($vt = $this.javaType($s.ValueType))
            +$I long size${nv} = ${var}.size();
            +$I out.writeMapStart();
            +$I out.setItemCount(size${nv});
            +$I long actualSize${nv} = 0;
            +$I for (java.util.Map.Entry<$kt, $vt> e${nv}: ${var}.entrySet()) {
            $I actualSize${nv}+;
            +$I out.startItem();
            +#if ($this.isStringable($s))
            +$I out.writeString(e${nv}.getKey().toString());
            +#else
            +$I out.writeString(e${nv}.getKey());
            +#end
            +$I $vt v${nv} = e${nv}.getValue();
            +#set ($var = "v${nv}")
            +#set ($nv = $nv + 1)
            +#set ($maxnv = $nv)
            +#set ($indent = $indent + 2)
            +#encodeVar($indent $var $s.ValueType)
            +#set ($nv = $nv - 1)
            +#set ($indent = $indent - 2)
            +#set ($I = $this.indent($indent))
            +$I }
            +$I out.writeMapEnd();
            +$I if (actualSize${nv} != size${nv})
            + throw new java.util.ConcurrentModificationException("Map-size written was " + size${nv} + ", but element count was " + actualSize${nv} + ".");
            +#end
            +
            +#macro( encodeUnion $indent $var $s )
            +#set ($I = $this.indent($indent))
            +#set ($et = $this.javaType($s.Types.get($this.getNonNullIndex($s))))
            +$I if (${var} == null) {
            +$I out.writeIndex(#if($this.getNonNullIndex($s)==0)1#

            {else}

            0#end);
            +$I out.writeNull();
            +$I } else {
            +$I out.writeIndex(${this.getNonNullIndex($s)});
            +#set ($indent = $indent + 2)
            +#encodeVar($indent $var $s.Types.get($this.getNonNullIndex($s)))
            +#set ($indent = $indent - 2)
            +#set ($I = $this.indent($indent))
            +$I }
            +#end
            +
            +
            +#macro( decodeVar $indent $var $s $rs )
            +#set ($I = $this.indent($indent))
            +##### Compound types (array, map, and union) require calls
            +##### that will recurse back into this decodeVar macro:
            +#if ($s.Type.Name.equals("array"))
            +#decodeArray($indent $var $s $rs)
            +#elseif ($s.Type.Name.equals("map"))
            +#decodeMap($indent $var $s $rs)
            +#elseif ($s.Type.Name.equals("union"))
            +#decodeUnion($indent $var $s $rs)
            +##### Use the generated "decode" method as fast way to write
            +##### (specific) record types:
            +#elseif ($s.Type.Name.equals("record"))
            +$I if (${var} == null) {
            +$I ${var} = new ${this.javaType($s)}();
            +$I }
            +$I ${var}.decode(in);
            +##### For rest of cases, generate calls in.readXYZ:
            +#elseif ($s.Type.Name.equals("null"))
            +$I in.readNull();
            +#elseif ($s.Type.Name.equals("boolean"))
            +$I $var = in.readBoolean();
            +#elseif ($s.Type.Name.equals("int"))
            +$I $var = in.readInt();
            +#elseif ($s.Type.Name.equals("long"))
            +$I $var = in.readLong();
            +#elseif ($s.Type.Name.equals("float"))
            +$I $var = in.readFloat();
            +#elseif ($s.Type.Name.equals("double"))
            +$I $var = in.readDouble();
            +#elseif ($s.Type.Name.equals("string"))
            +#decodeString( "$I" $var $s )
            +#elseif ($s.Type.Name.equals("bytes"))
            +$I $var = in.readBytes(${var});
            +#elseif ($s.Type.Name.equals("fixed"))
            +$I if (${var} == null) {
            +$I ${var} = new ${this.javaType($s)}();
            +$I }
            +$I in.readFixed(${var}.bytes(), 0, ${s.FixedSize});
            +#elseif ($s.Type.Name.equals("enum"))
            +$I $var = ${this.javaType($s)}.values()[in.readEnum()];
            +#else
            +## TODO – singal a code-gen-time error
            +#end
            +#end
            +
            +#macro( decodeString $II $var $s )
            +#set ($st = ${this.getStringType($s)})
            +#if ($this.isStringable($s))
            +$II ${var} = new ${st}(in.readString());
            +#elseif ($st.equals("java.lang.String"))
            +$II $var = in.readString();
            +#elseif ($st.equals("org.apache.avro.util.Utf8"))
            +$II $var = in.readString(${var});
            +#else
            +$II $var = in.readString(${var} instanceof Utf8 ? (Utf8)${var} : null);
            +#end
            +#end
            +
            +#macro( decodeArray $indent $var $s $rs )
            +#set ($I = $this.indent($indent))
            +#set ($t = $this.javaType($s))
            +#set ($et = $this.javaType($s.ElementType))
            +#set ($gat = "SpecificData.Array<${et}>")
            +$I long size${nv} = in.readArrayStart();
            +$I $t a${nv} = ${var}; // Need fresh name due to limitation of macro system
            +$I if (a${nv} == null) {

            Review comment:
            but we probably don't need this comment in every generated file

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - cutting commented on a change in pull request #256: AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation URL: https://github.com/apache/avro/pull/256#discussion_r178152178 ########## File path: lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/record.vm ########## @@ -473,4 +475,282 @@ public class ${this.mangle($schema.getName())}#if ($schema.isError()) extends or READER$.read(this, SpecificData.getDecoder(in)); } +#if ($this.isCustomCodable($schema)) + @Override public boolean hasCustomCoders() { return true; } + + @Override public void encode(org.apache.avro.io.Encoder out) + throws java.io.IOException + { +#set ($nv = 0)## Counter to ensure unique var-names +#set ($maxnv = 0)## Holds high-water mark during recursion +#foreach ($field in $schema.getFields()) +#set ($n = $this.mangle($field.name(), $schema.isError())) +#set ($s = $field.schema()) +#encodeVar(0 "this.${n}" $s) + +#set ($nv = $maxnv) +#end + } + + @Override public void decode(org.apache.avro.io.Decoder in) + throws java.io.IOException + { +#set ($nv = 0)## Counter to ensure unique var-names +#set ($maxnv = 0)## Holds high-water mark during recursion +#foreach ($field in $schema.getFields()) +#set ($n = $this.mangle($field.name(), $schema.isError())) +#set ($s = $field.schema()) +#set ($rs = "SCHEMA$.getField(""${n}"").schema()") +#decodeVar(0 "this.${n}" $s $rs) + +#set ($nv = $maxnv) +#end + } +#end } + +#macro( encodeVar $indent $var $s ) +#set ($I = $this.indent($indent)) +##### Compound types (array, map, and union) require calls +##### that will recurse back into this encodeVar macro: +#if ($s.Type.Name.equals("array")) +#encodeArray($indent $var $s) +#elseif ($s.Type.Name.equals("map")) +#encodeMap($indent $var $s) +#elseif ($s.Type.Name.equals("union")) +#encodeUnion($indent $var $s) +##### Use the generated "encode" method as fast way to write +##### (specific) record types: +#elseif ($s.Type.Name.equals("record")) +$I ${var}.encode(out); +##### For rest of cases, generate calls out.writeXYZ: +#elseif ($s.Type.Name.equals("null")) +$I out.writeNull(); +#elseif ($s.Type.Name.equals("boolean")) +$I out.writeBoolean(${var}); +#elseif ($s.Type.Name.equals("int")) +$I out.writeInt(${var}); +#elseif ($s.Type.Name.equals("long")) +$I out.writeLong(${var}); +#elseif ($s.Type.Name.equals("float")) +$I out.writeFloat(${var}); +#elseif ($s.Type.Name.equals("double")) +$I out.writeDouble(${var}); +#elseif ($s.Type.Name.equals("string")) +#if ($this.isStringable($s)) +$I out.writeString(${var}.toString()); +#else +$I out.writeString(${var}); +#end +#elseif ($s.Type.Name.equals("bytes")) +$I out.writeBytes(${var}); +#elseif ($s.Type.Name.equals("fixed")) +$I out.writeFixed(${var}.bytes(), 0, ${s.FixedSize}); +#elseif ($s.Type.Name.equals("enum")) +$I out.writeEnum(${var}.ordinal()); +#else +## TODO – singal a code-gen-time error +#end +#end + +#macro( encodeArray $indent $var $s ) +#set ($I = $this.indent($indent)) +#set ($et = $this.javaType($s.ElementType)) +$I long size${nv} = ${var}.size(); +$I out.writeArrayStart(); +$I out.setItemCount(size${nv}); +$I long actualSize${nv} = 0; +$I for ($et e${nv}: ${var}) { $I actualSize${nv} +; +$I out.startItem(); +#set ($var = "e${nv}") +#set ($nv = $nv + 1) +#set ($maxnv = $nv) +#set ($indent = $indent + 2) +#encodeVar($indent $var $s.ElementType) +#set ($nv = $nv - 1) +#set ($indent = $indent - 2) +#set ($I = $this.indent($indent)) +$I } +$I out.writeArrayEnd(); +$I if (actualSize${nv} != size${nv}) +$I throw new java.util.ConcurrentModificationException("Array-size written was " + size${nv} + ", but element count was " + actualSize${nv} + "."); +#end + +#macro( encodeMap $indent $var $s ) +#set ($I = $this.indent($indent)) +#set ($kt = $this.getStringType($s)) +#set ($vt = $this.javaType($s.ValueType)) +$I long size${nv} = ${var}.size(); +$I out.writeMapStart(); +$I out.setItemCount(size${nv}); +$I long actualSize${nv} = 0; +$I for (java.util.Map.Entry<$kt, $vt> e${nv}: ${var}.entrySet()) { $I actualSize${nv} +; +$I out.startItem(); +#if ($this.isStringable($s)) +$I out.writeString(e${nv}.getKey().toString()); +#else +$I out.writeString(e${nv}.getKey()); +#end +$I $vt v${nv} = e${nv}.getValue(); +#set ($var = "v${nv}") +#set ($nv = $nv + 1) +#set ($maxnv = $nv) +#set ($indent = $indent + 2) +#encodeVar($indent $var $s.ValueType) +#set ($nv = $nv - 1) +#set ($indent = $indent - 2) +#set ($I = $this.indent($indent)) +$I } +$I out.writeMapEnd(); +$I if (actualSize${nv} != size${nv}) + throw new java.util.ConcurrentModificationException("Map-size written was " + size${nv} + ", but element count was " + actualSize${nv} + "."); +#end + +#macro( encodeUnion $indent $var $s ) +#set ($I = $this.indent($indent)) +#set ($et = $this.javaType($s.Types.get($this.getNonNullIndex($s)))) +$I if (${var} == null) { +$I out.writeIndex(#if($this.getNonNullIndex($s)==0)1# {else} 0#end); +$I out.writeNull(); +$I } else { +$I out.writeIndex(${this.getNonNullIndex($s)}); +#set ($indent = $indent + 2) +#encodeVar($indent $var $s.Types.get($this.getNonNullIndex($s))) +#set ($indent = $indent - 2) +#set ($I = $this.indent($indent)) +$I } +#end + + +#macro( decodeVar $indent $var $s $rs ) +#set ($I = $this.indent($indent)) +##### Compound types (array, map, and union) require calls +##### that will recurse back into this decodeVar macro: +#if ($s.Type.Name.equals("array")) +#decodeArray($indent $var $s $rs) +#elseif ($s.Type.Name.equals("map")) +#decodeMap($indent $var $s $rs) +#elseif ($s.Type.Name.equals("union")) +#decodeUnion($indent $var $s $rs) +##### Use the generated "decode" method as fast way to write +##### (specific) record types: +#elseif ($s.Type.Name.equals("record")) +$I if (${var} == null) { +$I ${var} = new ${this.javaType($s)}(); +$I } +$I ${var}.decode(in); +##### For rest of cases, generate calls in.readXYZ: +#elseif ($s.Type.Name.equals("null")) +$I in.readNull(); +#elseif ($s.Type.Name.equals("boolean")) +$I $var = in.readBoolean(); +#elseif ($s.Type.Name.equals("int")) +$I $var = in.readInt(); +#elseif ($s.Type.Name.equals("long")) +$I $var = in.readLong(); +#elseif ($s.Type.Name.equals("float")) +$I $var = in.readFloat(); +#elseif ($s.Type.Name.equals("double")) +$I $var = in.readDouble(); +#elseif ($s.Type.Name.equals("string")) +#decodeString( "$I" $var $s ) +#elseif ($s.Type.Name.equals("bytes")) +$I $var = in.readBytes(${var}); +#elseif ($s.Type.Name.equals("fixed")) +$I if (${var} == null) { +$I ${var} = new ${this.javaType($s)}(); +$I } +$I in.readFixed(${var}.bytes(), 0, ${s.FixedSize}); +#elseif ($s.Type.Name.equals("enum")) +$I $var = ${this.javaType($s)}.values() [in.readEnum()] ; +#else +## TODO – singal a code-gen-time error +#end +#end + +#macro( decodeString $II $var $s ) +#set ($st = ${this.getStringType($s)}) +#if ($this.isStringable($s)) +$II ${var} = new ${st}(in.readString()); +#elseif ($st.equals("java.lang.String")) +$II $var = in.readString(); +#elseif ($st.equals("org.apache.avro.util.Utf8")) +$II $var = in.readString(${var}); +#else +$II $var = in.readString(${var} instanceof Utf8 ? (Utf8)${var} : null); +#end +#end + +#macro( decodeArray $indent $var $s $rs ) +#set ($I = $this.indent($indent)) +#set ($t = $this.javaType($s)) +#set ($et = $this.javaType($s.ElementType)) +#set ($gat = "SpecificData.Array<${et}>") +$I long size${nv} = in.readArrayStart(); +$I $t a${nv} = ${var}; // Need fresh name due to limitation of macro system +$I if (a${nv} == null) { Review comment: but we probably don't need this comment in every generated file ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            cutting commented on a change in pull request #256: AVRO-2090: Improve encode/decode time for SpecificRecord using code generation
            URL: https://github.com/apache/avro/pull/256#discussion_r178148789

            ##########
            File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java
            ##########
            @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) {
            }
            }

            + @Override
            + protected Object readRecord(Object old, Schema expected, ResolvingDecoder in)
            + throws IOException {
            + SpecificData data = getSpecificData();
            + Object r = data.newRecord(old, expected);

            Review comment:
            'r' should only be created when custom coders are used, no?

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - cutting commented on a change in pull request #256: AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation URL: https://github.com/apache/avro/pull/256#discussion_r178148789 ########## File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java ########## @@ -101,6 +101,23 @@ private Class getPropAsClass(Schema schema, String prop) { } } + @Override + protected Object readRecord(Object old, Schema expected, ResolvingDecoder in) + throws IOException { + SpecificData data = getSpecificData(); + Object r = data.newRecord(old, expected); Review comment: 'r' should only be created when custom coders are used, no? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            cutting commented on a change in pull request #256: AVRO-2090: Improve encode/decode time for SpecificRecord using code generation
            URL: https://github.com/apache/avro/pull/256#discussion_r178149568

            ##########
            File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java
            ##########
            @@ -90,4 +92,19 @@ public void readExternal(ObjectInput in)
            new SpecificDatumReader(getSchema())
            .read(this, SpecificData.getDecoder(in));
            }
            +
            + /** Returns true iff an instance supports the

            {@link #encode} and
            + * {@link #decode} operations. Should only be used by
            + * <code>SpecificDatumReader/Writer</code> to selectively use
            + * {@link #encode}

            and

            {@link #decode}

            to optimize the (de)serialization of
            + * values. */
            + public boolean hasCustomCoders()

            { return false; }

            +

            Review comment:
            Do these need to be public, or is protected enough? Also, they need some javadoc.

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - cutting commented on a change in pull request #256: AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation URL: https://github.com/apache/avro/pull/256#discussion_r178149568 ########## File path: lang/java/avro/src/main/java/org/apache/avro/specific/SpecificRecordBase.java ########## @@ -90,4 +92,19 @@ public void readExternal(ObjectInput in) new SpecificDatumReader(getSchema()) .read(this, SpecificData.getDecoder(in)); } + + /** Returns true iff an instance supports the {@link #encode} and + * {@link #decode} operations. Should only be used by + * <code>SpecificDatumReader/Writer</code> to selectively use + * {@link #encode} and {@link #decode} to optimize the (de)serialization of + * values. */ + public boolean hasCustomCoders() { return false; } + Review comment: Do these need to be public, or is protected enough? Also, they need some javadoc. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org
            githubbot ASF GitHub Bot added a comment -

            scottcarey commented on issue #256: AVRO-2090: Improve encode/decode time for SpecificRecord using code generation
            URL: https://github.com/apache/avro/pull/256#issuecomment-377157550

            This is a good idea. What is the performance improvement? Did you run any benchmarks?

            I had imagined generating bytecode using ASM, but generating it in the generated class would work too.

            ----------------------------------------------------------------
            This is an automated message from the Apache Git Service.
            To respond to the message, please log on GitHub and use the
            URL above to go to the specific comment.

            For queries about this service, please contact Infrastructure at:
            users@infra.apache.org

            githubbot ASF GitHub Bot added a comment - scottcarey commented on issue #256: AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation URL: https://github.com/apache/avro/pull/256#issuecomment-377157550 This is a good idea. What is the performance improvement? Did you run any benchmarks? I had imagined generating bytecode using ASM, but generating it in the generated class would work too. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org

            +1 for the patch.

            thiru_mg Thiruvalluvan M. G. added a comment - +1 for the patch.
            howellbridger Bridger Howell added a comment -

            This looks like a really good idea. If I have some free time, I'll try to help with code review.

            howellbridger Bridger Howell added a comment - This looks like a really good idea. If I have some free time, I'll try to help with code review.
            githubbot ASF GitHub Bot added a comment -

            GitHub user rstata opened a pull request:

            https://github.com/apache/avro/pull/256

            AVRO-2090: Improve encode/decode time for SpecificRecord using code generation

            Initial patch for AVRO-2090

            You can merge this pull request into a Git repository by running:

            $ git pull https://github.com/rstata-projects/avro AVRO-2090

            Alternatively you can review and apply these changes as the patch at:

            https://github.com/apache/avro/pull/256.patch

            To close this pull request, make a commit to your master/trunk branch
            with (at least) the following in the commit message:

            This closes #256


            commit d2127c7a4051bf7efa56cb0d7e8d9de6ead31c16
            Author: rstata <rstata@yahoo.com>
            Date: 2017-07-13T08:36:39Z

            Saving initial work, still have more to do.

            commit 474cc97315ecfeb5bc79dc366424c342d57d83e2
            Author: rstata <rstata@yahoo.com>
            Date: 2017-07-14T04:22:46Z

            Finished initial implementation (not tested).

            commit 456d667c58df1493190b99bea40b24408e969679
            Author: rstata <rstata@yahoo.com>
            Date: 2017-07-14T05:51:24Z

            Poorly done feature flag, and formatting improvements (incl proper indentation).

            commit b1caba57a90a7fe9a7779c137137315d1d6a99ec
            Author: rstata <rstata@yahoo.com>
            Date: 2017-07-16T08:40:03Z

            Added Reader/Decoder code

            commit 9f8c853f6f43c9ce07eb9ad10da0d6acf9263c5e
            Author: rstata <rstata@yahoo.com>
            Date: 2017-09-16T01:33:16Z

            Updated output files to reflect new specific-compiler strategy.

            commit 83698d9e3ea04e00cd7da87373409b74f77d708a
            Author: rstata <rstata@yahoo.com>
            Date: 2017-10-03T21:44:05Z

            Reverting changes to SpecificFixed

            commit 84e4cbb1ada1ceb97dcec0364a231055cd25142a
            Author: rstata <rstata@yahoo.com>
            Date: 2017-10-04T01:54:24Z

            Change name of feature from Encodable to CustomCoders

            commit e57289bae26683ba4ea3ed30f863be5a79983bc0
            Author: rstata <rstata@yahoo.com>
            Date: 2017-10-04T04:49:15Z

            Fixed bugs in codegen template

            commit f8fae7bc307fae7d51afab7a99025b4213937d40
            Author: rstata <rstata@yahoo.com>
            Date: 2017-10-04T05:32:20Z

            Added feature flag for custom coders

            commit d5b45607ace5fbaf9ee526df2fa285a047365548
            Author: rstata <rstata@yahoo.com>
            Date: 2017-10-08T00:18:34Z

            Remove stale TODO comment


            githubbot ASF GitHub Bot added a comment - GitHub user rstata opened a pull request: https://github.com/apache/avro/pull/256 AVRO-2090 : Improve encode/decode time for SpecificRecord using code generation Initial patch for AVRO-2090 You can merge this pull request into a Git repository by running: $ git pull https://github.com/rstata-projects/avro AVRO-2090 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/avro/pull/256.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #256 commit d2127c7a4051bf7efa56cb0d7e8d9de6ead31c16 Author: rstata <rstata@yahoo.com> Date: 2017-07-13T08:36:39Z Saving initial work, still have more to do. commit 474cc97315ecfeb5bc79dc366424c342d57d83e2 Author: rstata <rstata@yahoo.com> Date: 2017-07-14T04:22:46Z Finished initial implementation (not tested). commit 456d667c58df1493190b99bea40b24408e969679 Author: rstata <rstata@yahoo.com> Date: 2017-07-14T05:51:24Z Poorly done feature flag, and formatting improvements (incl proper indentation). commit b1caba57a90a7fe9a7779c137137315d1d6a99ec Author: rstata <rstata@yahoo.com> Date: 2017-07-16T08:40:03Z Added Reader/Decoder code commit 9f8c853f6f43c9ce07eb9ad10da0d6acf9263c5e Author: rstata <rstata@yahoo.com> Date: 2017-09-16T01:33:16Z Updated output files to reflect new specific-compiler strategy. commit 83698d9e3ea04e00cd7da87373409b74f77d708a Author: rstata <rstata@yahoo.com> Date: 2017-10-03T21:44:05Z Reverting changes to SpecificFixed commit 84e4cbb1ada1ceb97dcec0364a231055cd25142a Author: rstata <rstata@yahoo.com> Date: 2017-10-04T01:54:24Z Change name of feature from Encodable to CustomCoders commit e57289bae26683ba4ea3ed30f863be5a79983bc0 Author: rstata <rstata@yahoo.com> Date: 2017-10-04T04:49:15Z Fixed bugs in codegen template commit f8fae7bc307fae7d51afab7a99025b4213937d40 Author: rstata <rstata@yahoo.com> Date: 2017-10-04T05:32:20Z Added feature flag for custom coders commit d5b45607ace5fbaf9ee526df2fa285a047365548 Author: rstata <rstata@yahoo.com> Date: 2017-10-08T00:18:34Z Remove stale TODO comment
            raymie Raymie Stata added a comment -

            Attaching a design document for (forthcoming) patch.

            raymie Raymie Stata added a comment - Attaching a design document for (forthcoming) patch.

            People

              raymie Raymie Stata
              raymie Raymie Stata
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: