Avro
  1. Avro
  2. AVRO-989

Java: Improve Builder performance in Specific API

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: java
    • Labels:
      None

      Description

      The Specific API generates Builder objects for each record. This builder uses a boolean[] to store flags for each field to indicate whether the field is set or not.

      This is not space efficient, a boolean[] takes 16 bytes plus one byte per field, rounded up to the nearest 8 byte interval.

      This can be improved on by using BitSet for large records, and bitmasks on an int for records with less than 32 fields.

      1. AVRO-989.patch
        6 kB
        James Baldassari
      2. AVRO-989-v2.patch
        12 kB
        James Baldassari

        Activity

        Hide
        James Baldassari added a comment -

        Here's a patch for the boolean[] -> BitSet change.

        Show
        James Baldassari added a comment - Here's a patch for the boolean[] -> BitSet change.
        Hide
        Scott Carey added a comment -

        Overall looks good, but there is a test failure.

        Running org.apache.avro.tool.TestSpecificCompilerTool
        Tests run: 3, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.708 sec <<< FAILURE!

        This fails because it is expecting the old output from the specific compiler. All other tests passed.

        Show
        Scott Carey added a comment - Overall looks good, but there is a test failure. Running org.apache.avro.tool.TestSpecificCompilerTool Tests run: 3, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.708 sec <<< FAILURE! This fails because it is expecting the old output from the specific compiler. All other tests passed.
        Hide
        James Baldassari added a comment -

        Oh, that's weird; I thought I ran all the tests. I'll check it out and submit another patch.

        Show
        James Baldassari added a comment - Oh, that's weird; I thought I ran all the tests. I'll check it out and submit another patch.
        Hide
        James Baldassari added a comment -

        Here's a new version of the patch that fixes the compiler test issues.

        Show
        James Baldassari added a comment - Here's a new version of the patch that fixes the compiler test issues.
        Hide
        James Baldassari added a comment -

        Does the second version of this patch look ok? All the tests are passing for me, but it would be good to have someone else verify.

        Show
        James Baldassari added a comment - Does the second version of this patch look ok? All the tests are passing for me, but it would be good to have someone else verify.
        Hide
        Scott Carey added a comment -

        I may have some time to test it tomorrow. If I can't by tomorrow, another committer will need to as I'll be unavailable for a couple weeks.

        Show
        Scott Carey added a comment - I may have some time to test it tomorrow. If I can't by tomorrow, another committer will need to as I'll be unavailable for a couple weeks.
        Hide
        Harsh J added a comment -

        I ran tests locally on OSX, and all java tests passed with build.sh clean test.

        [INFO] Apache Avro Java .................................. SUCCESS [8.367s]
        [INFO] Apache Avro ....................................... SUCCESS [1:14.250s]
        [INFO] Apache Avro Compiler .............................. SUCCESS [7.478s]
        [INFO] Apache Avro Maven Plugin .......................... SUCCESS [4.702s]
        [INFO] Apache Avro IPC ................................... SUCCESS [2:42.863s]
        [INFO] Apache Avro Mapred API ............................ SUCCESS [1:43.056s]
        [INFO] Apache Avro Tools ................................. SUCCESS [19.334s]
        [INFO] Apache Avro Protobuf Compatibility ................ SUCCESS [3.816s]
        [INFO] Apache Avro Thrift Compatibility .................. SUCCESS [4.255s]
        [INFO] Apache Avro Maven Archetypes ...................... SUCCESS [3.289s]
        [INFO] Apache Avro Maven Service Archetype ............... SUCCESS [0.890s]
        
        Show
        Harsh J added a comment - I ran tests locally on OSX, and all java tests passed with build.sh clean test . [INFO] Apache Avro Java .................................. SUCCESS [8.367s] [INFO] Apache Avro ....................................... SUCCESS [1:14.250s] [INFO] Apache Avro Compiler .............................. SUCCESS [7.478s] [INFO] Apache Avro Maven Plugin .......................... SUCCESS [4.702s] [INFO] Apache Avro IPC ................................... SUCCESS [2:42.863s] [INFO] Apache Avro Mapred API ............................ SUCCESS [1:43.056s] [INFO] Apache Avro Tools ................................. SUCCESS [19.334s] [INFO] Apache Avro Protobuf Compatibility ................ SUCCESS [3.816s] [INFO] Apache Avro Thrift Compatibility .................. SUCCESS [4.255s] [INFO] Apache Avro Maven Archetypes ...................... SUCCESS [3.289s] [INFO] Apache Avro Maven Service Archetype ............... SUCCESS [0.890s]
        Hide
        Harsh J added a comment -

        (I mean, with this bitset patch applied)

        Show
        Harsh J added a comment - (I mean, with this bitset patch applied)
        Hide
        James Baldassari added a comment -

        Thanks. Shall I commit it then?

        Show
        James Baldassari added a comment - Thanks. Shall I commit it then?
        Hide
        Doug Cutting added a comment -

        This looks good to me. It will cause folks to have to re-generate their code when they upgrade since the base class for generated code has changed incompatibly. Should we hold it for 1.7.0 for that reason?

        (I branch lazily. I have not yet created a 1.6 branch since everything we've committed to trunk so far is also destined for 1.6. If we decide to hold this for 1.7.0 we could create a 1.6 branch from trunk, then commit this to trunk.)

        Show
        Doug Cutting added a comment - This looks good to me. It will cause folks to have to re-generate their code when they upgrade since the base class for generated code has changed incompatibly. Should we hold it for 1.7.0 for that reason? (I branch lazily. I have not yet created a 1.6 branch since everything we've committed to trunk so far is also destined for 1.6. If we decide to hold this for 1.7.0 we could create a 1.6 branch from trunk, then commit this to trunk.)
        Hide
        James Baldassari added a comment -

        That's a good point; this would require regenerating/recompiling the specific record classes. I guess the base classes and the generated implementations are a little too tightly coupled. If we had protected methods like 'boolean isSet(int field)' on RecordBuilderBase then we would be able to change the underlying implementation from a boolean[] to a BitSet without breaking backwards compatibility. It sounds like we'll need to hold this change back until 1.7.0 as you suggested.

        Since we're probably not going to get this in for 1.6, do you think I should rework this patch to reduce coupling in Builder API? Or should that be a separate patch? If reducing coupling in the whole Record API is a separate issue for 1.7, we could also roll in the change to make the record fields private in favor of the accessor/mutator methods introduced in AVRO-839.

        Show
        James Baldassari added a comment - That's a good point; this would require regenerating/recompiling the specific record classes. I guess the base classes and the generated implementations are a little too tightly coupled. If we had protected methods like 'boolean isSet(int field)' on RecordBuilderBase then we would be able to change the underlying implementation from a boolean[] to a BitSet without breaking backwards compatibility. It sounds like we'll need to hold this change back until 1.7.0 as you suggested. Since we're probably not going to get this in for 1.6, do you think I should rework this patch to reduce coupling in Builder API? Or should that be a separate patch? If reducing coupling in the whole Record API is a separate issue for 1.7, we could also roll in the change to make the record fields private in favor of the accessor/mutator methods introduced in AVRO-839 .
        Hide
        Doug Cutting added a comment -

        +1 for reducing the coupling. It's probably worthy of a separate issue. It will probably subsume this issue.

        In addition to making record fields private we might also change the default representation for strings to String (AVRO-803).

        Show
        Doug Cutting added a comment - +1 for reducing the coupling. It's probably worthy of a separate issue. It will probably subsume this issue. In addition to making record fields private we might also change the default representation for strings to String ( AVRO-803 ).

          People

          • Assignee:
            Unassigned
            Reporter:
            Scott Carey
          • Votes:
            2 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:

              Development