AVRO-753 - Java: Improve BinaryEncoder Performance

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: java
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change
    • Release Note:
      The Encoder API has several resulting changes:
          * Construction and configuration are handled by EncoderFactory. All
            constructors are hidden, and Encoder.init(OutputStream) is removed.
          * Some Encoders previously did not buffer output. Users must call
            Encoder.flush() to ensure output is written unless the EncoderFactory
            method used to construct an instance explicitly states that the Encoder
            does not buffer output.
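
      A minimal usage sketch of the post-change API (not taken from the issue itself;
      it assumes EncoderFactory.get() and the binaryEncoder(OutputStream, BinaryEncoder)
      factory method discussed later in this issue):

          import java.io.ByteArrayOutputStream;
          import java.io.IOException;
          import org.apache.avro.io.BinaryEncoder;
          import org.apache.avro.io.EncoderFactory;

          public class EncoderFlushExample {
            public static void main(String[] args) throws IOException {
              ByteArrayOutputStream out = new ByteArrayOutputStream();
              // Constructors are hidden; encoders are obtained from EncoderFactory.
              BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
              encoder.writeInt(42);
              encoder.writeString("hello");
              // The default binary encoder buffers, so flush() is required before
              // the bytes are guaranteed to reach the underlying OutputStream.
              encoder.flush();
              System.out.println("wrote " + out.size() + " bytes");
            }
          }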

      Description

      BinaryEncoder has not had a performance improvement pass like BinaryDecoder did. It still mostly writes directly to the underlying OutputStream, which is not optimal for performance. I like to use a rule of thumb: if you are writing to an OutputStream or reading from an InputStream in chunks smaller than 128 bytes, you have a performance problem.

      Measurements indicate that optimizing BinaryEncoder yields a 2.5x to 6x performance improvement. The process is significantly simpler than it was for BinaryDecoder, both because 'pushing' is easier than 'pulling' and because we do not need a 'direct' variant: BinaryEncoder already buffers sometimes.
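
      To illustrate the rule above (a sketch, not code from any patch here): writing
      one byte per OutputStream.write() call pays a per-call cost, while staging bytes
      in a local array and writing larger chunks is the pattern the optimized encoder
      relies on.

          import java.io.IOException;
          import java.io.OutputStream;

          class SmallWriteDemo {
            // Slow path: one OutputStream call per byte.
            static void writeDirect(OutputStream out, int n) throws IOException {
              for (int i = 0; i < n; i++) {
                out.write(i & 0xFF);
              }
            }

            // Faster path: fill a local buffer and write it out in large chunks.
            static void writeBuffered(OutputStream out, int n) throws IOException {
              byte[] buf = new byte[4096];
              int pos = 0;
              for (int i = 0; i < n; i++) {
                if (pos == buf.length) {
                  out.write(buf, 0, pos);
                  pos = 0;
                }
                buf[pos++] = (byte) (i & 0xFF);
              }
              out.write(buf, 0, pos);
            }
          }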

      1. AVRO-753.v1.patch
        14 kB
        Scott Carey
      2. AVRO-753.v2.patch
        125 kB
        Scott Carey
      3. AVRO-753.v3.patch
        129 kB
        Scott Carey
      4. AVRO-753.v4.patch
        132 kB
        Scott Carey

        Issue Links

          • This issue blocks AVRO-769
          • This issue is related to AVRO-738

          Activity

          Doug Cutting made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Scott Carey made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Incompatible change]
          Release Note The Encoder API has several resulting changes:
              * Construction and configuration is handled by EncoderFactory. All
                Constructors are hidden, and Encoder.init(OutputStream) is removed.
              * Some Encoders previously did not buffer output. Users must call
                Encoder.flush() to ensure output is written unless the EncoderFactory
                method used to construct an instance explicitly states that the Encoder
                does not buffer output.
          Resolution Fixed [ 1 ]
          Scott Carey added a comment -

          Committed in 1074364

          Doug Cutting added a comment -

          +1

          Scott Carey added a comment -

          If there are no objections, I'll commit this soon as-is, and treat the javadoc/package.html issues as part of AVRO-769. The doc needs review as a whole for the entire API anyway, and it will be easier to review as such.

          Doug Cutting added a comment -

          This looks great! It does generate several javadoc warnings, but we could commit it and fix those in a subsequent pass. We should probably improve the package.html for the io package to point folks to EncoderFactory and DecoderFactory as primary entry points.

          Scott Carey made changes -
          Attachment AVRO-753.v4.patch [ 12471455 ]
          Scott Carey added a comment -

          patch contains the following changes since the last:

          • Added EncoderFactory.get() which returns an immutable static factory instance.
          • Moved optimized binary write methods to BinaryData from BinaryEncoder.
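
          A hedged usage sketch of the relocated helpers; it assumes they ended up as
          static BinaryData.encodeInt/encodeLong methods that write a value into a
          caller-supplied byte[] at an offset and return the number of bytes written:

              import org.apache.avro.io.BinaryData;

              public class BinaryDataSketch {
                public static void main(String[] args) {
                  byte[] buf = new byte[16];
                  int pos = 0;
                  // Encode two primitives directly into the array, no Encoder needed.
                  pos += BinaryData.encodeInt(42, buf, pos);
                  pos += BinaryData.encodeLong(1234567890123L, buf, pos);
                  System.out.println("encoded " + pos + " bytes");
                }
              }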
          Scott Carey added a comment -
          • EncoderFactory.get() : This would mirror what we have in DecoderFactory, and I had something similar in a previous version. I'll add it back and see how much that cleans up other code.
          • I'll move them. BinaryData is a good home for these, and already has a similar 'skipLong' method. Making them a public part of the API makes sense too.
          • Those methods are not quite equal, ensureBounds() is private in each implementation, and tends to a different buffer. It can't be made protected, since both have to exist at the same time to manage both buffers. Another way to state it is that ensureBounds() is an implementation detail, named the same thing in the two classes and similar in function, but not polymorphic.

          I'll put together a patch with changes later today.

          Doug Cutting made changes -
          Priority Major [ 3 ] Blocker [ 1 ]
          Doug Cutting added a comment -

          Looks good, passes tests. A few questions:

          • Should we provide a 'public static EncoderFactory get()' method that returns an immutable factory? There's a lot of 'new EncoderFactory().binaryEncoder();' and 'private static final EncoderFactory FACTORY = new EncoderFactory();' sprinkled in the code that might use this instead.
          • Should the 'protected final' BinaryEncoder methods perhaps be made 'public static'? They might even live on BinaryData instead, as a common place for such static utility methods related to the binary encoding. I don't feel too strongly about this, but it seems like these highly optimized routines might also be useful in other contexts.
          • Why does BlockingBinaryEncoder override BufferedBinaryEncoder methods with identical implementations, e.g., writeBoolean(), writeInt(), writeFloat()? Is this intentional, accidental or did I miss something?
          Scott Carey made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Scott Carey made changes -
          Attachment AVRO-753.v3.patch [ 12471250 ]
          Scott Carey added a comment -

          Updated patch, v3:

          Cleaner design, breaks the Encoder API with respect to initialization and configuration of encoders.

          All Encoders have no public constructors, and go through EncoderFactory.
          BinaryEncoder is an abstract type, with three subtypes:
          DirectBinaryEncoder, BufferedBinaryEncoder, and BlockingBinaryEncoder.

          Encoder.init(OutputStream) is removed, all construction and configuration flow through EncoderFactory. Encoder's API is strictly about writing Avro primitives.

          Much JavaDoc.

          Intended CHANGES.txt message included. I think this is ready.

          Scott Carey made changes -
          Link This issue blocks AVRO-769 [ AVRO-769 ]
          Doug Cutting added a comment -

          +1 for removing other deprecated stuff now, to establish new baseline encoder/decoder APIs.

          Scott Carey added a comment -

          If we do go with that approach, perhaps it is also time to remove several deprecated methods elsewhere, rather than have that pain hit in 1.6 or later as well. I'm thinking mostly of other things on the Decoder side. Perhaps we can look at the Decoder and Encoder side with more scrutiny and label it "stable" after this pass?

          We're starting to be used in more frameworks, and once that happens API instability will be harder to manage. It's not too bad for one application to switch, but when we're brought in via multiple third party paths it becomes a problem.

          I'll post an alternate patch that rearranges the classes and changes the Factory to hide more abstraction (return the abstract parent type, not the implementation type, where possible).

          Doug Cutting added a comment -

          I also much prefer the cleaner class hierarchy.

          The biggest downside I see of forcing applications to upgrade is that they're unable to simply drop in newer jar files. That means that applications that use a library that was developed against Avro 1.4 will not be able to use Avro 1.5 until that library is also upgraded. But we've already made several incompatible API changes in Avro 1.5 so it may be too late to worry about that.

          Scott Carey added a comment -

          Performance results from the above patch.

          I tested with Sun JRE 6u22 (64 bit) on Mac OS X 10.6.6 on a 2.4 GHz Intel Core i5 (2 cores, 4 threads, can 'turbo' up to 2.93 GHz).

          I used the following JVM arguments:
          -server -Xmx256m -Xms256m -XX:+UseParallelGC -XX:+UseCompressedOops -XX:+DoEscapeAnalysis -XX:+UseLoopPredicate

          ParallelGC is fast and most common on servers. CompressedOops is highly recommended if running 64 bit, it improves performance and reduces memory footprint.
          The last two are default flags in JRE 6u23 and above, but are not in 6u22. These have measurable impact on the tests. UseLoopPredicate speeds up a couple cases by 10%.

          A 32 bit JVM slows down somewhat. In particular, writeLong is about 35% slower, and a few other cases degrade by 15% or so. Some others (writeDouble, writeFloat) don't change. More registers, and 64 bit integer native registers, help some of the inner loops significantly. I expect non-Intel hardware to behave more like the 64 bit case.

          I ran with the '-noread' command line option of Perf.java.

          This is the performance of the legacy encoder:

          old legacy encoder:
                              test name     time    M entries/sec   M bytes/sec  bytes/cycle
                               IntWrite:   3784 ms      52.849       133.036        629325
                         SmallLongWrite:   3715 ms      53.828       135.500        629325
                              LongWrite:   6153 ms      32.502       142.013       1092353
                             FloatWrite:   7289 ms      27.437       109.748       1000000
                            DoubleWrite:  13988 ms      14.298       114.383       2000000
                           BooleanWrite:   2150 ms      93.001        93.001        250000
                             BytesWrite:   2588 ms      15.451       549.113       1776937
                            StringWrite:   9656 ms       4.142       147.535       1780910
                             ArrayWrite:   7315 ms      27.340       109.359       1000006
                               MapWrite:   8727 ms      22.916       114.581       1250004
                            RecordWrite:  10204 ms       3.266       126.771       1617069
                  ValidatingRecordWrite:  11584 ms       2.877       111.673       1617069
                           GenericWrite:   7522 ms       2.216        85.986        808498
                    GenericNested_Write:   9713 ms       1.716        66.588        808498
                GenericNestedFake_Write:   5893 ms       2.828       109.743        808498
          

          And the new BinaryEncoder:

                              test name     time    M entries/sec   M bytes/sec  bytes/cycle
                               IntWrite:   1558 ms     128.342       323.076        629325
                         SmallLongWrite:   1495 ms     133.760       336.714        629325
                              LongWrite:   2736 ms      73.083       319.329       1092353
                             FloatWrite:   1286 ms     155.517       622.066       1000000
                            DoubleWrite:   2005 ms      99.742       797.935       2000000
                           BooleanWrite:    597 ms     334.696       334.696        250000
                             BytesWrite:   2491 ms      16.054       570.550       1776937
                            StringWrite:   9050 ms       4.420       157.417       1780910
                             ArrayWrite:   1352 ms     147.852       591.412       1000006
                               MapWrite:   2245 ms      89.054       445.269       1250004
                            RecordWrite:   2418 ms      13.780       534.813       1617069
                  ValidatingRecordWrite:   4191 ms       7.952       308.631       1617069
                           GenericWrite:   3477 ms       4.792       185.978        808498
                    GenericNested_Write:   5661 ms       2.944       114.249        808498
                GenericNestedFake_Write:   2068 ms       8.057       312.696        808498
          

          Performance ranges from 2x to 7x faster, except for writing byte arrays and strings, which are only slightly faster. The test above writes strings and byte arrays that average 35 bytes in size – smaller ones will benefit more from the buffering, especially with high overhead OutputStreams.

          This is the performance of the new non-buffering variation, DirectBinaryEncoder:

                              test name     time    M entries/sec   M bytes/sec  bytes/cycle
                               IntWrite:   3446 ms      58.023       146.062        629325
                         SmallLongWrite:   3491 ms      57.274       144.176        629325
                              LongWrite:   5931 ms      33.716       147.320       1092353
                             FloatWrite:   4337 ms      46.105       184.419       1000000
                            DoubleWrite:   5525 ms      36.194       289.556       2000000
                           BooleanWrite:   1949 ms     102.603       102.603        250000
                             BytesWrite:   2814 ms      14.212       505.091       1776937
                            StringWrite:   9480 ms       4.219       150.285       1780910
                             ArrayWrite:   4437 ms      45.068       180.273       1000006
                               MapWrite:   5803 ms      34.464       172.321       1250004
                            RecordWrite:   5005 ms       6.659       258.446       1617069
                  ValidatingRecordWrite:   6519 ms       5.113       198.419       1617069
                           GenericWrite:   4978 ms       3.348       129.920        808498
                    GenericNested_Write:   6966 ms       2.392        92.838        808498
                GenericNestedFake_Write:   3507 ms       4.752       184.430        808498
          

          This is between 0x and 2.5x faster than the 'legacy' BinaryEncoder, with Float and Double encoding significantly faster and most other things only slightly faster. It is still substantially slower than the buffering variation.

          Next up: BlockingBinaryEncoder. It has essentially the same performance as the BinaryEncoder; however, it defaults to a larger buffer size (64K instead of 2K) and is therefore slightly faster, except for MapWrite and ArrayWrite, where blocking is in effect.

                              test name     time    M entries/sec   M bytes/sec  bytes/cycle
                               IntWrite:   1512 ms     132.260       332.937        629325
                         SmallLongWrite:   1459 ms     137.012       344.902        629325
                              LongWrite:   2640 ms      75.739       330.937       1092353
                             FloatWrite:   1265 ms     158.088       632.352       1000000
                            DoubleWrite:   1999 ms     100.004       800.032       2000000
                           BooleanWrite:    638 ms     313.294       313.294        250000
                             BytesWrite:   2458 ms      16.273       578.305       1776937
                            StringWrite:   9259 ms       4.320       153.862       1780910
                             ArrayWrite:   1443 ms     138.580       554.373       1000098
                               MapWrite:   2589 ms      77.233       386.200       1250119
                            RecordWrite:   3001 ms      11.104       430.964       1617069
                  ValidatingRecordWrite:   5829 ms       5.718       221.933       1617069
                           GenericWrite:   3545 ms       4.701       182.450        808498
                    GenericNested_Write:   5831 ms       2.858       110.906        808498
                GenericNestedFake_Write:   2052 ms       8.119       315.091        808498
          

          And for those curious, this is what JSON looks like:

                              test name     time    M entries/sec   M bytes/sec  bytes/cycle
                               IntWrite:  10238 ms      19.534       115.334       1476104
                         SmallLongWrite:  10383 ms      19.261       113.722       1476104
                              LongWrite:  18078 ms      11.063       109.950       2484706
                             FloatWrite:  50300 ms       3.976        42.252       2656635
                            DoubleWrite:  96585 ms       2.071        39.894       4816469
                           BooleanWrite:   8940 ms      22.369       123.022       1374900
                             BytesWrite:  40859 ms       0.979        72.197       3687468
                            StringWrite:   9021 ms       4.434       166.411       1876635
                             ArrayWrite:  59728 ms       3.349        54.000       4031647
                               MapWrite:  63564 ms       3.146        55.460       4406637
                            RecordWrite:  63687 ms       0.523        64.246       5114596
                  ValidatingRecordWrite:  65488 ms       0.509        62.480       5114596
                           GenericWrite:  34985 ms       0.476        58.478       2557400
                    GenericNested_Write:  42137 ms       0.396        58.047       3057392
                GenericNestedFake_Write:  37551 ms       0.444        65.134       3057392
          

          Note that included in all of these results (including the legacy result) is improved String <-> Utf8 conversion in Utf8.java. This brings String encoding up from ~120MB/sec to ~160MB/sec. I noticed that Jackson was faster than our binary encoder for the string test case, and now it is a tie. There is more to do there, but it is dominated by JVM code that isn't as optimal as it should be.

          Scott Carey made changes -
          Attachment AVRO-753.v2.patch [ 12470967 ]
          Scott Carey added a comment -

          This patch changes BinaryEncoder for significantly improved performance. This requires that all users of BinaryEncoder use the Encoder API properly and call flush() as needed.

          This has resulted in 4 BinaryEncoder related classes:

          • AbstractBinaryEncoder – defines the common API and has much shared code, mostly low level encoding functions.
          • BinaryEncoder – a fast encoder that buffers, by default up to 2k.
          • BlockingBinaryEncoder – a buffering encoder that implements blocking of arrays and maps, extends BinaryEncoder
          • DirectBinaryEncoder – a light-weight encoder that does not buffer but is about 2.2 times slower than BinaryEncoder.

          I have implemented an EncoderFactory and deprecated Encoder.init(OutputStream) in favor of having the factory or implementations take care of that. There are some other options for this factory that might better hide abstractions like BlockingBinaryEncoder, but the one included here is the simplest.

          The decisions / discussions around this change that I am uncertain of are:

          • API Changes and migration: This change makes BinaryEncoder buffer all the time, instead of only sometimes. All prior uses that did not call flush() were bugs, but they are surely out in the wild. This variation leaves BinaryEncoder constructable the old way (the constructor is deprecated, but still there) so users might introduce bugs from this change silently. We could remove the constructor entirely, and force a choice through the factory to solve this instead.
          • Class Hierarchy. AbstractBinaryEncoder is package protected, and DirectBinaryEncoder does not inherit from BinaryEncoder (to keep it light weight with minimal member variables and overrides). Another option is to rename BinaryEncoder to BufferedBinaryEncoder, and then change the name of AbstractBinaryEncoder to BinaryEncoder and make it public. This is probably the best representation of the classes, but means that BinaryEncoder can no longer be constructed. It could lead to a cleaner Factory as well – the factory could always return the abstract BinaryEncoder type and thus we could hide more implementation details behind it and not expose the concrete classes.

          I prefer the cleaner factory and class hierarchy to encapsulate the details. For example, it would allow us to later merge BufferedBinaryEncoder and BlockingBinaryEncoder and not affect any user code. But it means that right now, we break an API without deprecating it first – BinaryEncoder would not have public constructors. A side effect would be that users' compiles break, forcing them to choose the fast buffered or slower direct implementation.

          Scott Carey added a comment -

          Thanks Thiru, it is making more sense now. I've gotten BlockingBinaryEncoder integrated with my changes and it shares more code with BinaryEncoder.

          I'm nearly done here, but have found lots of bugs in Avro and our tests as a result. In most places, we assume that BinaryEncoder does not buffer. These are bugs because the Encoder contract calls out that it may buffer, and has a flush() method. JsonEncoder, BlockingBinaryEncoder, and some corner cases of BinaryEncoder do buffer, and there are several bits of code that have an Encoder object that is used without calling flush().

          Doug Cutting added a comment -

          Good point, Thiru--overflow doesn't need to be a special case in the format.

          Thiruvalluvan M. G. added a comment (edited) -

          We just write (positive) 1 for item-count and no byte-count (line 602 in BlockedBinaryEncoder.java) for overflow blocks. Since the reader is not supposed to expect byte-count when it encounters a positive item-count, it works.

          Doug Cutting added a comment -

          The length of each block in the array is known, but not of the entire array. Arrays are buffered by block (except in the "overflow" case).

          The byte count of a block is 'unknown' when the item count is positive.

          I agree that the "overflow" case is ambiguous and should probably be changed to (-1,-1). Thiru?

          Scott Carey added a comment -

          I need to change BlockingBinaryEncoder as part of this process. It appears that I can simplify it significantly since both the new BinaryEncoder and the blocking variant will need to buffer data in a similar way and thus they will share a lot more code.

          I want to clarify how it should work. The spec doesn't seem to have the answer.
          It says: "If a block's count is negative, its absolute value is used, and the count is followed immediately by a long block size indicating the number of bytes in the block." and "The blocked representation permits one to read and write arrays larger than can be buffered in memory, since one can start writing items without knowing the full length of the array."

          If you need to write the length of the block, how can you write without knowing the full length of the array? Looking at the code, it mentions this:

          "Regular" blocks have a non-zero byte count.
          "Overflow" blocks help us deal with the case where a block
          contains a value that's too big to buffer. In this case, the
          block contains only one item, and we give it an unknown
          byte-count. Because these values (1,unknown) are fixed, we're
          able to write the header for these overflow blocks to the
          underlying stream without seeing the entire block. After writing
          this header, we've freed our buffer space to be fully devoted to
          blocking the large, inner value.

          The spec does not mention that a block can have an 'unknown' byte count. Is this something that should be added to the spec? Or is it somewhere else that I did not notice?

          The code indicates that an 'overflow' block has one item (count = -1) and size = 0. That seems a little ambiguous, "-1, -1" would make more sense since a negative size is impossible. A valid record can have size zero if it only contains null fields.

          I'm refactoring BlockingBinaryEncoder to share code with the new BinaryEncoder and be a little simpler. I don't intend to change its behavior but it would help to know more about the details of encoding 'too large to buffer' array values.
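
          Purely to illustrate the framing described above (this is not
          BlockingBinaryEncoder's actual code), a "regular" block whose items are
          already buffered could be emitted roughly like this:

              import java.io.IOException;
              import org.apache.avro.io.Encoder;

              class BlockFramingSketch {
                // A negative item count signals that a byte count follows, letting
                // readers skip the whole block without decoding its items.
                static void writeRegularBlock(Encoder e, byte[] itemBytes, int itemCount)
                    throws IOException {
                  e.writeLong(-itemCount);
                  e.writeLong(itemBytes.length);
                  e.writeFixed(itemBytes, 0, itemBytes.length);
                }
              }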

          Scott Carey added a comment -

          Nope. I'm just losing a bit of my mind. Decoder and Encoder are both abstract classes. OutputStream too. So the only related option is making Encoder extend OutputStream, but I'm not sure that is a good idea, since not all encoders encode to a byte stream. Making Encoder an interface would hurt performance. BinaryDecoder's inputStream() method was added for the same reason.

          Doug Cutting added a comment -

          I think you mean that BinaryEncoder would extend BufferedOutputStream, not implement it, right? That seems fine to me since BinaryEncoder's public methods already all come from the Encoder interface and we don't lose any abstraction. But it could get tricky to also have DirectBinaryEncoder extend OutputStream, since it couldn't also then extend BinaryEncoder. It might be easier if OutputStream was an interface... Am I missing something?

          Scott Carey added a comment -

          I think that 1.4.1 is significantly faster for Specific/Generic decoding than 1.3.2 due to AVRO-557.

          What about the buffering issue?

          Does it make sense to follow our Decoder pattern and have BinaryEncoder and DirectBinaryEncoder? Or follow the OutputStream naming convention and have BinaryEncoder (implements OutputStream) and BufferedBinaryEncoder (implements BufferedOutputStream) ?

          The former matches our decoder convention, but the latter will not introduce bugs to users who don't call flush() properly now.
          I'm leaning towards the latter, with careful javadoc and release notes.

          Doug Cutting added a comment -

          It would be great to improve our ranking in the "thrift/protobuf compare" benchmark. Currently that has Avro 1.3.2 at about half the speed of Thrift and Protobuf, but most of the gap is in deserialization.

          I like the idea of committing the simpler writeString() optimization first, then continuing to tune. +1

          Scott Carey added a comment -

          Pursuing this further has led to new information, some questions, and some trouble.

          • The old BinaryEncoder in most cases wrote directly to the output stream. In some cases it buffered (writeBytes). Almost every use of it in Avro assumes that it does not buffer. Therefore, although we know from the mailing lists that many users have run into the buffering and now use flush(), many likely do not. Therefore we need something akin to "DirectBinaryEncoder", and another big note in CHANGES.txt. This should be much simpler than the Decoder case.
          • BlockingBinaryEncoder should be easy to adapt, and integrate with the factory. It should become simpler than it is now.
          • Does it make sense to have BinaryEncoder implement BufferedOutputStream? And likewise make "DirectBinaryEncoder" implement OutputStream? That would make the semantics easier for users to understand, and they would not have to keep a reference to the underlying stream around to close. Any use cases where one "weaves" avro and non-avro data to the same stream get much simpler too.

          I have made a few more performance improvements; the big one is writeString(String), which goes from ~125MB/sec to ~183MB/sec. The downside is that it requires an additional 50 lines of code, while a simpler, 5-line variation gets 160MB/sec. This is the big one for the "thrift/protobuf compare" performance benchmark. http://evanjones.ca/software/java-string-encoding.html
          We could try adapting the raw UTF-8 code from the Hadoop project and see if that is faster. Perhaps for 1.5.0, we keep it simple and go with the 160MB/sec variant and research faster string encoding and decoding on its own later.
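
          As a rough illustration only (the patch's actual code differs), the simple
          variation presumably amounts to a one-call String-to-UTF-8 conversion;
          because Avro strings and bytes share the same length-prefixed framing, it
          can be sketched as:

              import java.io.IOException;
              import java.nio.charset.StandardCharsets;
              import org.apache.avro.io.Encoder;

              class SimpleWriteString {
                // Convert in one call, then reuse the existing length-prefixed
                // byte-writing path. The ~183MB/sec variant hand-rolls the
                // char[] to byte[] conversion instead.
                static void writeStringSimple(Encoder e, String s) throws IOException {
                  byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
                  e.writeBytes(utf8, 0, utf8.length);
                }
              }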

          Scott Carey added a comment -

          Yup, that is a bug. The below is the correct form. It doesn't change the performance.

            @Override
            public void writeFixed(byte[] bytes, int start, int len) throws IOException {
              if (len > (limit >> 2)) {
                //greater than 25% of the buffer, write direct
                flushBuffer();
                out.write(bytes, start, len);
                return;
              }
              ensureBounds(len);
              System.arraycopy(bytes, start, buf, pos, len);
              pos+=len;
            }
          
          Doug Cutting added a comment -

          This is great stuff!

          Glancing at the code, it appears that writeFixed() should return after calling out.write, no?

          Scott Carey added a comment -

          AVRO-753 should fix AVRO-738 as a byproduct.

          Scott Carey made changes -
          Link This issue is related to AVRO-738 [ AVRO-738 ]
          Scott Carey added a comment -

          Not quite 2.5x faster... Bytes and String only increase performance moderately, since the former is just System.arraycopy in both cases, and the latter is dominated by char[] -> Utf8 conversion. Everything else is a big gain because it avoids the very slow OutputStream.write(int b).
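
          A hedged sketch (not the committed code) of why the gain is large: the
          buffered encoder zig-zag encodes an int and appends the resulting varint
          bytes to a local byte[], instead of calling OutputStream.write(int) once
          per byte. Field names here are illustrative.

              class VarintSketch {
                private final byte[] buf = new byte[16];
                private int pos = 0;

                void writeInt(int n) {
                  int val = (n << 1) ^ (n >> 31);   // zig-zag: small magnitudes stay small
                  while ((val & ~0x7F) != 0) {      // emit 7 bits at a time, high bit set
                    buf[pos++] = (byte) ((val & 0x7F) | 0x80);
                    val >>>= 7;
                  }
                  buf[pos++] = (byte) val;          // final byte has its high bit clear
                }
              }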

          Scott Carey made changes -
          Field Original Value New Value
          Attachment AVRO-753.v1.patch [ 12469906 ]
          Scott Carey added a comment -

          This patch implements an experimental variation of BinaryEncoder named FastBinaryEncoder and its accompanying EncoderFactory.

          This is a first pass proof-of-concept. A final patch would replace BinaryEncoder rather than introduce FastBinaryEncoder. The purpose here is that you can do side-by-side comparison with the old one using the new Perf.java tool in AVRO-752.

          The results of Perf with '-noread' mode are as follows.

          BinaryEncoder (original):

                               IntWrite:  2219 ms,     36.047 million entries/sec.     90.729 million bytes/sec
                         SmallLongWrite:  2253 ms,     35.499 million entries/sec.     89.350 million bytes/sec
                              LongWrite:  4494 ms,     17.801 million entries/sec.     77.769 million bytes/sec
                             FloatWrite:  3088 ms,     25.900 million entries/sec.    103.599 million bytes/sec
                            DoubleWrite:  6000 ms,     13.333 million entries/sec.    106.663 million bytes/sec
                           BooleanWrite:   876 ms,     91.265 million entries/sec.     91.265 million bytes/sec
                             BytesWrite:  1007 ms,     15.882 million entries/sec.    565.653 million bytes/sec
                            StringWrite:  4835 ms,      3.309 million entries/sec.    117.875 million bytes/sec
                            RecordWrite:  5333 ms,      2.500 million entries/sec.     97.016 million bytes/sec
                  ValidatingRecordWrite:  5741 ms,      2.322 million entries/sec.     90.121 million bytes/sec
                           GenericWrite:  3953 ms,      1.686 million entries/sec.     65.439 million bytes/sec
                    GenericNested_Write:  4429 ms,      1.505 million entries/sec.     58.408 million bytes/sec
          

          FastBinaryEncoder:

                               IntWrite:   693 ms,    115.425 million entries/sec.    290.518 million bytes/sec
                         SmallLongWrite:   797 ms,    100.329 million entries/sec.    252.522 million bytes/sec
                              LongWrite:  1323 ms,     60.450 million entries/sec.    264.097 million bytes/sec
                             FloatWrite:   561 ms,    142.443 million entries/sec.    569.772 million bytes/sec
                            DoubleWrite:   893 ms,     89.528 million entries/sec.    716.227 million bytes/sec
                           BooleanWrite:   317 ms,    252.174 million entries/sec.    252.174 million bytes/sec
                             BytesWrite:   843 ms,     18.979 million entries/sec.    675.963 million bytes/sec
                            StringWrite:  4631 ms,      3.455 million entries/sec.    123.065 million bytes/sec
                            RecordWrite:  1255 ms,     10.617 million entries/sec.    412.047 million bytes/sec
                  ValidatingRecordWrite:  1686 ms,      7.907 million entries/sec.    306.883 million bytes/sec
                           GenericWrite:  1302 ms,      5.119 million entries/sec.    198.660 million bytes/sec
                    GenericNested_Write:  2073 ms,      3.215 million entries/sec.    124.769 million bytes/sec
          

          Performance is 2.5 to 6 times faster.

          There is more tuning and testing to do, but I wanted to checkpoint my work at this point and share progress.

          Scott Carey created issue -

            People

            • Assignee: Scott Carey
            • Reporter: Scott Carey
            • Votes: 1
            • Watchers: 2
