  1. Apache Avro
  2. AVRO-392

Binary Decoder Performance and flexibility overhaul

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: java
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      BinaryDecoder has room for significant performance improvement. AVRO-327 has some preliminary work here, but in order to satisfy some use cases there is much more work to do.

      I am opening a new ticket because the scope of the changes needed to do this the right way is larger.

      I have completed the bulk of a new implementation that abstracts a 'ByteSource' from the BinaryDecoder. Currently BinaryDecoder is tightly coupled to InputStream. The ByteSource can wrap an InputStream, FileChannel, or byte[] in this version, but could be extended to support other channel types, sockets, etc. This abstraction allows the BinaryDecoder to buffer data from various sources while supporting interleaved access to the underlying data and greater flexibility going forward.
      The performance of this abstraction has been heavily tuned so that maximum performance can be achieved even for slower ByteSource implementations.
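
      To make the shape of this abstraction concrete, here is a minimal sketch. Every name in it (ByteSource as an interface with a single read method, ArrayByteSource, StreamByteSource, SketchDecoder) is hypothetical and chosen for illustration; it is not the API from the attached patches. It only shows how a decoder could fill one buffer from interchangeable sources and decode Avro's zig-zag variable-length ints from that buffer.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: names are illustrative, not the API from the patch.
interface ByteSource {
    /** Reads up to len bytes into dst at off; returns the count read, or -1 at end of data. */
    int read(byte[] dst, int off, int len) throws IOException;
}

/** Source backed by an in-memory array. */
final class ArrayByteSource implements ByteSource {
    private final byte[] data;
    private int pos;

    ArrayByteSource(byte[] data) { this.data = data; }

    @Override
    public int read(byte[] dst, int off, int len) {
        if (len == 0) return 0;
        if (pos >= data.length) return -1;
        int n = Math.min(len, data.length - pos);
        System.arraycopy(data, pos, dst, off, n);
        pos += n;
        return n;
    }
}

/** Source backed by an InputStream. */
final class StreamByteSource implements ByteSource {
    private final InputStream in;

    StreamByteSource(InputStream in) { this.in = in; }

    @Override
    public int read(byte[] dst, int off, int len) throws IOException {
        return in.read(dst, off, len);
    }
}

/** Minimal buffering decoder that reads Avro ints (zig-zag varints) from any ByteSource. */
final class SketchDecoder {
    private final ByteSource source;
    private final byte[] buf = new byte[8192];
    private int pos;
    private int limit;

    SketchDecoder(ByteSource source) { this.source = source; }

    /** Returns the next buffered byte, refilling from the source when the buffer is empty. */
    private int nextByte() throws IOException {
        if (pos == limit) {
            int n = source.read(buf, 0, buf.length);
            if (n <= 0) throw new EOFException();
            pos = 0;
            limit = n;
        }
        return buf[pos++] & 0xff;
    }

    int readInt() throws IOException {
        int raw = 0;
        int shift = 0;
        int b;
        do {
            b = nextByte();
            raw |= (b & 0x7f) << shift;   // low 7 bits carry data
            shift += 7;
        } while ((b & 0x80) != 0 && shift < 35);
        if ((b & 0x80) != 0) throw new IOException("Invalid int encoding");
        return (raw >>> 1) ^ -(raw & 1);  // undo zig-zag
    }
}
```

      In this sketch the decoder never touches the underlying InputStream or array directly, so adding another source type would only require another ByteSource implementation.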

      For readers that must interleave reads on a stream with the decoder, this includes a

      public InputStream inputStream();
      

      method on the decoder that can serve interleaved reads.
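
      As an illustration of why such a method is needed once the decoder buffers, the sketch below shows one way the view could work: it drains bytes the decoder has already buffered before touching the underlying stream. The class InterleavedReader and its readByte method are hypothetical stand-ins for the decoder; only the inputStream() signature comes from the description above.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of a buffering reader that exposes its unread bytes as an InputStream.
final class InterleavedReader {
    private final InputStream in;
    private final byte[] buf;
    private int pos;
    private int limit;

    InterleavedReader(InputStream in, int bufferSize) {
        this.in = in;
        this.buf = new byte[bufferSize];
    }

    /** Decoder-side read of one byte; refills the buffer when empty. Returns -1 at end of data. */
    int readByte() throws IOException {
        if (pos == limit) {
            int n = in.read(buf, 0, buf.length);
            if (n <= 0) return -1;
            pos = 0;
            limit = n;
        }
        return buf[pos++] & 0xff;
    }

    /** View that consumes from the same position as readByte(), so the two can interleave. */
    public InputStream inputStream() {
        return new InputStream() {
            @Override
            public int read() throws IOException {
                return readByte();
            }

            @Override
            public int read(byte[] dst, int off, int len) throws IOException {
                if (len == 0) return 0;
                if (pos < limit) {
                    // Serve bytes the decoder has already buffered.
                    int n = Math.min(len, limit - pos);
                    System.arraycopy(buf, pos, dst, off, n);
                    pos += n;
                    return n;
                }
                // Buffer is empty: read straight from the underlying stream.
                return in.read(dst, off, len);
            }
        };
    }
}
```

      Reading from the original InputStream directly would skip whatever the decoder had already pulled into its buffer; routing interleaved reads through the view avoids that.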

      Additionally it will be necessary to have a constructor on BinaryDecoder that allows two BinaryDecoders to share a stream (and buffer).
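
      A rough sketch of what sharing could look like follows. The classes SharedBuffer and SharingDecoder are hypothetical and are not taken from the patch; the point is only that both decoders hold the same buffer state, so bytes buffered by one are not lost to the other. The sketch is not thread-safe.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: buffer state lives in one object that several decoders can reference.
final class SharedBuffer {
    private final InputStream in;
    private final byte[] buf;
    private int pos;
    private int limit;

    SharedBuffer(InputStream in, int size) {
        this.in = in;
        this.buf = new byte[size];
    }

    /** Returns the next byte, refilling from the stream when the buffer is exhausted. */
    int next() throws IOException {
        if (pos == limit) {
            int n = in.read(buf, 0, buf.length);
            if (n <= 0) throw new EOFException();
            pos = 0;
            limit = n;
        }
        return buf[pos++] & 0xff;
    }
}

final class SharingDecoder {
    private final SharedBuffer shared;

    /** Creates a decoder with its own buffer over the stream. */
    SharingDecoder(InputStream in) {
        this.shared = new SharedBuffer(in, 8192);
    }

    /** Creates a decoder that shares the stream and buffer of an existing decoder. */
    SharingDecoder(SharingDecoder other) {
        this.shared = other.shared;
    }

    /** Avro encodes a boolean as a single byte, 0 or 1. */
    boolean readBoolean() throws IOException {
        return shared.next() == 1;
    }
}
```

      If the second decoder instead wrapped the same InputStream with a fresh buffer, the first decoder's read-ahead would already have consumed the bytes it needs.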

      Performance results for this new version are better than those of previous prototypes:

      current trunk BinaryDecoder

      ReadInt: 983 ms, 30.497877855999185 million entries/sec
      ReadLongSmall: 1058 ms, 28.336666040111496 million entries/sec
      ReadLong: 1518 ms, 19.75179889508437 million entries/sec
      ReadFloat: 657 ms, 45.61031157924184 million entries/sec
      ReadDouble: 761 ms, 39.387756709704355 million entries/sec
      ReadBoolean: 331 ms, 90.4268145647456 million entries/sec
      RepeaterTest: 7718 ms, 3.886725782038378 million entries/sec
      NestedRecordTest: 1884 ms, 15.91964611687992 million entries/sec
      ResolverTest: 8296 ms, 3.616055866616717 million entries/sec
      MigrationTest: 21216 ms, 1.4139999570144013 million entries/sec
      

      buffering BinaryDecoder

      ReadInt: 187 ms, 160.22131904871262 million entries/sec
      ReadLongSmall: 372 ms, 80.4863521975457 million entries/sec
      ReadLong: 613 ms, 48.882385721129246 million entries/sec
      ReadFloat: 253 ms, 118.16606270679061 million entries/sec
      ReadDouble: 275 ms, 108.94314257389068 million entries/sec
      ReadBoolean: 222 ms, 134.85327963176064 million entries/sec
      RepeaterTest: 3335 ms, 8.993007936329503 million entries/sec
      NestedRecordTest: 1152 ms, 26.0256943004597 million entries/sec
      ResolverTest: 4213 ms, 7.120659335077578 million entries/sec
      MigrationTest: 15310 ms, 1.9594884898992941 million entries/sec
      

      Throughput is roughly 2x to 5x that of trunk on most tests. ReadBoolean, NestedRecordTest, and MigrationTest improve less, by about 1.4x to 1.6x.
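
      The per-test ratios can be recomputed from the elapsed times in the two listings above. The small program below is only a convenience sketch (the class name SpeedupTable is made up); it assumes each test processes the same number of entries in both runs, which the reported ms and entries/sec figures are consistent with.

```java
// Elapsed times in ms, copied from the trunk and buffering BinaryDecoder listings.
final class SpeedupTable {
    static final String[] TESTS = {
        "ReadInt", "ReadLongSmall", "ReadLong", "ReadFloat", "ReadDouble",
        "ReadBoolean", "RepeaterTest", "NestedRecordTest", "ResolverTest", "MigrationTest"
    };
    static final int[] TRUNK_MS = {983, 1058, 1518, 657, 761, 331, 7718, 1884, 8296, 21216};
    static final int[] BUFFERED_MS = {187, 372, 613, 253, 275, 222, 3335, 1152, 4213, 15310};

    /** Speedup of the buffering decoder over trunk for test i (ratio of elapsed times). */
    static double speedup(int i) {
        return (double) TRUNK_MS[i] / BUFFERED_MS[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < TESTS.length; i++) {
            System.out.printf("%-17s %.2fx%n", TESTS[i], speedup(i));
        }
    }
}
```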

      Attachments

        1. AVRO-392.patch
          108 kB
          Scott Carey
        2. AVRO-392.patch
          107 kB
          Scott Carey
        3. AVRO-392.patch
          92 kB
          Scott Carey
        4. AVRO-392.patch
          63 kB
          Scott Carey
        5. AVRO-392.patch
          59 kB
          Scott Carey
        6. AVRO-392.patch
          62 kB
          Scott Carey
        7. AVRO-392-preview.patch
          36 kB
          Scott Carey
        8. AVRO-392-with_DirectBinaryDecoder.patch
          104 kB
          Scott Carey
        9. AVRO-392-with_DirectBinaryDecoder-2.patch
          107 kB
          Thiruvalluvan M. G.

            People

              Assignee: Scott Carey (scott_carey)
              Reporter: Scott Carey (scott_carey)
              Votes: 0
              Watchers: 2