Cassandra / CASSANDRA-1765

Compare serialization metrics between a few frameworks.

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Fix Version/s: 0.8 beta 1
    • Component/s: None
    • Labels:
      None

      Description

      Compare serialization performance of Cassandra (ICompactSerializer), Thrift, Avro and Messagepack.

      1. 0001-inserts-codes-for-properly-calling-full-GC-into-Mess.patch
        19 kB
        Muga Nishizawa
      2. ASF.LICENSE.NOT.GRANTED--v1-0001-IDL-and-RangeSliceCommand-alteration.txt
        34 kB
        Gary Dusbabek
      3. ASF.LICENSE.NOT.GRANTED--v1-0002-msgpack-dependencies.txt
        1.19 MB
        Gary Dusbabek
      4. ASF.LICENSE.NOT.GRANTED--v1-0003-serialization-tests.txt
        22 kB
        Gary Dusbabek
      5. ASF.LICENSE.NOT.GRANTED--v1-0004-jvm-warmup-and-other-improvements.-by-Muga-Nishizawa.txt
        19 kB
        Gary Dusbabek
      6. ASF.LICENSE.NOT.GRANTED--v1-0001-CASSANDRA-1765-don-t-ser-dser-schema-string.txt
        3 kB
        Eric Evans
      7. 0001-CASSANDRA-1765-allocate-memory-for-binary-deserializ.patch
        2 kB
        Muga Nishizawa
      8. ASF.LICENSE.NOT.GRANTED--v2-0001-IDL-and-RangeSliceCommand-alteration.txt
        34 kB
        Gary Dusbabek
      9. ASF.LICENSE.NOT.GRANTED--v2-0002-msgpack-dependencies.txt
        1.19 MB
        Gary Dusbabek
      10. ASF.LICENSE.NOT.GRANTED--v2-0003-serialization-tests.txt
        22 kB
        Gary Dusbabek
      11. ASF.LICENSE.NOT.GRANTED--v2-0004-jvm-warmup-and-other-improvements.-by-Muga-Nishizawa.txt
        19 kB
        Gary Dusbabek
      12. ASF.LICENSE.NOT.GRANTED--v2-0005-don-t-bother-with-sending-schema-on-avro-tests.-patch-.txt
        3 kB
        Gary Dusbabek
      13. ASF.LICENSE.NOT.GRANTED--v2-0006-allocate-memory-more-efficently-for-msgpack.-patch-by-.txt
        2 kB
        Gary Dusbabek
      14. ASF.LICENSE.NOT.GRANTED--v1-0001-CASSANDRA-1765-create-reader-writer-once.txt
        4 kB
        Eric Evans
      15. ASF.LICENSE.NOT.GRANTED--v2-0001-CASSANDRA-1765-create-reader-writer-once.txt
        5 kB
        Eric Evans
      16. MessageSerializationThriftTest.java
        4 kB
        T Jake Luciani

          Activity

          gdusbabek Gary Dusbabek added a comment -

          Attached a set of naïve tests for the 4 different formats.

          Messagepack and Avro didn't fare so well, so I suspect I'm doing something wrong. Feedback appreciated.

          `ant message-tests` runs the tests. Results from my development machine can be found at https://spreadsheets.google.com/pub?key=0AlYIamTZjuoldGZXMExrXy1XeFZLOEI1RktTX2kxRkE&hl=en&output=html.

          cowtowncoder Tatu Saloranta added a comment -

          Did the tests do proper JVM warmup (it looks like they might not)? There should always be enough time for JVM "warm up", and specifically for HotSpot to have a chance to inline commonly used code. A relatively short time (like 5 seconds) often works well enough.

          gdusbabek Gary Dusbabek added a comment -

          It tries to. Each test serializes and deserializes 10k identical records four times (40k operations per test). I start timing after the first 10k operations so that only the last 30k are timed. It would be trivial to bump the msgCount variable up to ensure that proper warm-up is happening if it isn't already.
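
A minimal sketch of the timing scheme described above, assuming one untimed pass followed by three timed passes. The class name, the `roundTrip` stand-in, and the record contents are hypothetical and are not taken from the attached patches.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical harness; class, method, and record contents are illustrative only.
final class WarmupTimingSketch {
    static final int PASSES = 4;   // 1 untimed warm-up pass + 3 timed passes
    static volatile long sink;     // results land here so the JIT cannot discard the work

    // Stand-in for one serialize + deserialize round trip of a record.
    static int roundTrip(String record) {
        byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8).length();
    }

    // Number of operations that fall inside the timed window.
    static int timedOperations(int msgCount) {
        return (PASSES - 1) * msgCount;
    }

    // Runs all passes and returns the nanoseconds spent in the timed ones.
    static long timedNanos(int msgCount) {
        long start = 0;
        for (int pass = 0; pass < PASSES; pass++) {
            if (pass == 1) {
                start = System.nanoTime();   // clock starts once the first pass is done
            }
            for (int i = 0; i < msgCount; i++) {
                sink += roundTrip("row-key:column:value");
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int msgCount = 10_000;
        long millis = timedNanos(msgCount) / 1_000_000;
        System.out.println(timedOperations(msgCount) + " timed operations in " + millis + " ms");
    }
}
```

Raising `msgCount` lengthens the warm-up pass and the timed window together, which is the adjustment the comment refers to.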

          muga_nishizawa Muga Nishizawa added a comment -

          If you simply want to compare the performance of each serialization library, I recommend properly separating JVM warmup from the serialization (or deserialization) being measured in your benchmark. For example, the jvm-serializers benchmark is composed of the following phases, which it executes sequentially. It explicitly calls a full GC before executing each phase.

          • object creation
          • JVM warmup for object serialization
          • object serialization
          • JVM warmup for binary deserialization
          • binary deserialization
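
The ordering above can be sketched roughly as follows. This is not the jvm-serializers code: the class, the phase names, and the string-based workload are hypothetical stand-ins, and `System.gc()` is only a request that the JVM may ignore.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the phase ordering; it is not the jvm-serializers code.
final class PhasedBenchmarkSketch {
    static final List<String> phaseLog = new ArrayList<>();
    static String[] objects;
    static byte[][] binaries;
    static volatile long sink;   // keeps deserialized results observable

    // Requests a full GC (System.gc() is only a hint), then times one phase.
    static long runPhase(String name, Runnable phase) {
        System.gc();
        long start = System.nanoTime();
        phase.run();
        long elapsed = System.nanoTime() - start;
        phaseLog.add(name);
        return elapsed;
    }

    static void create(int n) {
        objects = new String[n];
        for (int i = 0; i < n; i++) {
            objects[i] = "record-" + i;
        }
    }

    static void serialize() {
        binaries = new byte[objects.length][];
        for (int i = 0; i < objects.length; i++) {
            binaries[i] = objects[i].getBytes(StandardCharsets.UTF_8);
        }
    }

    static void deserialize() {
        for (byte[] b : binaries) {
            sink += new String(b, StandardCharsets.UTF_8).length();
        }
    }

    // Returns {serializeNanos, deserializeNanos}; warm-up timings are discarded.
    static long[] run(int n) {
        phaseLog.clear();
        runPhase("create", () -> create(n));
        runPhase("warmup-serialize", PhasedBenchmarkSketch::serialize);
        long ser = runPhase("serialize", PhasedBenchmarkSketch::serialize);
        runPhase("warmup-deserialize", PhasedBenchmarkSketch::deserialize);
        long deser = runPhase("deserialize", PhasedBenchmarkSketch::deserialize);
        return new long[] {ser, deser};
    }

    public static void main(String[] args) {
        long[] nanos = run(10_000);
        System.out.println("serialize ns: " + nanos[0] + ", deserialize ns: " + nanos[1]);
    }
}
```

Keeping each warm-up in its own phase means only the second serialize and deserialize runs contribute to the reported numbers.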

          You can get more information about jvm-serializers at the following websites.

          • jvm-serializers: https://github.com/eishay/jvm-serializers
          • BenchmarkRunner: https://github.com/eishay/jvm-serializers/blob/kannan/tpc/src/serializers/BenchmarkRunner.java

          muga_nishizawa Muga Nishizawa added a comment -

          Please find attached a patch that inserts code for properly calling full GC into Gary's message-tests.

          gdusbabek Gary Dusbabek added a comment -

          @Muga: thank you for the improvements. I have included them in v1 of the patchset.

          I also bumped the msgCount to 25k from 10k. The google spreadsheet has been updated.

          urandom Eric Evans added a comment -

          The Avro tests are unnecessarily serializing and deserializing their own schema as part of the messages. In real life both reader and writer would already have the schema: a writer stores it once in the file format and readers read it from there, or the two sides handshake it once per RPC session.

          The attached patch (v1-0001-CASSANDRA-1765-don-t-ser-dser-schema-string.txt) fixes this.
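
To illustrate why shipping the schema with every message skews the comparison, here is a rough sketch that uses plain `java.io` streams rather than the Avro API. The schema text, class name, and `encode` helper are hypothetical, so it only shows the per-message size overhead, not Avro's actual wire format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in built on java.io streams; it does not use the Avro API.
final class SchemaOverheadSketch {
    // Illustrative schema text only.
    static final String SCHEMA =
        "{\"type\":\"record\",\"name\":\"Msg\",\"fields\":[{\"name\":\"key\",\"type\":\"string\"}]}";

    // Encodes one message, optionally prefixing it with the schema text.
    static byte[] encode(String key, boolean includeSchema) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            DataOutputStream data = new DataOutputStream(out);
            if (includeSchema) {
                data.writeUTF(SCHEMA);   // 2-byte length prefix + schema bytes, repeated per message
            }
            data.writeUTF(key);          // the actual payload
            data.flush();
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int with = encode("row-1", true).length;
        int without = encode("row-1", false).length;
        System.out.println("bytes with schema: " + with + ", without: " + without);
    }
}
```

In this sketch the schema dwarfs the payload, so any per-message timing would mostly measure schema handling.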

          Results run on my laptop here: https://gist.github.com/2f5d00d87c77260bcaac

          These results are averages of the 5 test runs. One interesting thing about the Avro results is the high/low swing in the serialization times that make up the average. About half the runs are ~2175, and about half ~1200. It'd be interesting to find out why that is; 2175 would make it the slowest by a wide margin, and 1200 the fastest by a wide margin.

          muga_nishizawa Muga Nishizawa added a comment -

          Gary, thanks for your response.

          According to your Google spreadsheet, binary deserialization with msgpack is affected by GC. The deserialization times vary a great deal.

          To solve this problem, I attached a patch named "0001-CASSANDRA-1765-allocate-memory-for-binary-deserializ.patch". The patch allocates the memory that msgpack's binary deserialization uses all at once, prior to deserialization, which reduces the frequency of GC.

          gdusbabek Gary Dusbabek added a comment - - edited

          v3 incorporates Eric's and Muga's latest changes. The spreadsheet is updated.

          urandom Eric Evans added a comment -

          Scott Carey pointed out (http://www.mail-archive.com/dev@avro.apache.org/msg01651.html) that it also does not make sense to recreate the reader/writer on each run (you'd never do this In Real Life). The attached patch (v1-0001-CASSANDRA-1765-create-reader-writer-once.txt) corrects this.

          This seems to improve the consistency of serialization times run-to-run. Here is what it looks like now on my laptop: https://gist.github.com/48cb8f178b073db113be

          tjake T Jake Luciani added a comment -

          msgpack requires LGPL software; see CASSANDRA-1735.

          urandom Eric Evans added a comment -

          So I went ahead and applied some of the remaining feedback from Scott Carey and the results were very impressive.

          https://gist.github.com/6be358038fce183911b5

          Most of this came from reusing the Encoder (and its backing output stream) and the Decoder. I'm not entirely sure whether the datum reuse is valid, or representative of Real World use (I think it is), but it contributed less than 100 ms to the deserialization improvements.
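
The reuse pattern can be sketched with plain `java.io` classes. This does not use Avro's Encoder or Decoder; the class and method names are hypothetical, and the point is only the difference between allocating a stream per message and resetting one shared stream.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch with plain java.io classes; it does not use Avro's Encoder or Decoder.
final class ReuseSketch {
    // Allocates a new output stream (and backing array) for every message.
    static int encodeFresh(String[] messages) {
        int total = 0;
        for (String m : messages) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] payload = m.getBytes(StandardCharsets.UTF_8);
            out.write(payload, 0, payload.length);
            total += out.size();
        }
        return total;
    }

    // Creates the stream once and resets it between messages.
    static int encodeReused(String[] messages) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int total = 0;
        for (String m : messages) {
            out.reset();   // discards the previous contents, keeps the allocated buffer
            byte[] payload = m.getBytes(StandardCharsets.UTF_8);
            out.write(payload, 0, payload.length);
            total += out.size();
        }
        return total;
    }

    public static void main(String[] args) {
        String[] messages = {"alpha", "beta", "gamma"};
        System.out.println("fresh: " + encodeFresh(messages) + " bytes, reused: " + encodeReused(messages) + " bytes");
    }
}
```

Both methods produce the same bytes; the reused variant simply avoids one stream and buffer allocation per message, which is the kind of saving that reduces garbage during a tight benchmark loop.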

          tjake T Jake Luciani added a comment - - edited

          Updated the Thrift benchmarks. The current FBUtilities needs to be optimized; I'll open a separate ticket for this.

          Updated results from my machine (with all other changes here)

          https://spreadsheets.google.com/pub?key=0AoFSNkh7LrJCdFlhdXQ1UW5QdFY5djlFYXc5eThEdFE&hl=en&output=html

          gdusbabek Gary Dusbabek added a comment -

          None of the libraries distinguished themselves as being a particularly crappy choice for serialization.

          cowtowncoder Tatu Saloranta added a comment - - edited

          It sort of makes sense that the differences might not be huge, with well-written codecs and tests.
          One more thing that might be useful would be to add a baseline of basic JSON-based serialization (with Jackson, data binding), to see how much the use of binary formats helps.
          If the tests are easy enough to modify, maybe I'll try to add that.

          But it looks like tests are not (yet?) part of trunk?

          gdusbabek Gary Dusbabek added a comment -

          @Tatu: the patches were never committed. There didn't seem to be much point after I realized that all libraries were roughly comparable. Plus, the test has dependencies that are outside the scope of Cassandra.

          The v2 patches are the most recent, but I suspect they are rather outdated.


            People

            • Assignee:
              gdusbabek Gary Dusbabek
              Reporter:
              gdusbabek Gary Dusbabek
            • Votes:
              0
              Watchers:
              5
