Avro / AVRO-217

Benchmark Python implementation of Avro

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: python
    • Labels:
      None

      Description

      Doug mentioned to me that the Python implementation of Avro could use some performance benchmarks. This ticket is meant to collect ideas for what to benchmark and subsequently the implementation and execution of those benchmarks.

          Activity

          Doug Cutting added a comment -

          Profiling and performance tuning would also be good.

          A simple microbenchmark is the serialization and deserialization rate of simple records. I find it more interesting to report MB/s, not records/s.
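
A minimal sketch of such a throughput microbenchmark, reporting MB/s rather than records/s. The function name `throughput_mb_per_s` and the JSON stand-in codec are assumptions made so the example runs without Avro installed; for a real measurement the `encode` and `decode` callables would wrap the Avro writer and reader instead.

```python
import json
import time


def throughput_mb_per_s(encode, decode, records, repeats=5):
    """Return (encode MB/s, decode MB/s) for the given codec callables."""
    # Encode once up front to learn how many bytes the records occupy.
    encoded = [encode(record) for record in records]
    total_mb = sum(len(blob) for blob in encoded) / 1e6

    def best_time(func, items):
        # Keep the fastest of several runs to reduce scheduling noise.
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            for item in items:
                func(item)
            best = min(best, time.perf_counter() - start)
        return best

    return (total_mb / best_time(encode, records),
            total_mb / best_time(decode, encoded))


# Stand-in codec (assumption): JSON bytes instead of Avro binary.
def json_encode(record):
    return json.dumps(record).encode("utf-8")


def json_decode(blob):
    return json.loads(blob.decode("utf-8"))


records = [{"id": i, "name": "user%d" % i, "score": i * 0.5}
           for i in range(1000)]
enc_rate, dec_rate = throughput_mb_per_s(json_encode, json_decode, records)
print("encode: %.1f MB/s, decode: %.1f MB/s" % (enc_rate, dec_rate))
```

Dividing by encoded bytes rather than record count keeps the figure comparable across schemas with very different record sizes.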

          Another simple microbenchmark is RPCs/second. This can be done with small (100B) and large (100KB) payloads to gauge both RPC latency and throughput.
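
The RPC measurement could be prototyped along the following lines. This is only a sketch under stated assumptions: it uses a plain length-prefixed TCP echo server on localhost, not Avro's RPC layer, and `EchoHandler`, `recv_exact`, and `rpcs_per_second` are hypothetical names. The small payload mostly reflects per-call latency, the large one mostly reflects throughput.

```python
import socket
import socketserver
import struct
import threading
import time


def recv_exact(sock, n):
    """Read exactly n bytes from a socket."""
    chunks = []
    while n > 0:
        chunk = sock.recv(min(n, 65536))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Each call is a 4-byte big-endian length followed by the payload;
        # the server echoes the whole frame back as the response.
        try:
            while True:
                header = recv_exact(self.request, 4)
                (size,) = struct.unpack(">I", header)
                self.request.sendall(header + recv_exact(self.request, size))
        except ConnectionError:
            return


def rpcs_per_second(address, payload_size, calls=200):
    """Issue sequential round trips and return the achieved calls/second."""
    frame = struct.pack(">I", payload_size) + b"x" * payload_size
    with socket.create_connection(address) as sock:
        start = time.perf_counter()
        for _ in range(calls):
            sock.sendall(frame)
            recv_exact(sock, len(frame))
        elapsed = time.perf_counter() - start
    return calls / elapsed


server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EchoHandler)
server.daemon_threads = True
threading.Thread(target=server.serve_forever, daemon=True).start()

results = {}
for label, size in (("small (100 B)", 100), ("large (100 KB)", 100 * 1024)):
    results[label] = rpcs_per_second(server.server_address, size)
    mb_per_s = results[label] * size / 1e6
    print("%s: %.0f RPCs/s, %.1f MB/s" % (label, results[label], mb_per_s))

server.shutdown()
server.server_close()
```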

          Jeff Hammerbacher added a comment -

          Some benchmarks for the Java implementation are at http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking

          Jeff Hammerbacher added a comment -

          Patrick Hunt, on AVRO-240, notes:

          fyi, noticed this recently, it's a python json lib performance evaluation on python 2.6. You might want to
          add some detail in the avro py docs talking about how to get better/best performance, esp as it
          relates to the json library used (json stdlib performance is an order of magnitude behind the others tested)

          "I honestly didn't expect the stdlib json to be this far behind.

          Among the other C based libraries there isn't a clear winner. cjson is the best decoder but the slowest encoder, simplejson compiled with C speedups is the fastest encoder but the slowest decoder while jsonlib2 is somewhere in the middle for both cases."

          http://www.mikealrogers.com/archives/695
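
To check how much the JSON library matters on a given machine, a comparison like the one quoted above can be approximated as follows. This is a sketch, not the method used in the linked post: it only times libraries that expose `dumps`/`loads` (the stdlib `json`, plus `simplejson` if it happens to be installed), and the schema-like sample document is made up for illustration. Libraries with a different API, such as cjson with its `encode`/`decode` functions, would need a small adapter.

```python
import importlib
import json
import time


def load_json_modules(names=("json", "simplejson")):
    """Import whichever of the named JSON libraries are available."""
    modules = {}
    for name in names:
        try:
            modules[name] = importlib.import_module(name)
        except ImportError:
            pass  # Library not installed; skip it.
    return modules


def time_json(module, document, repeats=500):
    """Return (encode seconds, decode seconds) for repeated round trips."""
    start = time.perf_counter()
    for _ in range(repeats):
        text = module.dumps(document)
    encode_s = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(repeats):
        module.loads(text)
    decode_s = time.perf_counter() - start
    return encode_s, decode_s


# Made-up document shaped like a record schema (assumption).
document = {
    "type": "record",
    "name": "User",
    "fields": [{"name": "field%d" % i, "type": ["null", "string"]}
               for i in range(50)],
}

timings = {name: time_json(module, document)
           for name, module in load_json_modules().items()}
for name, (encode_s, decode_s) in sorted(timings.items()):
    print("%s: encode %.4fs, decode %.4fs" % (name, encode_s, decode_s))
```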

          Miki Tebeka added a comment -

          +1 on this. I find the Python Avro implementation very slow: it takes about 60 sec to process a 33K file (30 sec on 1.5.1), while the Java package processes the same file in about 1 sec.

          One simple thing is to have a list of .avro files and time how long it takes to process all of them.
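
A rough sketch of that file-timing idea. `time_files` and `read_bytes` are hypothetical helpers; the demo at the bottom writes placeholder bytes to temporary files so that it runs without Avro installed. `read_avro` shows how a real file could be read with the `avro` package's `DataFileReader` and `DatumReader`, but it is not called here and assumes that package is available.

```python
import os
import tempfile
import time


def time_files(paths, process):
    """Run process(path) on each file; return (path, seconds, MB/s) rows."""
    rows = []
    for path in paths:
        size_mb = os.path.getsize(path) / 1e6
        start = time.perf_counter()
        process(path)
        elapsed = time.perf_counter() - start
        rate = size_mb / elapsed if elapsed > 0 else float("inf")
        rows.append((path, elapsed, rate))
    return rows


def read_avro(path):
    """Count the records in an Avro data file (needs the avro package)."""
    from avro.datafile import DataFileReader
    from avro.io import DatumReader
    reader = DataFileReader(open(path, "rb"), DatumReader())
    try:
        return sum(1 for _ in reader)
    finally:
        reader.close()


def read_bytes(path):
    """Stand-in processor (assumption): just read the raw bytes."""
    with open(path, "rb") as handle:
        return len(handle.read())


# Demo with placeholder files; these are NOT valid Avro data files.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(3):
        path = os.path.join(tmp, "sample%d.avro" % i)
        with open(path, "wb") as handle:
            handle.write(os.urandom(33 * 1024))
        paths.append(path)
    rows = time_files(paths, read_bytes)

for path, seconds, rate in rows:
    print("%s: %.6fs (%.1f MB/s)" % (os.path.basename(path), seconds, rate))
print("total: %.6fs" % sum(seconds for _, seconds, _ in rows))
```

With real data, passing `read_avro` instead of `read_bytes` would time the Python decoder itself, and the same file list could be run against the Java tools for comparison.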


            People

            • Assignee: Jeff Hammerbacher
            • Reporter: Jeff Hammerbacher
            • Votes: 0
            • Watchers: 2
