Avro / AVRO-1504

Improve python implementation performance

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7.6
    • Fix Version/s: None
    • Component/s: python
    • Labels:

      Description

      Inspired by https://www.python.org/doc/essays/list2str/, there is some low-hanging fruit for improving the performance of the Python implementation.

      Patch to follow soon:

      https://github.com/smoy/avro/commits/smoy_reader_performance

      Relevant commits:

      • 71220bb4a84c7aa4d42b593a2c0f7cefa8cda82d
      • 542139ce1a40492c9234ee5f84a4410515877af4
      • 2f7a0ef8d02148cf69269f5b59f89481e7c86d34
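The kind of low-hanging fruit the linked essay describes can be sketched as follows. This is an illustration only, not code from the patch: the function names are hypothetical, it is written for Python 3 (the 1.7.6 library targeted Python 2), and the timings it prints will vary by machine.

```python
import timeit


def concat_loop(values):
    # Repeated concatenation: a new bytes object is built on every iteration.
    out = b""
    for v in values:
        out += bytes([v])
    return out


def join_parts(values):
    # Collect the pieces first, then join them once.
    return b"".join([bytes([v]) for v in values])


def builtin_bytes(values):
    # Let a builtin run the loop in C instead of in Python bytecode.
    return bytes(values)


if __name__ == "__main__":
    data = list(range(256)) * 40  # hypothetical sample input
    for fn in (concat_loop, join_parts, builtin_bytes):
        elapsed = timeit.timeit(lambda: fn(data), number=20)
        print(f"{fn.__name__}: {elapsed:.4f}s")
```

All three functions return the same bytes; only the amount of per-item interpreter work differs, which is the general theme of the essay.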

          Activity

          Ryan Blue added a comment -

          Unfortunately, I haven't been following this from the beginning, so I'm not sure what the changes are for the improvements that Steven posted. However, since Justin Cunningham can validate that the validation lookup table and direct access to _writer are an improvement and they make sense, let's start by getting just those changes in a patch. Then we can look at optimizing the numeric deserialization.
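For context on the numeric deserialization mentioned above: Avro's binary encoding stores int and long values as variable-length zig-zag integers, so a pure-Python decoder does a little work per byte. The sketch below shows that decoding for Python 3. It is not the code from the patch, and the function name and stream argument are assumptions made for illustration.

```python
import io


def read_long(stream):
    """Decode one zig-zag, variable-length integer from a binary stream."""
    chunk = stream.read(1)
    if not chunk:
        raise EOFError("no data to decode")
    b = chunk[0]
    n = b & 0x7F
    shift = 7
    # The high bit of each byte signals that another byte follows.
    while b & 0x80:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("truncated varint")
        b = chunk[0]
        n |= (b & 0x7F) << shift
        shift += 7
    # Undo the zig-zag mapping (0, -1, 1, -2, ... were stored as 0, 1, 2, 3, ...).
    return (n >> 1) ^ -(n & 1)


if __name__ == "__main__":
    print(read_long(io.BytesIO(b"\x80\x01")))
```

Because this loop runs once per encoded number, it is a natural candidate for the kind of optimization discussed in this ticket.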

          Ryan Blue added a comment -

          Justin Cunningham, I'll take a look this week. Thanks for bumping the ticket!

          Justin Cunningham added a comment -

          I started looking into performance improvements for writing in the Python client library before I came across this ticket. I found that two of Steven's changes alone, the lookup table at https://github.com/smoy/avro/commit/71220bb4a84c7aa4d42b593a2c0f7cefa8cda82d#diff-438b29138d73e88e1a515a63c8250e25R124 and replacing the property at https://github.com/smoy/avro/commit/71220bb4a84c7aa4d42b593a2c0f7cefa8cda82d#diff-438b29138d73e88e1a515a63c8250e25R268, resulted in a roughly 15% performance improvement: to encode 100,000 records, runtime dropped from 6.587 seconds to 5.616 seconds in my benchmark.

          Performance of the Python client isn't great right now; these changes would result in a substantial improvement.

          Any chance a committer could do a code review?
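The two changes described in this comment can be illustrated with a minimal sketch. Everything here is hypothetical and heavily simplified (the validator rules, the `SketchEncoder` class, and all function names are assumptions, not the avro library's API or the patch itself); it only shows the general shape of swapping an if/elif chain for a dict lookup and of reading an attribute directly instead of through a property.

```python
import io

INT_MIN, INT_MAX = -(2 ** 31), 2 ** 31 - 1


def validate_chain(schema_type, datum):
    # Baseline: every call walks the comparisons until one matches.
    if schema_type == "null":
        return datum is None
    elif schema_type == "boolean":
        return isinstance(datum, bool)
    elif schema_type == "string":
        return isinstance(datum, str)
    elif schema_type == "int":
        return isinstance(datum, int) and INT_MIN <= datum <= INT_MAX
    return False


# Lookup table: one dict access selects the validator for a schema type.
_VALIDATORS = {
    "null": lambda d: d is None,
    "boolean": lambda d: isinstance(d, bool),
    "string": lambda d: isinstance(d, str),
    "int": lambda d: isinstance(d, int) and INT_MIN <= d <= INT_MAX,
}


def validate_table(schema_type, datum):
    validator = _VALIDATORS.get(schema_type)
    return validator is not None and validator(datum)


class SketchEncoder:
    """Hypothetical encoder wrapping a file-like writer."""

    def __init__(self, writer):
        self._writer = writer

    # Property access adds a function call on every read of `writer`.
    writer = property(lambda self: self._writer)

    def write_via_property(self, data):
        self.writer.write(data)

    def write_direct(self, data):
        # Direct attribute access skips the property call.
        self._writer.write(data)
```

Both variants behave the same; the saving comes from doing less interpreter work on a path that runs once per datum, which is why small changes like these can add up over 100,000 records.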

          Steven Moy added a comment -

          This patch improves both encode and decode performance in the Python implementation.

          Steven Moy added a comment -

          Can a python committer/reviewer do a code review on the patches?

          Thank you

          Steven Moy added a comment -

          Latest improvement:

          Before:

          ~/github/avro/lang/py/src (trunk) $ PYTHONPATH=. python ../test/av_bench.py 10000
          Write 0.6444
          Read 0.3712
          

          After:

          ~/github/avro/lang/py/src (smoy_reader_performance) $ PYTHONPATH=. python ../test/av_bench.py 10000
          Write 0.5358
          Read 0.1624
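
To put the before and after figures above in relative terms, here is a small worked calculation. The helper names are made up for illustration, and the benchmark output does not state its units (presumably seconds). It works out to roughly 17% less write time and about 56% less read time.

```python
def relative_improvement(before, after):
    """Fraction of the original runtime that was saved."""
    return (before - after) / before


def speedup(before, after):
    """How many times faster the new run is."""
    return before / after


# (before, after) timings copied from the benchmark output above.
timings = {"Write": (0.6444, 0.5358), "Read": (0.3712, 0.1624)}

for name, (before, after) in timings.items():
    saved = relative_improvement(before, after)
    factor = speedup(before, after)
    print(f"{name}: {saved:.1%} less time, {factor:.2f}x speedup")
```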
          
          Steven Moy added a comment -

          New patch

          Steven Moy added a comment -

          before:

          ~/github/avro/lang/py/src (trunk) $ PYTHONPATH=. python ../test/av_bench.py 10000
          Write 0.6488
          Read 0.3794
          

          after:

          ~/github/avro/lang/py/src (smoy_writer_performance) $ PYTHONPATH=. python ../test/av_bench.py 10000
          Write 0.5817
          Read 0.3842
          
          Steven Moy added a comment -

          patch attached


            People

            • Assignee: Unassigned
            • Reporter: Steven Moy
            • Votes: 1
            • Watchers: 3
