Details
Improvement
Status: Closed
Major
Resolution: Fixed
0.6.1
None
OS X 10.6 i686 / Linux x86_64
Ruby 1.8.7-p334 / Ruby 1.9.2-p180
Patch Available
Description
I have a patch to the Ruby libraries that greatly increases deserializer speed. We've been running our production systems at Ooyala with this patch for weeks. Until now it was a standalone file that redefined some methods inside the Thrift code, and we loaded it after requiring thrift. Over the weekend, I ported it into a proper patch against the 0.6.1 tag and would like to commit it back.
I originally wrote this while trying to speed up some of our code that has to deserialize a lot of Thrift objects. I ran it under ruby-prof and noticed that a huge amount of time was spent inside the Thrift deserialization code. Digging deeper, I saw that much of that time went to String allocation and copy methods. It turns out there is quite a bit of low-hanging fruit:
1) XProtocol#read_byte() methods end up calling read_all(1), getting back a string of size 1, and converting it to a byte. This is an unnecessary string alloc + copy that's pretty easy to get around. The patch does this by adding a read_byte method to the XTransport classes. The transports that have buffering of some kind (BufferedTransport, FramedTransport, MemoryBufferTransport) can look up the byte, convert to unsigned, and return it without doing the extra alloc + copy.
2) The BaseProtocol#read_all() method always allocates an empty buffer string, reads bytes from the underlying transport, then appends the result to the buffer. My patch removes this extra string alloc + copy, since it isn't needed.
3) Thrift::Struct#hash() is inefficient: it allocates an array and copies all struct fields into it. I replaced it with logic copied from Apache's Java HashCodeBuilder class.
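To make item 1 concrete, here is a minimal sketch of the two read paths. It is not the code from the patch: `SketchMemoryBuffer`, `to_signed_byte`, and `read_byte_via_string` are hypothetical names invented for illustration, and the sketch uses `String#getbyte` and `String#byteslice`, which assume a newer Ruby than the 1.8.7 listed in the environment.

```ruby
# Hypothetical stand-in for a buffering transport. This is NOT the real
# Thrift class; it only illustrates the allocation difference.
class SketchMemoryBuffer
  def initialize(data)
    @buf = data.b   # binary copy of the input bytes
    @index = 0
  end

  # Old-style path: every call returns a freshly allocated String.
  def read_all(size)
    raise EOFError, 'not enough data' if @index + size > @buf.bytesize
    chunk = @buf.byteslice(@index, size)
    @index += size
    chunk
  end

  # Patched-style path: look the byte up in place and return it as an
  # unsigned Integer (0..255) without allocating a String.
  def read_byte
    raise EOFError, 'not enough data' if @index >= @buf.bytesize
    byte = @buf.getbyte(@index)
    @index += 1
    byte
  end
end

# Protocol-side conversion from unsigned (0..255) to signed (-128..127).
def to_signed_byte(unsigned)
  unsigned > 0x7f ? unsigned - 0x100 : unsigned
end

# The slower approach, shown for comparison: allocate a 1-byte String,
# then unpack it as a signed char.
def read_byte_via_string(transport)
  transport.read_all(1).unpack('c').first
end
```

Both paths should yield the same signed value for the same input; the difference is only that the `read_byte` path skips the intermediate one-character String.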
I've built a gem locally (I gave it version number 0.6.1.1) and wrote a simple benchmark to test the changes. The benchmark creates a struct, serializes it to a binary string, then deserializes it in a loop 10,000 times (per protocol). Here are the results (all times are in seconds):
Benchmark | r1.8.7-p334/thrift-0.6.0 | r1.8.7-p334/thrift-0.6.1.1 | r1.9.2-p180/thrift-0.6.0 | r1.9.2-p180/thrift-0.6.1.1 |
---|---|---|---|---|
Deserialization: BinaryProtocol | 15.76 | 9.97 | 8.23 | 5.39 |
Deserialization: BinaryProtocolAccelerated | 11.65 | 4.14 | 5.73 | 3.15 |
Deserialization: CompactProtocol | 12.70 | 3.65 | 6.48 | 2.75 |
Hashing | 7.39 | 5.99 | 2.61 | 2.23 |
Equality | 3.84 | 2.93 | 1.24 | 0.96 |
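For readers wondering what the Hashing row is measuring, the change in item 3 can be sketched roughly as follows. This is an illustration, not the patch itself: `SketchPoint` and `hash_via_array` are hypothetical names, the seed 17 and multiplier 37 are HashCodeBuilder's defaults, and the 32-bit mask is an assumption about how one might bound the running total in Ruby.

```ruby
# Hypothetical struct-like class; NOT a generated Thrift struct.
class SketchPoint
  FIELDS = [:x, :y, :label].freeze
  attr_accessor(*FIELDS)

  def initialize(x, y, label)
    @x = x
    @y = y
    @label = label
  end

  # Array-based approach: copies every field into a temporary Array on
  # each call, then hashes the Array.
  def hash_via_array
    FIELDS.map { |name| send(name) }.hash
  end

  # HashCodeBuilder-style approach: fold each field's hash into a running
  # total, with no temporary Array.
  def hash
    total = 17
    FIELDS.each do |name|
      # Mask to 32 bits (an assumption) so the total stays a small Integer.
      total = (total * 37 + send(name).hash) & 0xffffffff
    end
    total
  end

  def ==(other)
    other.is_a?(SketchPoint) && FIELDS.all? { |n| send(n) == other.send(n) }
  end
  alias eql? ==
end
```

Equal structs still hash equally under either approach, so instances remain usable as Hash keys; only the per-call allocation differs.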
I will be attaching the patch and benchmark code shortly.