  1. Apache Cassandra
  2. CASSANDRA-6235

Improve native protocol server latency

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Normal
    • Resolution: Duplicate

    Description

      The tl;dr is that the native protocol server seems to add non-negligible latency to operations compared to the thrift server, and as far as I can tell the added latency lies within Netty's internals. I'm not sure what to tweak to reduce it.

      The test I ran is simple: it's stress -t 1 -L3, the Cassandra stress test for insertions with just 1 thread and using CQL-over-thrift (to make things more comparable). What I'm interested in is the average latency. Also, because I don't care about testing the storage engine or even CQL processing, I've disabled the processing of statements: all queries just return an empty result set right away (there's no parsing of the query in particular). The resulting branch is at https://github.com/pcmanus/cassandra/commits/latency-testing (note that there's a trivial patch to have stress show the latency in microseconds).

      With that branch (single node), I get ~62μs of average latency with thrift. That number is actually fairly stable across runs (not doing any real processing helps keep performance consistent here).

      For the native protocol, I wanted to eliminate the possibility that the DataStax Java driver was the bottleneck, so I wrote a very simple class (NPTester.java, attached) that emulates the stress test above but with the native protocol. It's not excessively pretty but it's simple (no dependencies, compiles with javac NPTester.java) and it tries to minimize the client-side overhead. It's just a basic loop that writes query frames (serializing them largely manually) and reads the results back, and it measures the latency as close to the socket as possible. Unless I've done something really wrong, it should have less client-side overhead than stress has.
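      To make "serializing them largely manually" concrete, here is a minimal sketch of how a QUERY request frame could be built by hand. It is not the code from the attached NPTester.java; it assumes the version 1 frame layout of the native protocol (an 8-byte header of version, flags, stream id, opcode and body length, followed by the query as a [long string] and the consistency level as a [short]). The class name QueryFrameSketch and the sample query are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the attached NPTester.java.
// Assumes the native protocol v1 frame layout.
public class QueryFrameSketch {
    static final int VERSION_REQUEST_V1 = 0x01; // request direction, protocol v1
    static final int OPCODE_QUERY = 0x07;
    static final int CONSISTENCY_ONE = 0x0001;

    /** Serializes one QUERY request into a single byte array. */
    static byte[] encodeQuery(String cql, int streamId, int consistency) {
        byte[] query = cql.getBytes(StandardCharsets.UTF_8);
        // Body = [long string] query (4-byte length + bytes) + [short] consistency.
        int bodyLength = 4 + query.length + 2;

        // ByteBuffer is big-endian by default, which is what the wire format uses.
        ByteBuffer frame = ByteBuffer.allocate(8 + bodyLength);
        frame.put((byte) VERSION_REQUEST_V1);
        frame.put((byte) 0);            // flags: no compression, no tracing
        frame.put((byte) streamId);
        frame.put((byte) OPCODE_QUERY);
        frame.putInt(bodyLength);
        frame.putInt(query.length);
        frame.put(query);
        frame.putShort((short) consistency);
        return frame.array();
    }

    public static void main(String[] args) {
        // The table and column names here are placeholders.
        byte[] frame = encodeQuery("INSERT INTO t (k, v) VALUES (1, 2)", 0, CONSISTENCY_ONE);
        System.out.println("frame length in bytes: " + frame.length);
    }
}
```

      Building the whole frame into one array means a client can hand it to the socket in a single write, which keeps the client-side overhead small. A real client would also have to send a STARTUP frame and wait for READY before issuing queries.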

      With that tester, the average latency I get is ~140μs. This is more than twice that of thrift.

      To try to understand where that additional latency was spent, I "instrumented" the Frame coder/decoder to record latencies (last commit of the latency-testing branch above): it records how long it takes to decode, execute and re-encode the query. The latency for that is ~35μs (like the other numbers above, this is pretty consistent across runs). Given that my ping on localhost is <30μs, this suggests that Netty spends ~70μs more than the thrift server somewhere while reading and/or writing data on the wire. I've tried profiling it with YourKit but I didn't see anything obvious, so I'm not sure what the problem is, but it sure would be nice to get on par (or at least much closer) with thrift on such a simple test.
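      As an illustration of that kind of instrumentation, the sketch below accumulates System.nanoTime() deltas around a unit of work and reports the mean in microseconds. It is not the code from the branch; the LatencyRecorder class and the handleRequest placeholder are hypothetical, and the timed section stands in for the decode, execute and re-encode path.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a per-request latency recorder.
public class LatencyRecorder {
    private final AtomicLong totalNanos = new AtomicLong();
    private final AtomicLong samples = new AtomicLong();

    /** Records one measurement from two System.nanoTime() readings. */
    void record(long startNanos, long endNanos) {
        totalNanos.addAndGet(endNanos - startNanos);
        samples.incrementAndGet();
    }

    /** Mean latency in microseconds, or 0.0 if nothing was recorded. */
    double averageMicros() {
        long n = samples.get();
        return n == 0 ? 0.0 : (totalNanos.get() / 1000.0) / n;
    }

    // Placeholder for the work being timed (decode + execute + encode).
    static void handleRequest() {
    }

    public static void main(String[] args) {
        LatencyRecorder recorder = new LatencyRecorder();
        for (int i = 0; i < 10000; i++) {
            long start = System.nanoTime();
            handleRequest();
            recorder.record(start, System.nanoTime());
        }
        System.out.printf("average: %.2f us%n", recorder.averageMicros());
    }
}
```

      The counters are atomic so the recorder can be shared across server worker threads; with a single client thread, as in the test above, plain long fields would do as well.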

      I'll note that if I run the same tests without disabling actual query processing, the tests have a bit more variability, but for thrift I get ~220-230μs latency on average while the NPTester gets ~290-300μs. In other words, there still seems to be that 70μs overhead for the native protocol, which in that case is still a >30% slowdown. I'll also note that test comparisons with more threads (using the Java driver this time) also show the native protocol being slightly slower than thrift (~5-10% slower), and while there might be inefficiencies in the Java driver, I'm growing more and more convinced that at least part of it is due to the latency "issue" described above.
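      For reference, the arithmetic behind those figures works out roughly as follows, using the averages quoted above and, as an assumption, the midpoints of the two ranges for the run with query processing enabled:

```latex
\begin{align*}
\text{no processing:}\quad & 140 - 62 = 78\,\mu\text{s}, & 140 / 62 &\approx 2.26 \\
\text{with processing (midpoints):}\quad & 295 - 225 = 70\,\mu\text{s}, & 70 / 225 &\approx 0.31
\end{align*}
```

      Both runs put the gap at roughly 70 to 80μs, and the second ratio matches the >30% slowdown.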

      Attachments

        1. NPTester.java (6 kB, attached by Sylvain Lebresne)

            People

              Assignee: Unassigned
              Reporter: Sylvain Lebresne (slebresne)
              Votes: 0
              Watchers: 9