Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Fix Version/s: 1.0.0
    • Component/s: Core
    • Labels:
      None

      Description

      Currently, we do three unnecessary copies (that is, writing to the socket is necessary; any other copies made are overhead) for each message:

      • constructing the Message body byte[] (this is typically a call to a ICompactSerializer[2] serialize method, but sometimes we cheat e.g. in SchemaCheckVerbHandler's reply)
      • which is copied to a buffer containing the entire Message (i.e. including Header) when sendOneWay calls Message.serializer.serialize()
      • which is copied to a newly-allocated ByteBuffer when sendOneWay calls packIt
      • which is what we write to the socket

      For deserialize we perform a similar orgy of copies:

      • IncomingTcpConnection reads the Message length, allocates a byte[], and reads the serialized Message into it
      • ITcpC then calls Message.serializer().deserialize, which allocates a new byte[] for the body and copies that part
      • finally, the verbHandler (determined by the now-deserialized Message header) deserializes the actual object represented by the body

      Most of these are out of scope for 0.7 but I think we can at least elide the last copy on the write path and the first on the read.

      1. 1788.txt
        7 kB
        Jonathan Ellis
      2. 1788-v2.txt
        13 kB
        Jonathan Ellis
      3. 1788-v3.txt
        13 kB
        Jonathan Ellis
      4. 1788-v4.txt
        13 kB
        Jonathan Ellis
      5. ASF.LICENSE.NOT.GRANTED--0001-setup.txt
        11 kB
        Jonathan Ellis
      6. ASF.LICENSE.NOT.GRANTED--0002-remove-copies-from-network-path.txt
        18 kB
        Jonathan Ellis
      7. 1788-v6.txt
        38 kB
        Jonathan Ellis
      8. 1788-v7.txt
        37 kB
        Brandon Williams

        Activity

        Hide
        Jonathan Ellis added a comment -

        As planned, removes last copy on write and first on read. Message.serializer calls are inlined and MessageSerializer is removed to avoid implying that it actually makes sense to call outside of sendOneWay / IncomingTcpConnection.

        Bonus fix: removed copying DataOutputBuffer.asByteArray, replacing with wrap() of existing buffer with appropriate limit.

        Show
        Jonathan Ellis added a comment - As planned, removes last copy on write and first on read. Message.serializer calls are inlined and MessageSerializer is removed to avoid implying that it actually makes sense to call outside of sendOneWay / IncomingTcpConnection. Bonus fix: removed copying DataOutputBuffer.asByteArray, replacing with wrap() of existing buffer with appropriate limit.
        Hide
        Gary Dusbabek added a comment -

        Could you save another copy on send if packIt returned a Message augmented with header information that would then be placed in the OutboundTcpConnection queue? The consumer there would then take the message and serialize it directly to the stream. The obvious disadvantage I see is that it shifts the serialization work onto the writer thread, which might be undesirable.

        Show
        Gary Dusbabek added a comment - Could you save another copy on send if packIt returned a Message augmented with header information that would then be placed in the OutboundTcpConnection queue? The consumer there would then take the message and serialize it directly to the stream. The obvious disadvantage I see is that it shifts the serialization work onto the writer thread, which might be undesirable.
        Hide
        Jonathan Ellis added a comment -

        Excellent! v2 attached.

        Question is, is it worth breaking compatibility with rc1? (v1 does not.) I would lean towards yes personally.

        Show
        Jonathan Ellis added a comment - Excellent! v2 attached. Question is, is it worth breaking compatibility with rc1? (v1 does not.) I would lean towards yes personally.
        Hide
        Jonathan Ellis added a comment -

        v3 w/ Gary's trick of copying just the Header part to maintain compatibility.

        Show
        Jonathan Ellis added a comment - v3 w/ Gary's trick of copying just the Header part to maintain compatibility.
        Hide
        Jonathan Ellis added a comment -

        v4 removes a bogus write(-1) call left over from v1. With that fix I have a 0.7 + v4 node working with a rc1 node, I can direct stress.py at either one and have writes go through to the other correctly.

        Show
        Jonathan Ellis added a comment - v4 removes a bogus write(-1) call left over from v1. With that fix I have a 0.7 + v4 node working with a rc1 node, I can direct stress.py at either one and have writes go through to the other correctly.
        Hide
        Gary Dusbabek added a comment -

        with v4 I still see errors on the rc1 node.

        INFO 09:10:43,847 [WRITE-/127.0.0.2] org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:115) error writing to /127.0.0.2

        Show
        Gary Dusbabek added a comment - with v4 I still see errors on the rc1 node. INFO 09:10:43,847 [WRITE-/127.0.0.2] org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:115) error writing to /127.0.0.2
        Hide
        Jonathan Ellis added a comment -

        Rebooted v4 by first adding MessageSerializerTest with a bytesToHex string of the old-style bytes-on-wire to make sure I'm not breaking it. Then when I add the new code I'm testing that new code can read old bytes, as well as old code reading new bytes. Everything comes up clean. I think I need another set of eyes on this.

        (01 is large because I encapculated MS.instance in a getter to break an initialization-cycle problem.)

        Show
        Jonathan Ellis added a comment - Rebooted v4 by first adding MessageSerializerTest with a bytesToHex string of the old-style bytes-on-wire to make sure I'm not breaking it. Then when I add the new code I'm testing that new code can read old bytes, as well as old code reading new bytes. Everything comes up clean. I think I need another set of eyes on this. (01 is large because I encapculated MS.instance in a getter to break an initialization-cycle problem.)
        Hide
        Jonathan Ellis added a comment -

        rebased, this time committing the MS.instance encapsulation separately

        Show
        Jonathan Ellis added a comment - rebased, this time committing the MS.instance encapsulation separately
        Hide
        Jonathan Ellis added a comment - - edited

        v5 attached for trunk.

        Message format should be identical to before:
        <MAGIC><packheader><length of packbody><packbody>

        packbody:
        <id><serialized header><body length><body>

        This does NOT require a message version bump.

        Show
        Jonathan Ellis added a comment - - edited v5 attached for trunk. Message format should be identical to before: <MAGIC><packheader><length of packbody><packbody> packbody: <id><serialized header><body length><body> This does NOT require a message version bump.
        Hide
        Jonathan Ellis added a comment -

        Part of the old sendOneWay (the "packbody" copy) looks like this:

                    DataOutputBuffer buffer = new DataOutputBuffer();
                    buffer.writeUTF(id);
                    Message.serializer().serialize(message, buffer, message.getVersion());
                    data = buffer.getData();
        

        byte[] data is NOT restricted to just the serialized bytes in the buffer – it will include any unused bytes at the end, as well.

        v5 skips garbage bytes like this for backwards compatibility.

        Show
        Jonathan Ellis added a comment - Part of the old sendOneWay (the "packbody" copy) looks like this: DataOutputBuffer buffer = new DataOutputBuffer(); buffer.writeUTF(id); Message.serializer().serialize(message, buffer, message.getVersion()); data = buffer.getData(); byte[] data is NOT restricted to just the serialized bytes in the buffer – it will include any unused bytes at the end, as well. v5 skips garbage bytes like this for backwards compatibility.
        Hide
        Brandon Williams added a comment - - edited

        With v5 (#2) I'm still periodically receiving bad magic in a mixed cluster (on the node with the patch applied):

        ERROR 19:54:07,747 Fatal exception in thread Thread[Thread-13,5,main]
        java.io.IOError: java.io.IOException: invalid protocol header
                at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:125)
        Caused by: java.io.IOException: invalid protocol header
                at org.apache.cassandra.net.MessagingService.validateMagic(MessagingService.java:467)
                at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:107)
        
        Show
        Brandon Williams added a comment - - edited With v5 (#2) I'm still periodically receiving bad magic in a mixed cluster (on the node with the patch applied): ERROR 19:54:07,747 Fatal exception in thread Thread[Thread-13,5,main] java.io.IOError: java.io.IOException: invalid protocol header at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:125) Caused by: java.io.IOException: invalid protocol header at org.apache.cassandra.net.MessagingService.validateMagic(MessagingService.java:467) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:107)
        Hide
        Brandon Williams added a comment -

        Here's a clue: the bad magic being passed is always zero.

        Show
        Brandon Williams added a comment - Here's a clue: the bad magic being passed is always zero.
        Hide
        Jonathan Ellis added a comment -

        Hmm. Not skipping enough "garbage bytes" would cause this (since Java zeros out all arrays on creation). But the node-with-patch-applied is getting lengths generated by nodes-without-patch-applied, which should be fine.

        So, still baffled.

        Show
        Jonathan Ellis added a comment - Hmm. Not skipping enough "garbage bytes" would cause this (since Java zeros out all arrays on creation). But the node-with-patch-applied is getting lengths generated by nodes-without-patch-applied, which should be fine. So, still baffled.
        Hide
        Jonathan Ellis added a comment -

        rebased to v6

        Show
        Jonathan Ellis added a comment - rebased to v6
        Hide
        Brandon Williams added a comment - - edited

        So, rebased v6 is even rarer now. I now have to do reads/writes to the cluster to trigger it, where before it looked like the occasional gossip message would cause it. And even doing reads/writes, it's still quite rare: out of the 333k (three node cluster, rf=1, 1M total) inserts/reads to the patched node, only 16 occurrences. When the patched node is the only coordinator, it never produces an exception on reads, however for writes it increases the amount of exceptions, nearly 60 out of 1M inserts. I suspect there is a problem in ITC or Message where it's not reading something correctly, but difficult to trigger. I confirmed with wireshark the other nodes are sending correct messages.

        Show
        Brandon Williams added a comment - - edited So, rebased v6 is even rarer now. I now have to do reads/writes to the cluster to trigger it, where before it looked like the occasional gossip message would cause it. And even doing reads/writes, it's still quite rare: out of the 333k (three node cluster, rf=1, 1M total) inserts/reads to the patched node, only 16 occurrences. When the patched node is the only coordinator, it never produces an exception on reads, however for writes it increases the amount of exceptions, nearly 60 out of 1M inserts. I suspect there is a problem in ITC or Message where it's not reading something correctly, but difficult to trigger. I confirmed with wireshark the other nodes are sending correct messages.
        Hide
        Brandon Williams added a comment -

        v7 rebased for trunk again. Now mysteriously has no errors. My guess is that something in CASSANDRA-1405 fixed it.

        Show
        Brandon Williams added a comment - v7 rebased for trunk again. Now mysteriously has no errors. My guess is that something in CASSANDRA-1405 fixed it.
        Hide
        Jonathan Ellis added a comment -

        +1

        Show
        Jonathan Ellis added a comment - +1
        Hide
        Brandon Williams added a comment -

        Committed.

        Show
        Brandon Williams added a comment - Committed.
        Hide
        Hudson added a comment -

        Integrated in Cassandra #1001 (See https://builds.apache.org/job/Cassandra/1001/)
        Reduce copies on read/write paths.
        Patch by jbellis reviewed by brandonwilliams for CASSANDRA-1788

        brandonwilliams : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1153678
        Files :

        • /cassandra/trunk/test/unit/org/apache/cassandra/streaming/SerializationsTest.java
        • /cassandra/trunk/src/java/org/apache/cassandra/net/MessagingService.java
        • /cassandra/trunk/src/java/org/apache/cassandra/net/OutboundTcpConnectionPool.java
        • /cassandra/trunk/test/unit/org/apache/cassandra/service/SerializationsTest.java
        • /cassandra/trunk/src/java/org/apache/cassandra/net/IncomingTcpConnection.java
        • /cassandra/trunk/src/java/org/apache/cassandra/net/Header.java
        • /cassandra/trunk/src/java/org/apache/cassandra/net/Message.java
        • /cassandra/trunk/test/unit/org/apache/cassandra/net/MessageSerializer.java
        • /cassandra/trunk/test/unit/org/apache/cassandra/db/SerializationsTest.java
        • /cassandra/trunk/src/java/org/apache/cassandra/net/CompactEndpointSerializationHelper.java
        • /cassandra/trunk/src/java/org/apache/cassandra/net/OutboundTcpConnection.java
        • /cassandra/trunk/test/unit/org/apache/cassandra/net
        Show
        Hudson added a comment - Integrated in Cassandra #1001 (See https://builds.apache.org/job/Cassandra/1001/ ) Reduce copies on read/write paths. Patch by jbellis reviewed by brandonwilliams for CASSANDRA-1788 brandonwilliams : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1153678 Files : /cassandra/trunk/test/unit/org/apache/cassandra/streaming/SerializationsTest.java /cassandra/trunk/src/java/org/apache/cassandra/net/MessagingService.java /cassandra/trunk/src/java/org/apache/cassandra/net/OutboundTcpConnectionPool.java /cassandra/trunk/test/unit/org/apache/cassandra/service/SerializationsTest.java /cassandra/trunk/src/java/org/apache/cassandra/net/IncomingTcpConnection.java /cassandra/trunk/src/java/org/apache/cassandra/net/Header.java /cassandra/trunk/src/java/org/apache/cassandra/net/Message.java /cassandra/trunk/test/unit/org/apache/cassandra/net/MessageSerializer.java /cassandra/trunk/test/unit/org/apache/cassandra/db/SerializationsTest.java /cassandra/trunk/src/java/org/apache/cassandra/net/CompactEndpointSerializationHelper.java /cassandra/trunk/src/java/org/apache/cassandra/net/OutboundTcpConnection.java /cassandra/trunk/test/unit/org/apache/cassandra/net

          People

          • Assignee:
            Jonathan Ellis
            Reporter:
            Jonathan Ellis
            Reviewer:
            Brandon Williams
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 24h
              24h
              Remaining:
              Remaining Estimate - 24h
              24h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development