Apache Cassandra / CASSANDRA-15299

CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta

Details

    Description

      CASSANDRA-13304 made an important improvement to our native protocol: it introduced checksumming/CRC32 for request and response bodies. It’s an important step forward, but it doesn’t cover the entire stream. In particular, the message header is not covered by a checksum or a CRC, which poses a correctness issue if, for example, streamId gets corrupted.

      Additionally, we aren’t quite using CRC32 correctly, in two ways:
      1. We calculate the CRC32 of the decompressed value instead of the bytes written on the wire, which loses the error-detection properties of the CRC32. Because of this sequencing, a corrupt stream reaches the decompressor before it has been verified, and in some cases attempting to decompress it can cause LZ4 to segfault.
      2. When using CRC32, the CRC32 value is written in the incorrect byte order, which also loses some of the protections.

      See https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf for an explanation of the two points above.
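
      To make the sequencing in point 1 concrete, here is a minimal sketch that checksums the bytes as they go on the wire and verifies them before any decompression is attempted. Everything in it is an assumption made for illustration: the class name WireCrcSketch, the layout (uncompressed length, compressed length, compressed bytes, CRC32 trailer), and the use of java.util.zip.Deflater as a stand-in for LZ4, which is not part of the JDK. The trailer is written in ByteBuffer's default byte order, which is not a statement about the order the protocol should use.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch, not the native protocol format.
class WireCrcSketch {
    // Deflater stands in for LZ4 here (assumption: any block compressor illustrates the point).
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[input.length * 2 + 64];
        int n = deflater.deflate(buf);
        deflater.end();
        return Arrays.copyOf(buf, n);
    }

    static byte[] decompress(byte[] compressed, int uncompressedLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] out = new byte[uncompressedLength];
            int n = inflater.inflate(out);
            if (n != uncompressedLength)
                throw new IllegalArgumentException("unexpected decompressed size");
            return out;
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("malformed compressed data", e);
        } finally {
            inflater.end();
        }
    }

    // Layout (assumed): [int uncompressedLength][int compressedLength][compressed][int crc32]
    static byte[] encode(byte[] body) {
        byte[] compressed = compress(body);
        ByteBuffer out = ByteBuffer.allocate(8 + compressed.length + 4);
        out.putInt(body.length).putInt(compressed.length).put(compressed);
        CRC32 crc = new CRC32();
        crc.update(out.array(), 0, out.position()); // CRC over the exact wire bytes
        out.putInt((int) crc.getValue());
        return out.array();
    }

    static byte[] decode(byte[] wire) {
        if (wire.length < 12)
            throw new IllegalStateException("truncated input");
        int crcOffset = wire.length - 4;
        CRC32 crc = new CRC32();
        crc.update(wire, 0, crcOffset);
        ByteBuffer in = ByteBuffer.wrap(wire);
        // Verify first: corrupt bytes never reach the decompressor.
        if ((int) crc.getValue() != in.getInt(crcOffset))
            throw new IllegalStateException("CRC mismatch: refusing to decompress");
        int uncompressedLength = in.getInt();
        byte[] compressed = new byte[in.getInt()];
        in.get(compressed);
        return decompress(compressed, uncompressedLength);
    }

    public static void main(String[] args) {
        byte[] body = "SELECT * FROM ks.tbl".getBytes(StandardCharsets.UTF_8);
        byte[] wire = encode(body);
        System.out.println(Arrays.equals(body, decode(wire)));
        wire[9] ^= 0x01; // flip one bit inside the compressed bytes
        try {
            decode(wire);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      With the check placed before decompress, a flipped bit in the compressed bytes is rejected by the CRC and the decompressor only ever sees verified input.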

      Separately, there are some long-standing issues with the protocol that predate CASSANDRA-13304. Importantly, both checksumming and compression operate on individual message bodies rather than on frames of multiple complete messages. This has several additional downsides. To name a couple:

      1. For compression, we get poor compression ratios for smaller messages, because we operate on tiny sequences of bytes. For most small requests and responses we end up discarding the compressed value because it would be no smaller than the uncompressed one, incurring both redundant allocations and redundant compression work.
      2. For checksumming and CRC32 we pay a high overhead price for small messages. 4 extra bytes is a lot for an empty write response, for example.
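
      The compression point can be seen with a small experiment. The sketch below is illustrative only: it uses java.util.zip.Deflater rather than LZ4, the class name and the 12-byte message are made up, and repeating one identical message is a best case for the batched side. It compares the compressed size of one tiny message with that of many such messages compressed together.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical sketch: per-message versus batched compression of tiny payloads.
class SmallMessageCompressionSketch {
    // Returns the number of bytes Deflater produces for the given input.
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[input.length * 2 + 64];
        int n = deflater.deflate(buf);
        deflater.end();
        return n;
    }

    // Concatenates `count` copies of `message`, standing in for many messages in one frame.
    static byte[] repeat(byte[] message, int count) {
        byte[] out = new byte[message.length * count];
        for (int i = 0; i < count; i++)
            System.arraycopy(message, 0, out, i * message.length, message.length);
        return out;
    }

    public static void main(String[] args) {
        // Arbitrary bytes standing in for a tiny response; not a real protocol message.
        byte[] message = "OK:stream=42".getBytes(StandardCharsets.UTF_8);
        byte[] batch = repeat(message, 100);
        System.out.println("single:  " + message.length + " -> " + compressedSize(message));
        System.out.println("batched: " + batch.length + " -> " + compressedSize(batch));
    }
}
```

      On its own the tiny message comes out larger than it went in, so the compressed result would be thrown away, while the batched input shrinks.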

      To address the correctness issue of streamId not being covered by the checksum/CRC32 and the inefficiency in compression and checksumming/CRC32, we should switch to a framing protocol with multiple messages in a single frame.

      I suggest we reuse the framing protocol recently implemented for internode messaging in CASSANDRA-15066 to the extent that its logic can be borrowed, and that we do it before native protocol v5 graduates from beta. See https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderCrc.java and https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderLZ4.java.
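
      As a rough illustration of the framing idea, the sketch below packs several complete messages (headers included) into one frame with a single CRC32 trailer, so message headers fall under the checksum and the 4-byte cost is paid once per frame. The layout and the name FrameSketch are assumptions for illustration and are not the format used by FrameDecoderCrc or FrameDecoderLZ4. The sketch works on a complete byte array; a streaming decoder would additionally need to protect the length field before trusting it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical frame layout: [int payloadLength][payload = concatenated messages][int crc32]
class FrameSketch {
    static final int LENGTH_BYTES = 4;
    static final int CRC_BYTES = 4;

    static byte[] encodeFrame(List<byte[]> messages) {
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        for (byte[] message : messages)
            payload.write(message, 0, message.length);
        byte[] body = payload.toByteArray();
        ByteBuffer out = ByteBuffer.allocate(LENGTH_BYTES + body.length + CRC_BYTES);
        out.putInt(body.length).put(body);
        CRC32 crc = new CRC32();
        crc.update(out.array(), 0, out.position()); // covers the length field and every message
        out.putInt((int) crc.getValue());
        return out.array();
    }

    // Returns the verified payload; splitting it back into messages would rely on
    // the length carried in each message's own header.
    static byte[] decodeFrame(byte[] frame) {
        if (frame.length < LENGTH_BYTES + CRC_BYTES)
            throw new IllegalStateException("truncated frame");
        int crcOffset = frame.length - CRC_BYTES;
        CRC32 crc = new CRC32();
        crc.update(frame, 0, crcOffset);
        ByteBuffer in = ByteBuffer.wrap(frame);
        if ((int) crc.getValue() != in.getInt(crcOffset))
            throw new IllegalStateException("frame CRC mismatch");
        if (in.getInt() != crcOffset - LENGTH_BYTES)
            throw new IllegalStateException("length field does not match frame size");
        return Arrays.copyOfRange(frame, LENGTH_BYTES, crcOffset);
    }

    // Checksum bytes spent when every message carries its own 4-byte CRC.
    static int perMessageOverhead(int messageCount) { return messageCount * CRC_BYTES; }

    // Bytes spent on the length field and CRC of one frame in this sketch.
    static int perFrameOverhead() { return LENGTH_BYTES + CRC_BYTES; }

    public static void main(String[] args) {
        List<byte[]> messages = Arrays.asList(new byte[] {1, 2, 3}, new byte[] {4, 5}, new byte[] {6});
        byte[] frame = encodeFrame(messages);
        System.out.println(Arrays.toString(decodeFrame(frame)));
        System.out.println(perMessageOverhead(100) + " vs " + perFrameOverhead());
    }
}
```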

      Attachments

        1. Process CQL Frame.png
          22 kB
          Alex Petrov
        2. V5 Flow Chart.png
          73 kB
          Alex Petrov

            People

              samt Sam Tunnicliffe
              aleksey Aleksey Yeschenko
              Sam Tunnicliffe
              Alex Petrov, Caleb Rackliffe