CASSANDRA-19753

Not getting responses with concurrent stream IDs in native protocol v5

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Not A Problem
    • None
    • Messaging/Client
    • None
    • Correctness - Transient Incorrect Response
    • Critical
    • Normal
    • Unit Test
    • All
    • None

    Description

      This is not gonna be an easy bug to report or to give a great set of repro steps for, so apologies in advance. I’m one of the authors and the maintainer of Xandra, the Cassandra client for Elixir.

      We noticed an issue with request timeouts in a new version of our client. Just for reference, the issue is this one.

      After some debugging, we figured out that the issue was limited to native protocol v5. With native protocol v5, the issue shows up in C* 4.1 and 5.0. With native protocol v4, those versions (4.1 and 5.0) both work fine. I'm running C* in a Docker container, but I've had folks reproduce this with all sorts of C* setups.

      The Issue

      The new version of our client in question uses concurrent requests. We assign each request a sequential stream ID (1, 2, ...). To the best of my knowledge, we comply with section 2.4.1.3 of the native protocol v5 spec.
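
      To make the stream ID bookkeeping concrete, here is a minimal Python sketch (not Xandra's actual code) of how a client might hand out stream IDs and match responses back to pending requests. The `StreamTracker` class and its method names are hypothetical. The 9-byte envelope header layout (version, flags, signed 16-bit stream, opcode, 32-bit body length) follows the native protocol spec; the sketch ignores the outer v5 framing layer that wraps envelopes after the handshake.

```python
import struct

# Envelope header: version (1 byte), flags (1 byte), stream (signed 16-bit,
# big-endian), opcode (1 byte), body length (unsigned 32-bit, big-endian).
HEADER = struct.Struct(">BBhBI")

OPCODE_RESULT = 0x08
VERSION_V5_RESPONSE = 0x85  # response direction bit set, protocol version 5


class StreamTracker:
    """Hypothetical helper: allocates stream IDs and tracks pending requests."""

    def __init__(self, max_streams=32768):
        self.max_streams = max_streams
        self.pending = {}  # stream ID -> caller-supplied request label
        self.next_id = 1

    def allocate(self, label):
        # Scan for a free non-negative ID, starting from next_id.
        for _ in range(self.max_streams):
            stream_id = self.next_id
            self.next_id = (self.next_id + 1) % self.max_streams
            if stream_id not in self.pending:
                self.pending[stream_id] = label
                return stream_id
        raise RuntimeError("no free stream IDs")

    def complete(self, header_bytes):
        # Parse a response envelope header and release its stream ID.
        _version, _flags, stream_id, _opcode, _length = HEADER.unpack(header_bytes)
        return stream_id, self.pending.pop(stream_id)

    def unanswered(self):
        # Stream IDs that are still waiting for a response.
        return sorted(self.pending)


tracker = StreamTracker()
first = tracker.allocate("query A")
second = tracker.allocate("query B")

# Simulated response header arriving for the second request only.
response = HEADER.pack(VERSION_V5_RESPONSE, 0x00, second, OPCODE_RESULT, 0)
answered_id, answered_label = tracker.complete(response)
print(answered_id, answered_label, tracker.unanswered())
```

      In a setup like this, a request that never gets a response shows up as a stream ID that stays in `unanswered()` until the client-side timeout fires.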

      Now, it seems like C* does not respond to all requests when we do this. We have a simple test in our repo that reproduces it. It just issues two requests in parallel (with stream IDs 1 and 2) and then keeps issuing requests as soon as there are responses. Almost 100% of the time, we don't get the response on at least one stream. I've also attached some debug logs (from the client's perspective) that show this, in case they're helpful. The <<56, 0, 2, 67, 161, ...>> syntax is Erlang's syntax for binaries (byte strings), where each number is the decimal value of a single byte. You can see in the logs that we never get the response frame on stream ID 1. Other times it's stream ID 2, or 3, or whatever.
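
      For anyone reading the logs without an Erlang or Elixir shell at hand, the binary notation can be converted to hex with a few lines of Python. This is only a sketch: the `parse_erlang_binary` helper is hypothetical, it handles plain decimal byte values only (no size specifiers or embedded strings), and the sample input is just the five bytes visible in the example above, not a complete frame.

```python
import re


def parse_erlang_binary(text):
    """Turn a string like '<<56, 0, 2>>' into Python bytes."""
    inner = text.strip()
    if not (inner.startswith("<<") and inner.endswith(">>")):
        raise ValueError("not an Erlang binary literal")
    # Each decimal number between the delimiters is one byte.
    numbers = re.findall(r"\d+", inner[2:-2])
    return bytes(int(n) for n in numbers)


data = parse_erlang_binary("<<56, 0, 2, 67, 161>>")
print(data.hex(" "))  # prints "38 00 02 43 a1"
```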

      I’m pretty short on ideas for what to do next on our end. I’ve also tried changing the socket buffer size (from 10 bytes to 1000000 bytes) to get the packets to split up in all sorts of places, but everything works as expected except for the responses that are not coming out of C*.

      Any help is appreciated here, but I've started to suspect this might be something with C*. It could totally not be, but I figured it was worth posting here.

      Thank you all in advance folks! 💟

      Attachments

        1. xandra.log
          4 kB
          Andrea Leopardi

          People

            samt Sam Tunnicliffe
            whatyouhide Andrea Leopardi
            Sam Tunnicliffe