Cassandra
  1. Cassandra
  2. CASSANDRA-3378

Allow configuration of storage protocol socket buffer

    Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Fix Version/s: 1.2.2
    • Component/s: Core
    • Labels:

      Description

      Similar to rpc_[send,recv]_buff_size_in_bytes, we should expose this for high latency connections.

      1. 3378-v3.patch
        5 kB
        Jason Brown
      2. 3378-v4.patch
        4 kB
        Michał Michalski
      3. cassandra-3378-v1.patch
        4 kB
        Michał Michalski
      4. cassandra-3378-v2.patch
        5 kB
        Michał Michalski

        Activity

        Hide
        Chris Goffinet added a comment - - edited

        Assigning to Harish, starting him out with something small as his first patch.

        Show
        Chris Goffinet added a comment - - edited Assigning to Harish, starting him out with something small as his first patch.
        Hide
        Michał Michalski added a comment -

        I started to investigate this task (lfh label sounds like a good point to get deeper into Cassandra architecture ), but as I'm new to Cassandra internal architecture, I'm not really sure if I understand the subject of this task. To be more precise - I have a problem with understanding what does "storage protocol" really refers to in this case?

        At first I thought it refers to Thrift (and ThriftServer started by CassandraDaemon), but when I was investigating this issue (by checking how "looking at rpc_[send,recv]_buff_size_in_bytes" setting is set and where) I discovered that ThriftServer already uses the setting mentioned in this task.

        My second idea was "native server" (org.apache.cassandra.transport.Server) which I found in CassandraDaemon too. It uses ServerBootstrap created using NioServerSocketChannelFactory and it seems possible to add such setting there, but I wonder if this is really the service this issue tells about? Or maybe I'm completely wrong? Could you give me some clue?

        Sorry if it's a silly question, but I'm confused

        BTW: Is there maybe a doc that tells a bit more on how Cassandra works in terms of "internal" architecture/dependencies/protocols etc.?

        Show
        Michał Michalski added a comment - I started to investigate this task (lfh label sounds like a good point to get deeper into Cassandra architecture ), but as I'm new to Cassandra internal architecture, I'm not really sure if I understand the subject of this task. To be more precise - I have a problem with understanding what does "storage protocol" really refers to in this case? At first I thought it refers to Thrift (and ThriftServer started by CassandraDaemon), but when I was investigating this issue (by checking how "looking at rpc_ [send,recv] _buff_size_in_bytes" setting is set and where) I discovered that ThriftServer already uses the setting mentioned in this task. My second idea was "native server" (org.apache.cassandra.transport.Server) which I found in CassandraDaemon too. It uses ServerBootstrap created using NioServerSocketChannelFactory and it seems possible to add such setting there, but I wonder if this is really the service this issue tells about? Or maybe I'm completely wrong? Could you give me some clue? Sorry if it's a silly question, but I'm confused BTW: Is there maybe a doc that tells a bit more on how Cassandra works in terms of "internal" architecture/dependencies/protocols etc.?
        Hide
        Brandon Williams added a comment -

        "storage protocol" is the internode communication protocol Cassandra uses, which is completely custom (ie, not thrift-based or anything like that.) OutboundTcpConnection is the class you need to modify to set the socket buffer.

        Show
        Brandon Williams added a comment - "storage protocol" is the internode communication protocol Cassandra uses, which is completely custom (ie, not thrift-based or anything like that.) OutboundTcpConnection is the class you need to modify to set the socket buffer.
        Hide
        Michał Michalski added a comment -

        Thanks
        I'll try to find some time soon and submit a patch (hopefully) by the end of the week.

        Show
        Michał Michalski added a comment - Thanks I'll try to find some time soon and submit a patch (hopefully) by the end of the week.
        Hide
        Michał Michalski added a comment -

        Just pondering... In the existing code there's a BufferedOutputStream with buffer size of 4096 bytes at first, which can be then "overwritten" by a stream with default buffer size (no specified size is passed to constructor) if compression is enabled.
        My concern is: if there's a reason for setting different buffer size for compressed and non-compressed connection by default, is this valid to apply the same value (provided by user in config) for both connection types?
        Shouldn't they be distinguished on a config level too? For example by "splitting" storage_send_buff_size_bytes (that's how I temporarily named this new param) into two options: storage_uncompressed_send_buff_size_bytes and storage_compressed_send_buff_size_bytes?
        Or maybe we assume that for this special environment (high latency connection) the performance gain achieved by using larger buffer is big enough to just ignore the case I mentioned and use the same value in these two places in code?

        Show
        Michał Michalski added a comment - Just pondering... In the existing code there's a BufferedOutputStream with buffer size of 4096 bytes at first, which can be then "overwritten" by a stream with default buffer size (no specified size is passed to constructor) if compression is enabled. My concern is: if there's a reason for setting different buffer size for compressed and non-compressed connection by default, is this valid to apply the same value (provided by user in config) for both connection types? Shouldn't they be distinguished on a config level too? For example by "splitting" storage_send_buff_size_bytes (that's how I temporarily named this new param) into two options: storage_uncompressed_send_buff_size_bytes and storage_compressed_send_buff_size_bytes? Or maybe we assume that for this special environment (high latency connection) the performance gain achieved by using larger buffer is big enough to just ignore the case I mentioned and use the same value in these two places in code?
        Hide
        Michał Michalski added a comment -

        I've attached an initial patch. As I mentioned I had a little concern on what to do with the default buffer size that is set for BufferedOutputStream and why is it set only for uncompressed OutputStream. I've read CASSANDRA-1896 and I understand why it's set to 4096, but I can only guess why it's not set for compressed stream too. Anyway - I'm pretty sure that user settings should overwrite this default 4096, but I wonder if default of 4096 should stay in code (so default buffer size values for compressed and uncompressed streams are handled separately) or maybe it should be set by default in cassandra.yaml. However, second solution (which I picked for now) causes compressed output stream to have default buffer size of 4096 bytes too, so it leads back to my first concern: is this a behaviour we expect?

        Waiting for a comment and I'll improve this patch according to your suggestion.

        Show
        Michał Michalski added a comment - I've attached an initial patch. As I mentioned I had a little concern on what to do with the default buffer size that is set for BufferedOutputStream and why is it set only for uncompressed OutputStream. I've read CASSANDRA-1896 and I understand why it's set to 4096, but I can only guess why it's not set for compressed stream too. Anyway - I'm pretty sure that user settings should overwrite this default 4096, but I wonder if default of 4096 should stay in code (so default buffer size values for compressed and uncompressed streams are handled separately) or maybe it should be set by default in cassandra.yaml. However, second solution (which I picked for now) causes compressed output stream to have default buffer size of 4096 bytes too, so it leads back to my first concern: is this a behaviour we expect? Waiting for a comment and I'll improve this patch according to your suggestion.
        Hide
        Brandon Williams added a comment -

        You actually need to set the receive buffer size in IncomingTcpConnection. It sounds like the lack of a buffer for compression is either an oversight or is buffered somewhere deeper in the stack. I'll run some tests when we're ready.

        Show
        Brandon Williams added a comment - You actually need to set the receive buffer size in IncomingTcpConnection. It sounds like the lack of a buffer for compression is either an oversight or is buffered somewhere deeper in the stack. I'll run some tests when we're ready.
        Hide
        Michał Michalski added a comment -

        Thanks for reviewing, Brandon.

        Changes in v2:
        1) Remove receive buffer setting from OutboundTcpConnection - as it's being used for Messaging purposes only and not really needed there
        2) Set receive buffer (only) for IncommingTcpConnection. My concern is the way of changing this parameter in a kind of "wide" way - not only for streaming (for which it makes sense for me), but much earlier, so it "affects" messaging parts of code too. I'm unsure of the impact it may have on the messaging service in general - I don't have that much experience with Cassandra & network programming to judge it "just like that".

        Show
        Michał Michalski added a comment - Thanks for reviewing, Brandon. Changes in v2: 1) Remove receive buffer setting from OutboundTcpConnection - as it's being used for Messaging purposes only and not really needed there 2) Set receive buffer (only) for IncommingTcpConnection. My concern is the way of changing this parameter in a kind of "wide" way - not only for streaming (for which it makes sense for me), but much earlier, so it "affects" messaging parts of code too. I'm unsure of the impact it may have on the messaging service in general - I don't have that much experience with Cassandra & network programming to judge it "just like that".
        Hide
        Jason Brown added a comment -

        Looks like snappy does it's own internal buffering (for compressing streams). SnappyOutputStream has an 'uncompressed' buffer of 32k, which flushes to it's output stream when full or flush() is called. SnappyInputStream creates a buffer based upon the size of the chunk size of the current block. Thus, we shouldn't need to worry about having Buffered*putStream for compressed streams. We could play with the buffer size there (4k vs. 8k), but I think the biggest win is just having any buffer before the socket output stream (which doesn't buffer at all). I'm in favor of leaving the default 4096 buffer size for Buffered*putStream on uncompressed streams until there is sufficient evidence/testing to switch.

        As for the socket's buffer config, we can use Michal's new yaml settings for the in/out sockets. However, I think storage_send_buff_size_in_bytes should be commented out by default in the yaml.

        Show
        Jason Brown added a comment - Looks like snappy does it's own internal buffering (for compressing streams). SnappyOutputStream has an 'uncompressed' buffer of 32k, which flushes to it's output stream when full or flush() is called. SnappyInputStream creates a buffer based upon the size of the chunk size of the current block. Thus, we shouldn't need to worry about having Buffered*putStream for compressed streams. We could play with the buffer size there (4k vs. 8k), but I think the biggest win is just having any buffer before the socket output stream (which doesn't buffer at all). I'm in favor of leaving the default 4096 buffer size for Buffered*putStream on uncompressed streams until there is sufficient evidence/testing to switch. As for the socket's buffer config, we can use Michal's new yaml settings for the in/out sockets. However, I think storage_send_buff_size_in_bytes should be commented out by default in the yaml.
        Hide
        Jason Brown added a comment -

        Minor cleanup of Michal's patch. Rebased against current 1.2 branch. Restored the Buffered*outStream size to 4096 and added yaml comment.

        Show
        Jason Brown added a comment - Minor cleanup of Michal's patch. Rebased against current 1.2 branch. Restored the Buffered*outStream size to 4096 and added yaml comment.
        Hide
        Marcus Eriksson added a comment -

        could we perhaps call the config parameter "internode_(send|recv)_buff_size_in_bytes" instead? makes it a bit clearer (atleast to me) what we are actually configuring

        other than that, lgtm

        Show
        Marcus Eriksson added a comment - could we perhaps call the config parameter "internode_(send|recv)_buff_size_in_bytes" instead? makes it a bit clearer (atleast to me) what we are actually configuring other than that, lgtm
        Hide
        Jason Brown added a comment -

        Marcus, yeah, that's a useful modification . +1 on renaming the yaml property name (and the downstream changes changes in Config/DD).

        Show
        Jason Brown added a comment - Marcus, yeah, that's a useful modification . +1 on renaming the yaml property name (and the downstream changes changes in Config/DD).
        Hide
        Michał Michalski added a comment - - edited

        I agree, for me internode_(send|recv)_buff_size_in_bytes is much better / descriptive name too. Attaching updated patch.

        Show
        Michał Michalski added a comment - - edited I agree, for me internode_(send|recv)_buff_size_in_bytes is much better / descriptive name too. Attaching updated patch.
        Hide
        Brandon Williams added a comment -

        Committed v4. Thanks!

        Show
        Brandon Williams added a comment - Committed v4. Thanks!

          People

          • Assignee:
            Michał Michalski
            Reporter:
            Brandon Williams
            Reviewer:
            Jason Brown
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development