  Apache Cassandra / CASSANDRA-8611

give streaming_socket_timeout_in_ms a non-zero default

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: 2.1.10, 2.2.2, 3.0 beta 2

    Description

      Sometimes, as mentioned in CASSANDRA-8472, streams will hang. We have streaming_socket_timeout_in_ms, which can retry after a timeout. It would be good to give it a non-zero default value. We don't want to paper over problems, but streams sometimes hang, and you don't want long-running streaming operations such as repairs or bootstraps to just fail.

      streaming_socket_timeout_in_ms should be based on the TCP idle timeout, so it shouldn't be a problem to set it to a value on the order of minutes. Also, the socket should only be open during the actual streaming and not during operations such as merkle tree generation. We can set it to a conservative value and people can tune it more aggressively as needed. Disabling it by default is, in my opinion, too conservative.
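
      For reference, the setting lives in cassandra.yaml. A minimal sketch of the entry as it stands today, assuming the current default of 0 (disabled); the comment wording here is illustrative, not the shipped text:

          # Socket timeout, in milliseconds, for the streaming connection.
          # 0 disables the read timeout entirely, so a stream whose peer goes
          # silent can hang forever; a non-zero value lets the stream time out
          # and be retried.
          streaming_socket_timeout_in_ms: 0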

      Attachments

        Activity

          jeromatron Jeremy Hanna added a comment -

          jblangston@datastax.com mentioned that it would be good to make sure that it does time out and that it logs an understandable message when it does. That way, it can be tracked when troubleshooting socket timeouts/resets at the router level, for example.


          sebastian.estevez@datastax.com Sebastian Estevez added a comment -

          This bites most production bootstraps that I encounter, especially on the cloud. Are there any downsides to improving this bad default?
          blerer Benjamin Lerer added a comment -

          yukim, pauloricardomg, any suggestions for the default value? My knowledge of streaming is limited.

          pauloricardomg Paulo Motta added a comment -

          If we want to be really conservative, how about setting it to the default Linux tcp_keepalive_time of 7200 seconds (two hours)? Given that I have seen streams hang on EC2 for tens of hours or even days, this should be sufficient to catch the most extreme scenarios, while still allowing operators to set it to a lower value if they want to. If this is too conservative, maybe we can set it to 10-30 minutes.

          elubow Eric Lubow added a comment -

          I've seen streams hang for days on EC2 as well. This can be especially problematic when you are trying to add capacity. Typically, if nothing has happened in an hour, it's probably the result of a hung stream, and waiting another hour doesn't do much good. The one thing to keep in mind with a two-hour timeout is that on smaller datasets the timeout for a stream would be longer than the entire bootstrap of the machine would take. I think it would be safe to bring it down to an hour, which is still very conservative.

          rcoli Robert Coli added a comment -

          Attaching a patch which sets this timeout to 10 minutes. Rationale is as follows:

          • Streams continue to hang in normal operation.
          • Operators want hung streams to restart faster than they could by noticing the hang and restarting the node manually; that path is on the order of 10 minutes for a typical node.
          • Re-streaming 10 minutes' worth of data is not prohibitive; at the default throttle of 25 MB/s, it's "only" 15 GB (see the quick check below).
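
          A quick check of that last figure, using the 25 MB/s default throttle cited above:

          $25\ \mathrm{MB/s} \times 600\ \mathrm{s} = 15{,}000\ \mathrm{MB} \approx 15\ \mathrm{GB}$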

          Patch was created before seeing above discussion, but seems to be within the bounds discussed above.

          blerer Benjamin Lerer added a comment -

          What we are looking for is a safety net, not something too aggressive. Based on the discussion, I am in favor of setting it to 1 hour. We can still lower it in the future if needed.
          rcoli, can you provide another patch for 2.1?

          rcoli Robert Coli added a comment -

          Attached two patches, one against trunk and one against 2.1, which default to 3600000 ms, i.e. 1 hour.
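
          With either patch applied, the shipped cassandra.yaml entry would look roughly like the sketch below (the comment wording is illustrative, not the exact text of the patch):

              # Non-zero default: a hung stream now times out after one hour
              # and can be retried, instead of hanging indefinitely.
              streaming_socket_timeout_in_ms: 3600000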

          blerer Benjamin Lerer added a comment -

          Thanks for the patch.

          • the results for the unit tests for 2.1 are here
          • the results for the dtests for 2.1 are here
          • the results for the unit tests for 2.2 are here
          • the results for the dtests for 2.2 are here
          • the results for the unit tests for 3.0 are here
          • the results for the dtests for 3.0 are here

          LGTM

          blerer Benjamin Lerer added a comment -

          committed: 7e1ea4c8c1af0809b990c27648edbff2efb2434a


          People

            Assignee: rcoli Robert Coli
            Reporter: jeromatron Jeremy Hanna
            Authors: Robert Coli
            Reviewers: Benjamin Lerer
            Votes: 5
            Watchers: 9
