Cassandra / CASSANDRA-8352

Timeout Exception on Node Failure in Remote Data Center

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Not A Problem
    • None
    • None
    • Unix, Cassandra 2.0.3

    • Normal

    Description

      We have a geo-redundant setup with 2 data centers of 3 nodes each. When we bring down a single Cassandra node in DC2 with kill -9 <Cassandra-pid>, reads on DC1 fail with TimedOutException for a brief period (roughly 15-20 seconds).

      Questions:
      1. Why do reads fail on DC1 when a node in another data center (DC2) fails? Since we use LOCAL_QUORUM for both reads and writes in DC1, a request should return once 2 nodes in the local DC have replied, rather than timing out because of a node in the remote DC.
      2. We want to make sure that no Cassandra requests fail when a node fails. We tried rapid read protection with the ALWAYS, 99percentile, and 10ms settings, as described in http://www.datastax.com/dev/blog/rapid-read-protection-in-cassandra-2-0-2, but none of them helped. How can we ensure zero request failures when a node fails?
      3. What is the right way to handle HTimedOutException in Hector?
      4. Please confirm whether we are using public and private hostnames as expected.
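
The expectation in question 1 can be written out as quorum arithmetic. This is a minimal sketch that assumes a replication factor of 3 in each data center, which this report does not state. The class and method names are made up for illustration and are not Cassandra or Hector APIs.

```java
// Sketch only: quorum arithmetic behind question 1.
// Assumption: replication factor 3 per data center (not stated in the report).
class LocalQuorumSketch {
    // A quorum is a majority of the replicas: floor(rf / 2) + 1.
    static int required(int localReplicationFactor) {
        return localReplicationFactor / 2 + 1;
    }

    // LOCAL_QUORUM counts only replicas in the coordinator's own data center,
    // so only local replica failures enter this check.
    static boolean satisfiable(int localReplicationFactor, int localReplicasDown) {
        int alive = localReplicationFactor - localReplicasDown;
        return alive >= required(localReplicationFactor);
    }

    public static void main(String[] args) {
        System.out.println(required(3));        // 2
        System.out.println(satisfiable(3, 0));  // true: all local replicas up
        System.out.println(satisfiable(3, 2));  // false: only 1 local replica left
    }
}
```

Under that assumption, a DC1 read needs 2 local replies, and a node killed in DC2 does not change the count.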

      We are using Cassandra 2.0.3.
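
For question 3, one common client-side pattern is to retry a timed-out read a bounded number of times with a growing pause. The sketch below is not Hector code: the `TimedOut` class is a hypothetical stand-in for HTimedOutException so that the example compiles on its own, and `withRetry`, the attempt limit, and the backoff values are illustrative assumptions. Retrying this way is only reasonable for idempotent operations such as reads.

```java
import java.util.function.Supplier;

// Sketch only: bounded retry with exponential backoff around a timed-out call.
class TimeoutRetrySketch {
    // Hypothetical stand-in for Hector's HTimedOutException.
    static class TimedOut extends RuntimeException {
        TimedOut(String message) { super(message); }
    }

    // Runs the operation up to maxAttempts times, doubling the pause after each timeout.
    static <T> T withRetry(Supplier<T> operation, int maxAttempts, long initialBackoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.get();
            } catch (TimedOut e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the last timeout to the caller
                }
                try {
                    Thread.sleep(backoff);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    throw e;
                }
                backoff *= 2;
            }
        }
    }
}
```

A caller would wrap the read in a lambda, for example `withRetry(() -> runQuery(), 3, 100L)`, where `runQuery` is a placeholder for the application's own query method.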

          People

            Unassigned
            Akhtar Hussain
            Anuj Wadehra