Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-14715

Read repairs can result in bogus timeout errors to the client

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Patch Available
    • Low
    • Resolution: Unresolved
    • None
    • None
    • Low
    • Hide

      ci

      Show
      ci

    Description

      In RepairMergeListener:close() it does the following:

       

      try
      {
          FBUtilities.waitOnFutures(repairResults, DatabaseDescriptor.getWriteRpcTimeout());
      }
      catch (TimeoutException ex)
      {
          // We got all responses, but timed out while repairing
          int blockFor = consistency.blockFor(keyspace);
          if (Tracing.isTracing())
              Tracing.trace("Timed out while read-repairing after receiving all {} data and digest responses", blockFor);
          else
              logger.debug("Timeout while read-repairing after receiving all {} data and digest responses", blockFor);
      
          throw new ReadTimeoutException(consistency, blockFor-1, blockFor, true);
      }
      

      This propagates up and gets sent to the client and we have customers get confused cause they see timeouts for CL ALL requiring ALL replicas even though they have read_repair_chance = 0 and using a LOCAL_* CL.

      At minimum I suggest instead of using the consistency level of DataResolver (which is always ALL with read repairs) for the timeout it instead use repairResults.size(). That is blockFor = repairResults.size() . But saying it received blockFor - 1 is bogus still. Fixing that would require more changes. I was thinking maybe like so:

       

      public static void waitOnFutures(List<AsyncOneResponse> results, long ms, MutableInt counter) throws TimeoutException
      {
          for (AsyncOneResponse result : results)
          {
              result.get(ms, TimeUnit.MILLISECONDS);
              counter.increment();
          }
      }
      

       

       

       

      Likewise in SinglePartitionReadLifecycle:maybeAwaitFullDataRead() it says blockFor - 1 for how many were received, which is also bogus.

       

      Steps used to reproduce was modify RepairMergeListener:close() to always throw timeout exception.  With schema:

      CREATE KEYSPACE weather WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'}  AND durable_writes = true;
      
      CREATE TABLE weather.city (
      cityid int PRIMARY KEY,
      name text
      ) WITH bloom_filter_fp_chance = 0.01
      AND dclocal_read_repair_chance = 0.0
      AND read_repair_chance = 0.0
      AND speculative_retry = 'NONE';
      

      Then using the following steps:

      1. ccm node1 cqlsh
      2. INSERT INTO weather.city(cityid, name) VALUES (1, 'Canberra');
      3. exit;
      4. ccm node1 flush
      5. ccm node1 stop
      6. rm rf ~/.ccm/test_repair/node1/data0/weather/city-ff2fade0b18d11e8b1cd097acbab1e3d/mc-1-big* # remove the sstable with the insert
      7. ccm node1 start
      8. ccm node1 cqlsh
      9. CONSISTENCY LOCAL_QUORUM;
      10. select * from weather.city where cityid = 1;

      You get result of:

      ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 5 responses." info={'received_responses': 5, 'required_responses': 6, 'consistency': 'ALL'}

      But was expecting:

      ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 1 responses." info={'received_responses': 1, 'required_responses': 2, 'consistency': 'LOCAL_QUORUM'}

      Attachments

        Activity

          People

            smiklosovic Stefan Miklosovic
            cam1982 Cameron Zemek
            Stefan Miklosovic
            Votes:
            2 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 50m
                50m