Apache Cassandra / CASSANDRA-13863

Speculative retry causes read repair even if read_repair_chance is 0.0.


Details

    • Type: Improvement
    • Status: Open
    • Priority: Normal
    • Resolution: Unresolved
    • None
    • Component/s: Legacy/Coordination
    • None

    Description

      Setting read_repair_chance = 0.0 and dclocal_read_repair_chance = 0.0 should prevent read repair, but read repair still happens when speculative retry is used. I think these two settings should stop read repair completely, because users set them precisely when they want no read repair, as in the following cases.
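To make the reported behaviour and the requested change concrete, here is a toy model in bash of a single read at consistency ONE. It is a hypothetical sketch, not Cassandra's code: the function name `read_repair_decision` and its arguments are invented for illustration, dclocal_read_repair_chance is folded into one chance value, and speculative_retry is reduced to NONE or ALWAYS. Mode "observed" mimics what the reproduction below shows on 3.0.14; mode "requested" applies the chance check to the speculative path as well.

```shell
#!/usr/bin/env bash
# Hypothetical toy model of the read repair decision for one read at CL ONE.
# Not Cassandra source code; names and arguments are illustrative only.
#   $1 mode:              observed (behaviour reported here) | requested (this issue)
#   $2 chance:            read_repair_chance, e.g. 0.0 (dclocal folded in)
#   $3 speculative_retry: NONE | ALWAYS
#   $4 replicas_differ:   yes | no (do the contacted replicas hold different data?)
#   $5 draw:              random number in [0,1) used for the chance check
read_repair_decision() {
  local mode=$1 chance=$2 spec=$3 differ=$4 draw=$5

  # Nothing to repair if the replicas agree.
  if [ "$differ" != yes ]; then
    echo no-repair
    return
  fi

  # Chance-based read repair fires when draw < chance (awk does the float compare).
  if awk -v d="$draw" -v c="$chance" 'BEGIN { exit !(d < c) }'; then
    echo repair
    return
  fi

  # Speculative retry contacted an extra replica. As reported, a digest
  # mismatch with it starts a repair regardless of the chance setting;
  # under the requested behaviour it would not.
  if [ "$spec" = ALWAYS ] && [ "$mode" = observed ]; then
    echo repair
  else
    echo no-repair
  fi
}

# The reproduction's scenario: both chances 0.0, speculative_retry = 'ALWAYS'.
read_repair_decision observed  0.0 ALWAYS yes 0.42   # repair
read_repair_decision requested 0.0 ALWAYS yes 0.42   # no-repair
```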

      Case 1: TWCS users

      The TWCS documentation explicitly recommends disabling background read repair:

      While TWCS tries to minimize the impact of comingled data, users should attempt to avoid this behavior. Specifically, users should avoid queries that explicitly set the timestamp via CQL USING TIMESTAMP. Additionally, users should run frequent repairs (which streams data in such a way that it does not become comingled), and disable background read repair by setting the table’s read_repair_chance and dclocal_read_repair_chance to 0.
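Following that advice for the table used in the reproduction below amounts to one ALTER TABLE statement (which, as this issue shows, does not stop read repair triggered through speculative retry). The helper here only prints the CQL; the function name `disable_read_repair_cql` is hypothetical, and the usage line assumes a running ccm cluster like the one created below.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CQL that disables background read repair
# for one table, as the TWCS documentation recommends.
#   $1 fully qualified table name, e.g. ks1.t1
disable_read_repair_cql() {
  printf 'ALTER TABLE %s WITH read_repair_chance = 0.0 AND dclocal_read_repair_chance = 0.0;\n' "$1"
}

disable_read_repair_cql ks1.t1
# Against a running ccm cluster (assumed):
#   ccm node1 cqlsh -e "$(disable_read_repair_cql ks1.t1)"
```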

      Case 2: Strict SLA for read latency

      At peak times read latency is critical for us, and read repair makes latency higher than it would be without read repair. We can run anti-entropy repair during off-peak hours to keep the replicas consistent.

      Here is my procedure to reproduce the problem.

      1. Create a cluster and set hinted_handoff_enabled to false on every node, so that node2 and node3 cannot receive the row through hints while they are down.

      $ ccm create -v 3.0.14 -n 3 cluster_3.0.14
      $ for h in $(seq 1 3) ; do perl -pi -e 's/hinted_handoff_enabled: true/hinted_handoff_enabled: false/' ~/.ccm/cluster_3.0.14/node$h/conf/cassandra.yaml ; done
      $ for h in $(seq 1 3) ; do grep "hinted_handoff_enabled:" ~/.ccm/cluster_3.0.14/node$h/conf/cassandra.yaml ; done
      hinted_handoff_enabled: false
      hinted_handoff_enabled: false
      hinted_handoff_enabled: false
      $ ccm start
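The perl loop above rewrites the setting only when it reads exactly `true`. A slightly more defensive sketch sets the value whatever it currently is and then reads it back for checking. It uses only sed and a temporary file, so it should behave the same with GNU and BSD sed; the function names are hypothetical, and the path in the usage comment is the ccm layout used above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for step 1: force hinted_handoff_enabled to false in a
# cassandra.yaml and read the value back.

# Rewrite the hinted_handoff_enabled line in place, whatever its old value.
disable_hinted_handoff() {
  local yaml=$1 tmp status
  tmp=$(mktemp) || return 1
  sed 's/^hinted_handoff_enabled:.*/hinted_handoff_enabled: false/' "$yaml" > "$tmp" &&
    cat "$tmp" > "$yaml"   # copy back so the original file's permissions are kept
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Print the current value of hinted_handoff_enabled.
hinted_handoff_value() {
  sed -n 's/^hinted_handoff_enabled:[[:space:]]*//p' "$1"
}

# Usage against the ccm cluster from step 1 (run before `ccm start`):
#   for h in 1 2 3; do
#     f=~/.ccm/cluster_3.0.14/node$h/conf/cassandra.yaml
#     disable_hinted_handoff "$f" && echo "node$h: $(hinted_handoff_value "$f")"
#   done
```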

      2. Create a keyspace and a table.

      $ ccm node1 cqlsh
      DROP KEYSPACE IF EXISTS ks1;
      CREATE KEYSPACE ks1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'}  AND durable_writes = true;
      CREATE TABLE ks1.t1 (
              key text PRIMARY KEY,
              value blob
          ) WITH bloom_filter_fp_chance = 0.01
              AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
              AND comment = ''
              AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
              AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
              AND crc_check_chance = 1.0
              AND dclocal_read_repair_chance = 0.0
              AND default_time_to_live = 0
              AND gc_grace_seconds = 864000
              AND max_index_interval = 2048
              AND memtable_flush_period_in_ms = 0
              AND min_index_interval = 128
              AND read_repair_chance = 0.0
              AND speculative_retry = 'ALWAYS';
      QUIT;
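To confirm that a table is in the configuration this issue is about (both chances at 0.0 while speculative retry is enabled), the schema can be checked mechanically. The sketch below reads `DESCRIBE TABLE` output, or CREATE TABLE text like the one above, on stdin. The function name is hypothetical, and it assumes that speculative_retry = 'NONE' means no extra replica is contacted at consistency ONE, so only non-NONE values are flagged.

```shell
#!/usr/bin/env bash
# Hypothetical check: flag tables whose read repair chances are both 0.0 while
# speculative_retry is enabled. Reads CQL table options on stdin.
check_read_repair_settings() {
  awk -v q="'" '
    /read_repair_chance =/ {
      # matches both read_repair_chance and dclocal_read_repair_chance lines
      v = $NF; sub(/;$/, "", v)
      if ($0 ~ /dclocal_read_repair_chance/) dc = v; else rr = v
    }
    /speculative_retry =/ {
      v = $NF; sub(/;$/, "", v); gsub(q, "", v); spec = v
    }
    END {
      if (rr == "" || dc == "" || spec == "") { print "unknown"; exit }
      if (rr + 0 == 0 && dc + 0 == 0 && spec != "NONE")
        print "read repair may still happen via speculative_retry=" spec
      else
        print "ok"
    }'
}

# Usage against the cluster from step 2 (assumed to be running):
#   ccm node1 cqlsh -e "DESCRIBE TABLE ks1.t1" | check_read_repair_settings
```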
      

      3. Stop node2 and node3, then insert a row through node1 and read it back.

      $ ccm node3 stop && ccm node2 stop && ccm status
      Cluster: 'cluster_3.0.14'
      ----------------------
      node1: UP
      node3: DOWN
      node2: DOWN
      
      $ ccm node1 cqlsh -k ks1 -e "consistency; tracing on; insert into ks1.t1 (key, value) values ('mmullass', bigintAsBlob(1));"
      Current consistency level is ONE.
      Now Tracing is enabled
      
      Tracing session: 01d74590-97cb-11e7-8ea7-c1bd4d549501
      
       activity                                                                                            | timestamp                  | source    | source_elapsed
      -----------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
                                                                                        Execute CQL3 query | 2017-09-12 23:59:42.316000 | 127.0.0.1 |              0
       Parsing insert into ks1.t1 (key, value) values ('mmullass', bigintAsBlob(1)); [SharedPool-Worker-1] | 2017-09-12 23:59:42.319000 | 127.0.0.1 |           4323
                                                                 Preparing statement [SharedPool-Worker-1] | 2017-09-12 23:59:42.320000 | 127.0.0.1 |           5250
                                                   Determining replicas for mutation [SharedPool-Worker-1] | 2017-09-12 23:59:42.327000 | 127.0.0.1 |          11886
                                                              Appending to commitlog [SharedPool-Worker-3] | 2017-09-12 23:59:42.327000 | 127.0.0.1 |          12195
                                                               Adding to t1 memtable [SharedPool-Worker-3] | 2017-09-12 23:59:42.327000 | 127.0.0.1 |          12392
                                                                                          Request complete | 2017-09-12 23:59:42.328680 | 127.0.0.1 |          12680
      
      
      $ ccm node1 cqlsh -k ks1 -e "consistency; tracing on; select * from ks1.t1 where key = 'mmullass';"
      Current consistency level is ONE.
      Now Tracing is enabled
      
       key      | value
      ----------+--------------------
       mmullass | 0x0000000000000001
      
      (1 rows)
      
      Tracing session: 3420ce90-97cb-11e7-8ea7-c1bd4d549501
      
       activity                                                                   | timestamp                  | source    | source_elapsed
      ----------------------------------------------------------------------------+----------------------------+-----------+----------------
                                                               Execute CQL3 query | 2017-09-13 00:01:06.681000 | 127.0.0.1 |              0
       Parsing select * from ks1.t1 where key = 'mmullass'; [SharedPool-Worker-1] | 2017-09-13 00:01:06.681000 | 127.0.0.1 |            296
                                        Preparing statement [SharedPool-Worker-1] | 2017-09-13 00:01:06.681000 | 127.0.0.1 |            561
                     Executing single-partition query on t1 [SharedPool-Worker-2] | 2017-09-13 00:01:06.682000 | 127.0.0.1 |           1056
                               Acquiring sstable references [SharedPool-Worker-2] | 2017-09-13 00:01:06.682000 | 127.0.0.1 |           1142
                                  Merging memtable contents [SharedPool-Worker-2] | 2017-09-13 00:01:06.682000 | 127.0.0.1 |           1206
                          Read 1 live and 0 tombstone cells [SharedPool-Worker-2] | 2017-09-13 00:01:06.682000 | 127.0.0.1 |           1455
                                                                 Request complete | 2017-09-13 00:01:06.682794 | 127.0.0.1 |           1794
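Traces like the two above are easier to compare when reduced to the set of nodes that took part. The sketch below lists the distinct addresses in the `source` column of cqlsh tracing output, assuming the pipe-separated layout shown here; the function name is hypothetical. For step 3 it should print only 127.0.0.1, since node2 and node3 are down.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each distinct IPv4 address found in the "source"
# column (third |-separated field) of cqlsh tracing output read on stdin.
trace_sources() {
  awk -F'|' 'NF >= 4 {
    s = $3
    gsub(/[ \t]/, "", s)                     # trim padding around the column
    if (s ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && !seen[s]++)
      print s                                # first occurrence only, in trace order
  }'
}

# Usage (assumed running cluster):
#   ccm node1 cqlsh -k ks1 -e "tracing on; select * from ks1.t1 where key = 'mmullass';" | trace_sources
```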
      

      4. Start node2 and confirm that it has no data (after a flush, no Data.db file exists for ks1.t1 on node2).

      $ ccm node2 start && ccm status
      Cluster: 'cluster_3.0.14'
      -------------------------
      node1: UP
      node3: DOWN
      node2: UP
      
      $ ccm node2 nodetool flush
      $ ls ~/.ccm/cluster_3.0.14/node2/data0/ks1/t1-*/*-Data.db
      ls: /Users/hiwakaba/.ccm/cluster_3.0.14/node2/data0/ks1/t1-*/*-Data.db: No such file or directory
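Step 4 relies on `ls` failing to find a Data.db file after `nodetool flush`. A sketch that turns the same check into a count (0 means no flushed SSTables) uses find, which avoids the unmatched-glob error; the function name is hypothetical and the directory layout is the ccm one shown above. As in the step, the count only reflects memtable contents if a flush ran first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count SSTable data files (*-Data.db) under a directory,
# e.g. a node's data directory for one keyspace.
count_data_files() {
  find "$1" -type f -name '*-Data.db' 2>/dev/null | awk 'END { print NR }'
}

# Usage against the ccm cluster (assumed):
#   ccm node2 nodetool flush
#   count_data_files ~/.ccm/cluster_3.0.14/node2/data0/ks1
```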
      

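      5. Select the row through node2. The query returns no rows, yet the trace shows read repair being initiated and a digest mismatch being detected, even though both read repair chances are 0.0.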

      $ ccm node2 cqlsh -k ks1 -e "consistency; tracing on; select * from ks1.t1 where key = 'mmullass';"
      Current consistency level is ONE.
      Now Tracing is enabled
      
       key | value
      -----+-------
      
      (0 rows)
      
      Tracing session: 72a71fc0-97cb-11e7-83cc-a3af9d3da979
      
       activity                                                                                                                                                                                                                                | timestamp                  | source    | source_elapsed
      -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
                                                                                                                                                                                                                            Execute CQL3 query | 2017-09-13 00:02:51.582000 | 127.0.0.2 |              0
                                                                                                                                                                    Parsing select * from ks1.t1 where key = 'mmullass'; [SharedPool-Worker-2] | 2017-09-13 00:02:51.583000 | 127.0.0.2 |           1112
                                                                                                                                                                                                     Preparing statement [SharedPool-Worker-2] | 2017-09-13 00:02:51.583000 | 127.0.0.2 |           1412
                                                                                                                                                                                            reading data from /127.0.0.1 [SharedPool-Worker-2] | 2017-09-13 00:02:51.584000 | 127.0.0.2 |           2107
                                                                                                                                                                                  Executing single-partition query on t1 [SharedPool-Worker-1] | 2017-09-13 00:02:51.585000 | 127.0.0.2 |           3492
                                                                                                                                                                     Sending READ message to /127.0.0.1 [MessagingService-Outgoing-/127.0.0.1] | 2017-09-13 00:02:51.585000 | 127.0.0.2 |           3516
                                                                                                                                                                                            Acquiring sstable references [SharedPool-Worker-1] | 2017-09-13 00:02:51.585000 | 127.0.0.2 |           3595
                                                                                                                                                                                               Merging memtable contents [SharedPool-Worker-1] | 2017-09-13 00:02:51.585001 | 127.0.0.2 |           3673
                                                                                                                                                                                       Read 0 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-09-13 00:02:51.585001 | 127.0.0.2 |           3851
                                                                                                                                                                  READ message received from /127.0.0.2 [MessagingService-Incoming-/127.0.0.2] | 2017-09-13 00:02:51.588000 | 127.0.0.1 |             33
                                                                                                                                                                                            Acquiring sstable references [SharedPool-Worker-2] | 2017-09-13 00:02:51.600000 | 127.0.0.1 |          12444
                                                                                                                                                                                               Merging memtable contents [SharedPool-Worker-2] | 2017-09-13 00:02:51.600000 | 127.0.0.1 |          12536
                                                                                                                                                                                       Read 1 live and 0 tombstone cells [SharedPool-Worker-2] | 2017-09-13 00:02:51.600000 | 127.0.0.1 |          12765
                                                                                                                                                                                        Enqueuing response to /127.0.0.2 [SharedPool-Worker-2] | 2017-09-13 00:02:51.600000 | 127.0.0.1 |          12929
                                                                                                                                                         Sending REQUEST_RESPONSE message to /127.0.0.2 [MessagingService-Outgoing-/127.0.0.2] | 2017-09-13 00:02:51.602000 | 127.0.0.1 |          14686
                                                                                                                                                      REQUEST_RESPONSE message received from /127.0.0.1 [MessagingService-Incoming-/127.0.0.1] | 2017-09-13 00:02:51.603000 | 127.0.0.2 |             --
                                                                                                                                                                                     Processing response from /127.0.0.1 [SharedPool-Worker-3] | 2017-09-13 00:02:51.610000 | 127.0.0.2 |             --
                                                                                                                                                                                                  Initiating read-repair [SharedPool-Worker-3] | 2017-09-13 00:02:51.610000 | 127.0.0.2 |             --
       Digest mismatch: org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-4886857781295767937, 6d6d756c6c617373) (d41d8cd98f00b204e9800998ecf8427e vs f8e0f9262a889cd3ebf4e5d50159757b) [ReadRepairStage:1] | 2017-09-13 00:02:51.624000 | 127.0.0.2 |             --
                                                                                                                                                                                                                              Request complete | 2017-09-13 00:02:51.586892 | 127.0.0.2 |           4892
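In this trace the "Request complete" row (00:02:51.586892) is timestamped before the read-repair rows (00:02:51.610000 and later), which suggests the repair ran after the response had been returned. The trace in step 6 also contains "Initiating read-repair" but no digest mismatch, so a check should tell the two apart. A sketch, assuming the trace messages keep the wording shown in these 3.0.14 traces; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify cqlsh tracing output on stdin as
#   mismatch - a digest mismatch was found (data was repaired)
#   compared - a read-repair comparison started but found no mismatch
#   none     - no read-repair activity in the trace
read_repair_status() {
  awk '
    /Digest mismatch/        { mismatch = 1 }
    /Initiating read-repair/ { started = 1 }
    END {
      if (mismatch) print "mismatch"
      else if (started) print "compared"
      else print "none"
    }'
}

# Usage (assumed running cluster):
#   ccm node2 cqlsh -k ks1 -e "tracing on; select * from ks1.t1 where key = 'mmullass';" | read_repair_status
```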
      

      6. As a result, node2 now has the row: the same query returns it, and after a flush it appears in an SSTable on node2.

      $ ccm node2 cqlsh -k ks1 -e "consistency; tracing on; select * from ks1.t1 where key = 'mmullass';"
      Current consistency level is ONE.
      Now Tracing is enabled
      
       key      | value
      ----------+--------------------
       mmullass | 0x0000000000000001
      
      (1 rows)
      
      Tracing session: 78526330-97cb-11e7-83cc-a3af9d3da979
      
       activity                                                                                 | timestamp                  | source    | source_elapsed
      ------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
                                                                             Execute CQL3 query | 2017-09-13 00:03:01.091000 | 127.0.0.2 |              0
                     Parsing select * from ks1.t1 where key = 'mmullass'; [SharedPool-Worker-3] | 2017-09-13 00:03:01.091000 | 127.0.0.2 |            216
                                                      Preparing statement [SharedPool-Worker-3] | 2017-09-13 00:03:01.091000 | 127.0.0.2 |            390
                                             reading data from /127.0.0.1 [SharedPool-Worker-3] | 2017-09-13 00:03:01.091000 | 127.0.0.2 |            808
                                   Executing single-partition query on t1 [SharedPool-Worker-2] | 2017-09-13 00:03:01.092000 | 127.0.0.2 |           1041
                   READ message received from /127.0.0.2 [MessagingService-Incoming-/127.0.0.2] | 2017-09-13 00:03:01.092000 | 127.0.0.1 |             33
                      Sending READ message to /127.0.0.1 [MessagingService-Outgoing-/127.0.0.1] | 2017-09-13 00:03:01.092000 | 127.0.0.2 |           1036
                                   Executing single-partition query on t1 [SharedPool-Worker-1] | 2017-09-13 00:03:01.092000 | 127.0.0.1 |            189
                                             Acquiring sstable references [SharedPool-Worker-2] | 2017-09-13 00:03:01.092000 | 127.0.0.2 |           1113
                                             Acquiring sstable references [SharedPool-Worker-1] | 2017-09-13 00:03:01.092000 | 127.0.0.1 |            276
                                                Merging memtable contents [SharedPool-Worker-2] | 2017-09-13 00:03:01.092000 | 127.0.0.2 |           1172
                                                Merging memtable contents [SharedPool-Worker-1] | 2017-09-13 00:03:01.092000 | 127.0.0.1 |            332
       REQUEST_RESPONSE message received from /127.0.0.1 [MessagingService-Incoming-/127.0.0.1] | 2017-09-13 00:03:01.093000 | 127.0.0.2 |             --
                                        Read 1 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-09-13 00:03:01.093000 | 127.0.0.1 |            565
                                         Enqueuing response to /127.0.0.2 [SharedPool-Worker-1] | 2017-09-13 00:03:01.093000 | 127.0.0.1 |            648
          Sending REQUEST_RESPONSE message to /127.0.0.2 [MessagingService-Outgoing-/127.0.0.2] | 2017-09-13 00:03:01.093000 | 127.0.0.1 |            783
                                      Processing response from /127.0.0.1 [SharedPool-Worker-1] | 2017-09-13 00:03:01.094000 | 127.0.0.2 |             --
                                                   Initiating read-repair [SharedPool-Worker-1] | 2017-09-13 00:03:01.099000 | 127.0.0.2 |             --
                                        Read 1 live and 0 tombstone cells [SharedPool-Worker-2] | 2017-09-13 00:03:01.101000 | 127.0.0.2 |          10113
                                                                               Request complete | 2017-09-13 00:03:01.092830 | 127.0.0.2 |           1830
      
      $ ccm node2 nodetool flush
      $ ls ~/.ccm/cluster_3.0.14/node2/data0/ks1/t1-*/*-Data.db
      /Users/hiwakaba/.ccm/cluster_3.0.14/node2/data0/ks1/t1-ec659e0097ca11e78ea7c1bd4d549501/mc-1-big-Data.db
      
      $ ~/.ccm/repository/3.0.14/tools/bin/sstabledump /Users/hiwakaba/.ccm/cluster_3.0.14/node2/data0/ks1/t1-ec659e0097ca11e78ea7c1bd4d549501/mc-1-big-Data.db -k mmullass
      [
        {
          "partition" : {
            "key" : [ "mmullass" ],
            "position" : 0
          },
          "rows" : [
            {
              "type" : "row",
              "position" : 36,
              "liveness_info" : { "tstamp" : "2017-09-12T14:59:42.312969Z" },
              "cells" : [
                { "name" : "value", "value" : "0000000000000001" }
              ]
            }
          ]
        }
      ]
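The liveness timestamp in this dump, 2017-09-12T14:59:42.312969Z, matches the insert from step 3 (traced at 2017-09-12 23:59:42 local time) to within a few milliseconds once what appears to be a nine-hour offset from UTC is allowed for, so the repaired row on node2 carries the original write timestamp. A sketch for pulling that field out of sstabledump output without a JSON tool; it assumes the exact `"liveness_info" : { "tstamp" : "<time>" }` spacing shown above, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every row liveness timestamp found in
# sstabledump JSON output on stdin (relies on sstabledump's spacing).
liveness_tstamps() {
  sed -n 's/.*"liveness_info" : { "tstamp" : "\([^"]*\)".*/\1/p'
}

# Usage (paths as in step 6, assumed to exist):
#   ~/.ccm/repository/3.0.14/tools/bin/sstabledump \
#     ~/.ccm/cluster_3.0.14/node2/data0/ks1/t1-*/mc-1-big-Data.db -k mmullass | liveness_tstamps
```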
      

      In CASSANDRA-11409, cam1982 commented that this is not a bug, so I filed this issue as an improvement.

      Attachments

        1. 0001-Use-read_repair_chance-when-starting-repairs-due-to-.patch (2 kB, Murukesh Mohanan)
        2. speculative retries.pdf (830 kB, Shogo Hoshii)


            People

              Assignee: Unassigned
              Reporter: Hiro Wakabayashi (hiwkby)
              Votes: 1
              Watchers: 10
