Apache Cassandra / CASSANDRA-12200

Backlogged compactions can make repair on trivially small tables wait a long time to finish


Details

    • Type: Improvement
    • Status: Open
    • Priority: Normal
    • Resolution: Unresolved
    • None
    • Component/s: Legacy/Core
    • None

    Description

      In C* 3.0 we started to use incremental repair by default. However, this seems to create a repair performance problem for relatively write-heavy workloads that keep every concurrent_compactors slot busy with active compactions.

      I was able to demonstrate this issue with the following scenario:

      1. On a three-node C* 3.0.7 cluster, use "cassandra-stress write n=100000000" to generate 100GB of data in the keyspace1.standard1 table using LCS (press Ctrl+C to stop the stress client once the data size on each node reaches 35+GB).
      2. At this point, each node has hundreds of L0 SSTables waiting for LCS to compact them, and with concurrent_compactors at its default value of 2, both compaction threads are constantly busy processing the backlogged L0 SSTables.
      3. Now create a new keyspace called "trivial_ks" with RF=3 and create a small two-column CQL table in it, and insert 6 records.
      4. Start a "nodetool repair trivial_ks" session on one of the nodes, and watch the following behavior:

      automaton@wdengdse50google-98425b985-3:~$ nodetool repair trivial_ks
      [2016-07-13 01:57:28,364] Starting repair command #1, repairing keyspace trivial_ks with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3)
      [2016-07-13 01:57:31,027] Repair session 27212dd0-489d-11e6-a6d6-cd06faa0aaa2 for range [(3074457345618258602,-9223372036854775808], (-9223372036854775808,-3074457345618258603], (-3074457345618258603,3074457345618258602]] finished (progress: 66%)
      [2016-07-13 02:07:47,637] Repair completed successfully
      [2016-07-13 02:07:47,657] Repair command #1 finished in 10 minutes 19 seconds
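
The repair above covered three token ranges (# of ranges: 3), and together they tile the whole Murmur3 token ring. That is presumably why the anticompaction log entries quoted further down report the SSTable as "fully contained in range (-9223372036854775808,-9223372036854775808]", i.e. the full ring, so only repairedAt had to be mutated. The Java sketch below checks that tiling. The class RingRanges and its methods are hypothetical helpers, not Cassandra code; contains() is an assumption that mirrors, as we understand it, the half-open (left, right] wrap-around convention of Cassandra's Range class, where left >= right wraps and left == right covers the whole ring.

```java
// Hypothetical helper (not Cassandra code): models (left, right] token ranges on
// the Murmur3 ring, where a range with left >= right wraps around the minimum token.
class RingRanges {
    static final long MIN_TOKEN = Long.MIN_VALUE; // Murmur3Partitioner's minimum token

    // The three ranges reported by "nodetool repair trivial_ks", as {left, right}.
    static final long[][] REPAIRED_RANGES = {
        {3074457345618258602L, MIN_TOKEN},
        {MIN_TOKEN, -3074457345618258603L},
        {-3074457345618258603L, 3074457345618258602L},
    };

    /** (left, right] on the ring; left >= right wraps, so left == right is the whole ring. */
    static boolean contains(long left, long right, long token) {
        if (left >= right) {
            return token > left || token <= right; // wrapping range
        }
        return token > left && token <= right; // ordinary range
    }

    /** Number of repaired ranges containing the token; 1 everywhere if they tile the ring. */
    static int coveringCount(long token) {
        int count = 0;
        for (long[] range : REPAIRED_RANGES) {
            if (contains(range[0], range[1], token)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(coveringCount(0L));                  // prints 1
        System.out.println(contains(MIN_TOKEN, MIN_TOKEN, 42L)); // prints true
    }
}
```

Every token, including the boundary tokens themselves, falls into exactly one of the three ranges, so the union is the full ring and the repaired SSTable needs no actual splitting.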
      

      Basically, for such a small table the repair took more than 10 minutes to finish. Looking at debug.log for this particular repair session UUID, you will find that all nodes got through validation compaction within 15ms, but one of the nodes then got stuck waiting for a compaction slot: it had to run an anticompaction step before it could tell the initiating node that its part of the repair session was done, and it took more than 10 minutes for a compaction slot to free up, as shown in the following debug.log entries:

      DEBUG [AntiEntropyStage:1] 2016-07-13 01:57:30,956  RepairMessageVerbHandler.java:149 - Got anticompaction request AnticompactionRequest{parentRepairSession=27103de0-489d-11e6-a6d6-cd06faa0aaa2} org.apache.cassandra.repair.messages.AnticompactionRequest@34449ff4
      <...>
      <snip>
      <...>
      DEBUG [CompactionExecutor:5] 2016-07-13 02:07:47,506  CompactionTask.java:217 - Compacted (286609e0-489d-11e6-9e03-1fd69c5ec46c) 32 sstables to [/var/lib/cassandra/data/keyspace1/standard1-9c02e9c1487c11e6b9161dbd340a212f/mb-499-big,] to level=0.  2,892,058,050 bytes to 2,874,333,820 (~99% of original) in 616,880ms = 4.443617MB/s.  0 total partitions merged to 12,233,340.  Partition merge counts were {1:12086760, 2:146580, }
      INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,512  CompactionManager.java:511 - Starting anticompaction for trivial_ks.weitest on 1/[BigTableReader(path='/var/lib/cassandra/data/trivial_ks/weitest-538b07d1489b11e6a9ef61c6ff848952/mb-1-big-Data.db')] sstables
      INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,513  CompactionManager.java:540 - SSTable BigTableReader(path='/var/lib/cassandra/data/trivial_ks/weitest-538b07d1489b11e6a9ef61c6ff848952/mb-1-big-Data.db') fully contained in range (-9223372036854775808,-9223372036854775808], mutating repairedAt instead of anticompacting
      INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,570  CompactionManager.java:578 - Completed anticompaction successfully
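
The timestamps and sizes in these log entries can be cross-checked with a little arithmetic. The Java sketch below (the class and method names are illustrative, not from Cassandra) subtracts the logged 616,880ms duration from the completion time of the 32-SSTable compaction, measures how long the anticompaction request waited before "Starting anticompaction" was logged, and recomputes the logged throughput, assuming the log's "MB/s" means output bytes per second divided by 2^20.

```java
import java.time.Duration;
import java.time.LocalTime;

// Re-derives timings from the debug.log entries above (wall-clock times on the stuck node).
class SlotWaitArithmetic {
    // Values copied from the CompactionTask.java:217 and CompactionManager.java:511 lines.
    static final LocalTime COMPACTION_FINISHED = LocalTime.parse("02:07:47.506");
    static final Duration COMPACTION_DURATION = Duration.ofMillis(616_880);
    static final long OUTPUT_BYTES = 2_874_333_820L;
    // Values copied from the RepairMessageVerbHandler.java:149 and CompactionManager.java:511 lines.
    static final LocalTime ANTICOMPACTION_REQUESTED = LocalTime.parse("01:57:30.956");
    static final LocalTime ANTICOMPACTION_STARTED = LocalTime.parse("02:07:47.512");

    /** When the backlogged 32-SSTable compaction must have started. */
    static LocalTime compactionStart() {
        return COMPACTION_FINISHED.minus(COMPACTION_DURATION);
    }

    /** How long the anticompaction request waited for a free compaction slot. */
    static Duration anticompactionWait() {
        return Duration.between(ANTICOMPACTION_REQUESTED, ANTICOMPACTION_STARTED);
    }

    /** Output bytes per second divided by 2^20 (assumed meaning of the logged MB/s). */
    static double throughputMiBPerSecond() {
        double seconds = COMPACTION_DURATION.toMillis() / 1000.0;
        return OUTPUT_BYTES / seconds / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        System.out.println("compaction started at " + compactionStart());
        System.out.println("anticompaction waited " + anticompactionWait().toMillis() + " ms");
        System.out.printf("throughput %.6f MiB/s%n", throughputMiBPerSecond());
    }
}
```

Run as written, it should print a compaction start of 01:57:30.626, about 330ms before the anticompaction request arrived at 01:57:30.956, a wait of 616556 ms, and a throughput of 4.443617, matching the logged figure. In other words, the trivial anticompaction sat in the queue for almost the entire run of a compaction that had just started, which accounts for nearly all of the 10 minutes 19 seconds reported for the repair command.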
      

      Because validation compaction runs on its own threads, outside the regular compaction thread pool whose size is limited by concurrent_compactors, it completed without any delay. If anticompaction were treated the same way (i.e., given its own thread pool), this kind of repair performance problem could be avoided.
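
The proposal can be illustrated with a small, self-contained Java model. It is not Cassandra's CompactionManager: a fixed pool whose size plays the role of concurrent_compactors is kept busy by blocking tasks standing in for the backlogged L0 compactions, and a trivially small "anticompaction" task is then submitted either to that shared pool or to a hypothetical dedicated single-thread executor. The class name, the 500ms timeout, and the latch-based stand-ins are illustrative assumptions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model, not Cassandra code: a shared fixed pool (size = concurrent_compactors)
// versus a separate single-thread pool standing in for a dedicated anticompaction executor.
class AnticompactionPoolDemo {

    /** Submits a trivially small "anticompaction" and reports whether it finished within timeoutMs. */
    static boolean finishesWithin(ExecutorService executor, long timeoutMs) {
        Future<?> anticompaction = executor.submit(() -> { /* e.g. just mutate repairedAt */ });
        try {
            anticompaction.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // still queued behind busy compaction threads
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns {finishedOnSharedPool, finishedOnDedicatedPool}. */
    static boolean[] run(int concurrentCompactors, long timeoutMs) {
        ExecutorService compactionPool = Executors.newFixedThreadPool(concurrentCompactors);
        ExecutorService anticompactionPool = Executors.newSingleThreadExecutor();
        CountDownLatch slotsBusy = new CountDownLatch(concurrentCompactors);
        CountDownLatch backlogDrained = new CountDownLatch(1);
        try {
            // Occupy every compaction slot with a long-running "L0 backlog compaction".
            for (int i = 0; i < concurrentCompactors; i++) {
                compactionPool.submit(() -> {
                    slotsBusy.countDown();
                    backlogDrained.await(); // blocks until the backlog is "drained"
                    return null;
                });
            }
            slotsBusy.await(); // every slot is now taken
            boolean shared = finishesWithin(compactionPool, timeoutMs);
            boolean dedicated = finishesWithin(anticompactionPool, timeoutMs);
            return new boolean[] {shared, dedicated};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            backlogDrained.countDown(); // free the slots so the queued task can run and the pools exit
            compactionPool.shutdown();
            anticompactionPool.shutdown();
        }
    }

    public static void main(String[] args) {
        boolean[] result = run(2, 500);
        System.out.println("shared pool: anticompaction finished in time = " + result[0]);
        System.out.println("dedicated pool: anticompaction finished in time = " + result[1]);
    }
}
```

With both slots occupied, the task on the shared pool times out (false) while the one on the dedicated executor completes (true). The trade-off of a dedicated pool is one more thread that may do I/O while compactions are backlogged, in exchange for repair latency that no longer depends on how long the longest-running compaction takes.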


    People

      Assignee: Unassigned
      Reporter: Wei Deng (weideng)
      Votes: 0
      Watchers: 4
