Cassandra / CASSANDRA-15743

ActiveRepairService#terminateSessions can leak RepairJob threads



    • Bug
    • Status: Open
    • Normal
    • Resolution: Unresolved
    • None
    • Consistency/Repair
    • None
    • Degradation - Resource Management
    • Normal
    • Normal
    • User Report
    • All
    • None


      (We reported this to https://github.com/thelastpickle/cassandra-reaper/issues/898, as the behavior can be triggered by reaper. I am copying that report here, slightly rephrased.)

      We have a fairly big table (240GB per node) where the reaper repairs kept failing because they get killed by reaper's handlePotentialStuckRepairs, which calls ActiveRepairService#terminateSessions.

      On this cluster (with G1GC), we also experience a memory leak: the old gen keeps growing until the JVM has to do a minutes-long full GC, which still cannot recover much memory from the old gen.

      From a heap dump, we eventually traced the memory leak to dozens of RepairJob threads, each one holding on to hundreds of megabytes of MerkleTrees objects.

      The threads look like this in the jmap output (Cassandra 3.11.4):

      "Repair#30:1" #14352 daemon prio=5 os_prio=0 tid=0x00007f39f6ac7110 nid=0x1f3d waiting on condition [0x00007f7609ce8000]
         java.lang.Thread.State: WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for  <0x0000000243c005a8> (a com.google.common.util.concurrent.AbstractFuture$Sync)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
              at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
              at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
              at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
              at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1509)
              at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:160)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
              at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$9/1896622931.run(Unknown Source)
              at java.lang.Thread.run(Thread.java:748)
         Locked ownable synchronizers:
              - <0x0000000243c00658> (a java.util.concurrent.ThreadPoolExecutor$Worker)

      After checking the code, we think this is what happens:

      1. reaper schedules repair #1 on node A
      2. node A requests merkle trees from neighboring nodes B and C
      3. node B finishes its validation phase and sends its merkle tree to node A
      4. node C finishes its validation phase and sends its merkle tree to node A
      5. reaper schedules repair #2 and calls `handlePotentialStuckRepairs`
      6. node A finishes its validation phase
      7. node A starts the sync phase
      8. repair #1 stays stuck indefinitely on nodes A, B, and C: the executor was already shut down by `handlePotentialStuckRepairs`, so nothing picks up the sync task
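
      The mechanism suspected in the steps above can be reproduced in isolation with plain JDK classes. The following is only a minimal sketch, not Cassandra or reaper code: the class `StuckRepairSketch`, its `Outcome` holder, the `demo` method, and the local `getUninterruptibly` helper are all hypothetical names, and the helper merely imitates the uninterruptible wait that appears in the stack trace. It assumes the executor uses the default JDK rejection policy, which may differ from what Cassandra configures. It shows that `shutdownNow()` does not release a worker that waits uninterruptibly, and that the task which would have completed the future is rejected once the executor is shut down.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names exist in Cassandra or reaper.
final class StuckRepairSketch {

    /** What the demo observed. */
    static final class Outcome {
        final boolean syncTaskRejected;          // "sync" task refused after shutdown
        final boolean terminatedWhileWaiting;    // pool terminated while the job still waits
        final boolean terminatedAfterCompletion; // pool terminated once the future completed

        Outcome(boolean rejected, boolean whileWaiting, boolean afterCompletion) {
            this.syncTaskRejected = rejected;
            this.terminatedWhileWaiting = whileWaiting;
            this.terminatedAfterCompletion = afterCompletion;
        }
    }

    /** Waits for the future and keeps waiting when interrupted (assumed behavior of the wait in the trace). */
    static <T> T getUninterruptibly(CompletableFuture<T> future) throws ExecutionException {
        boolean interrupted = false;
        try {
            while (true) {
                try {
                    return future.get();
                } catch (InterruptedException e) {
                    interrupted = true; // remember the interrupt, then go back to waiting
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt(); // restore the flag for the caller
            }
        }
    }

    static Outcome demo() {
        ExecutorService taskExecutor = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "Repair#sketch");
            t.setDaemon(true); // do not keep the JVM alive if the sketch is changed to leak
            return t;
        });
        CompletableFuture<String> syncResult = new CompletableFuture<>();
        CountDownLatch jobStarted = new CountDownLatch(1);
        try {
            // The "job": occupies a worker thread until the sync result arrives.
            taskExecutor.execute(() -> {
                jobStarted.countDown();
                try {
                    getUninterruptibly(syncResult);
                } catch (ExecutionException ignored) {
                    // not reachable here: the future only ever completes normally
                }
            });
            jobStarted.await();

            // The "terminate sessions" step: interrupts the worker, which ignores it.
            taskExecutor.shutdownNow();

            // The "sync" task that would complete the future can no longer be scheduled.
            boolean rejected = false;
            try {
                taskExecutor.execute(() -> syncResult.complete("synced"));
            } catch (RejectedExecutionException e) {
                rejected = true;
            }

            boolean whileWaiting = taskExecutor.awaitTermination(500, TimeUnit.MILLISECONDS);

            // Completing the future by hand is the only thing that frees the worker.
            syncResult.complete("completed by hand");
            boolean afterCompletion = taskExecutor.awaitTermination(5, TimeUnit.SECONDS);
            return new Outcome(rejected, whileWaiting, afterCompletion);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Outcome o = demo();
        System.out.println("sync task rejected:          " + o.syncTaskRejected);
        System.out.println("terminated while waiting:    " + o.terminatedWhileWaiting);
        System.out.println("terminated after completion: " + o.terminatedAfterCompletion);
    }
}
```

      In this sketch the worker only exits once something completes the future it waits on. If the real code behaves the same way, anything the waiting thread references (here nothing, in the report the MerkleTrees objects) would stay reachable for as long as the thread is parked.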




            Assignee: Unassigned
            Reporter: Yap Sok Ann (sayap)