[CASSANDRA-15743] ActiveRepairService#terminateSessions can leak RepairJob threads - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Normal
Resolution: Unresolved
Fix Version/s: None
Component/s: Consistency/Repair
Labels:
None

Bug Category:
Degradation - Resource Management
Severity:
Normal
Complexity:
Normal
Discovered By:
User Report
Platform:
All
Impacts:
None

Description

(We reported this at https://github.com/thelastpickle/cassandra-reaper/issues/898, as the behavior can be triggered by reaper. I will copy-paste it here and rephrase slightly.)

We have a fairly big table (240GB per node) where the reaper repairs kept failing because they get killed by reaper's handlePotentialStuckRepairs, which calls ActiveRepairService#terminateSessions.

On this cluster (with G1GC), we also experience a memory leak: the old gen keeps growing until the JVM has to do a minutes-long full GC, which still cannot recover much memory from the old gen.

From a heap dump, we eventually traced the memory leak to dozens of RepairJob threads, each holding on to hundreds of megabytes of MerkleTrees objects.

The threads look like this in the jmap output (Cassandra 3.11.4):

"Repair#30:1" #14352 daemon prio=5 os_prio=0 tid=0x00007f39f6ac7110 nid=0x1f3d waiting on condition [0x00007f7609ce8000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000243c005a8> (a com.google.common.util.concurrent.AbstractFuture$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
        at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
        at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1509)
        at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:160)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
        at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$9/1896622931.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - <0x0000000243c00658> (a java.util.concurrent.ThreadPoolExecutor$Worker)

After checking the code, we think this is what happens:

1. reaper schedules repair #1 to node A
2. node A requests merkle trees from neighboring nodes B and C
3. node B finishes its validation phase and sends its merkle tree to node A
4. node C finishes its validation phase and sends its merkle tree to node A
5. reaper schedules repair #2 and calls `handlePotentialStuckRepairs`
6. node A finishes its validation phase
7. node A starts the sync phase
8. repair #1 on nodes A, B, and C is stuck indefinitely: the executor was already shut down by `handlePotentialStuckRepairs`, so nothing picks up the sync task
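
The thread dump above shows the RepairJob thread parked inside `Uninterruptibles.getUninterruptibly`, so interrupting it does not wake it up. The following is a minimal, JDK-only sketch of that general pattern, not Cassandra's actual code: the class and method names are hypothetical, `CompletableFuture.join()` stands in for Guava's uninterruptible wait, and it assumes the executor is stopped with `shutdownNow()`. It only illustrates how a worker blocked on a future that never completes can outlive its executor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; none of these names exist in Cassandra.
final class StuckWaiterSketch {

    /** Returns true if the worker thread is still alive after shutdownNow(). */
    static boolean waiterSurvivesShutdownNow() {
        // Stand-in for a future that nobody will ever complete.
        CompletableFuture<Void> pending = new CompletableFuture<>();
        CountDownLatch started = new CountDownLatch(1);
        ExecutorService taskExecutor = Executors.newSingleThreadExecutor();

        // Stand-in for a job that blocks on the future while ignoring interrupts.
        taskExecutor.execute(() -> {
            started.countDown();
            pending.join(); // keeps waiting even if the thread is interrupted
        });

        try {
            started.await();
            // shutdownNow() interrupts the worker, but the uninterruptible wait goes on.
            taskExecutor.shutdownNow();
            // awaitTermination returns false when the worker has not exited in time.
            return !taskExecutor.awaitTermination(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            // Release the worker so this demo does not leak a thread itself.
            pending.complete(null);
        }
    }

    public static void main(String[] args) {
        System.out.println("worker alive after shutdownNow(): " + waiterSurvivesShutdownNow());
    }
}
```

In the sketch, the worker only exits once something completes the future. If nothing ever does, as we suspect happens in step 8, the thread and everything it references stay on the heap.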

People

Assignee: Unassigned

Reporter: Yap Sok Ann

Votes: 0

Watchers: 3

Dates

Created: 21/Apr/20 10:53

Updated: 21/Apr/20 18:33