[CASSANDRA-14764] Test Messaging Refactor with: 12 Node Breaking Point, compression=none, encryption=none, coalescing=off - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Normal
Resolution: Done
Fix Version/s: 4.0-rc1, 4.0
Component/s: Legacy/Streaming and Messaging
Labels: None
Complexity: Normal

Description

Setup:

Cassandra: 12 (2*6) i3.xlarge AWS instances (4 CPU cores, 30GB RAM each) running Cassandra trunk off of jasobrown/14503 jdd7ec5a2 (Jason's patched internode messaging branch) vs the same footprint running 3.0.17
Two datacenters with 100ms latency between them
No compression, encryption, or coalescing turned on

Test #1:

ndbench sent 1.5k coordinator-level QPS of 4kb single-partition BATCH mutations at LOCAL_ONE to one datacenter (RF=3*2 = 6 so 3k global replica QPS). This represents about 250 QPS per coordinator in the first datacenter, or roughly 60 QPS per core. The goal was to observe P99 write and read latencies under various QPS.
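The per-coordinator and per-core figures above follow from spreading the 1.5k QPS over the 6 nodes of the receiving datacenter and the 4 cores of each node. A minimal sketch of that arithmetic (the class and method names are illustrative, not part of ndbench or Cassandra):

```java
// Illustrative helper only; names are hypothetical.
final class LoadMath {
    /** Coordinator-level QPS each node sees when load is spread evenly across one datacenter. */
    static double qpsPerCoordinator(double totalQps, int nodesInDc) {
        return totalQps / nodesInDc;
    }

    /** QPS each CPU core must absorb on a single coordinator. */
    static double qpsPerCore(double qpsPerCoordinator, int coresPerNode) {
        return qpsPerCoordinator / coresPerNode;
    }

    public static void main(String[] args) {
        double perCoordinator = qpsPerCoordinator(1500, 6); // 6 nodes in the receiving datacenter
        double perCore = qpsPerCore(perCoordinator, 4);     // i3.xlarge has 4 cores
        System.out.println(perCoordinator + " QPS per coordinator, " + perCore + " QPS per core");
    }
}
```

The exact per-core value is 62.5, which the description rounds to 60.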

Result:

The good news is that since the CASSANDRA-14503 changes, instead of keeping the mutations on heap we put the messages into hints and don't run out of memory. The bad news is that the MessagingService-NettyOutbound-Thread threads would occasionally enter a degraded state where they would just spin on a core. I've attached flame graphs showing the CPU state as jasobrown applied fixes to the OutboundMessagingConnection class.

Follow Ups:
jasobrown has committed a number of fixes onto his jasobrown/14503-collab branch including:
1. Limiting the amount of time spent dequeuing messages if they are expired (previously, if messages entered the queue faster than we could dequeue them, we'd just loop infinitely on the consumer side)
2. Not calling dequeueMessages from within callbacks created by dequeueMessages.
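The two fixes above can be illustrated with a small, single-threaded sketch. This is not the code from the jasobrown/14503-collab branch: the `BoundedDrain` class, its `Msg` type, and the `drain` method are hypothetical names, and the real OutboundMessagingConnection works with Netty and concurrent queues. The sketch only shows the two ideas: stop skipping expired messages once a time budget is spent, and refuse re-entrant drains triggered from a callback.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical illustration; not Cassandra's actual classes. Not thread-safe.
final class BoundedDrain {
    /** Hypothetical message carrying an absolute expiry time in System.nanoTime() units. */
    static final class Msg {
        final String payload;
        final long expiresAtNanos;

        Msg(String payload, long expiresAtNanos) {
            this.payload = payload;
            this.expiresAtNanos = expiresAtNanos;
        }

        boolean isExpired(long nowNanos) {
            return nowNanos - expiresAtNanos >= 0; // overflow-safe nanoTime comparison
        }
    }

    private final Queue<Msg> queue = new ArrayDeque<>();
    private boolean draining = false; // re-entrancy guard

    void enqueue(Msg m) {
        queue.add(m);
    }

    /**
     * Delivers live messages to the sink and drops expired ones, but gives up
     * once budgetNanos has elapsed while still discarding expired messages.
     * Returns the number of messages delivered.
     */
    int drain(long budgetNanos, Consumer<Msg> sink) {
        if (draining) {
            return 0; // a callback tried to drain again; let the outer loop continue instead
        }
        draining = true;
        try {
            long deadline = System.nanoTime() + budgetNanos;
            int delivered = 0;
            Msg m;
            while ((m = queue.poll()) != null) {
                long now = System.nanoTime();
                if (m.isExpired(now)) {
                    if (now - deadline >= 0) {
                        break; // budget spent on expired messages; yield the thread
                    }
                    continue; // drop the expired message and keep going
                }
                sink.accept(m);
                delivered++;
            }
            return delivered;
        } finally {
            draining = false;
        }
    }
}
```

With the guard in place, a callback that calls `drain` again gets 0 back immediately, so the outer loop remains the only consumer and the call stack does not grow.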

We're continuing to use CPU flamegraphs to figure out where we're looping and fixing bugs as we find them.

Attachments


i-03341e1c52de6ea3e-after-queue-change.svg
21/Sep/18 18:13
1.13 MB
Joey Lynch
i-07cd92e844d66d801-after-queue-bound.svg
19/Sep/18 01:44
442 kB
Joey Lynch
i-07cd92e844d66d801-hint-play.svg
21/Sep/18 22:23
219 kB
Joey Lynch
i-07cd92e844d66d801-uninlined-with-jvm-methods.svg
19/Sep/18 17:29
491 kB
Joey Lynch
ttop.txt
19/Sep/18 01:44
2 kB
Joey Lynch

Activity

People

Assignee: Vinay Chella

Reporter: Joey Lynch

Authors: Vinay Chella

Votes: 0

Watchers: 7

Dates

Created: 19/Sep/18 01:41

Updated: 31/Jul/21 21:35

Resolved: 19/Mar/21 11:09