[CASSANDRA-15726] buffer pool may throw NPE with concurrent release due to in-progress tiny pool eviction - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 4.0-beta1, 4.0
Component/s: Legacy/Core
Labels: None
Bug Category: Correctness
Severity: Normal
Complexity: Normal
Discovered By: Unit Test
Platform: All
Impacts: None
Since Version: 4.0-alpha
Source Control Link: fb9e74a4fe26eda988c0e98d578f7ded80a8c390
Test and Documentation Plan: With the patch, the failure can no longer be reproduced with LongBufferPoolTest.

Description

This can be reproduced by running LongBufferPoolTest; it fails in roughly 1 out of 5 runs.

java.lang.NullPointerException
	at org.apache.cassandra.utils.memory.BufferPool$Chunk.access$1300(BufferPool.java:836)
	at org.apache.cassandra.utils.memory.BufferPool$LocalPool.lambda$remove$1(BufferPool.java:716)
	at org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks.removeIf(BufferPool.java:460)
	at org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks.access$1500(BufferPool.java:304)
	at org.apache.cassandra.utils.memory.BufferPool$LocalPool.remove(BufferPool.java:716)
	at org.apache.cassandra.utils.memory.BufferPool$LocalPool.put(BufferPool.java:590)
	at org.apache.cassandra.utils.memory.BufferPool$LocalPool.recycle(BufferPool.java:709)
	at org.apache.cassandra.utils.memory.BufferPool$Chunk.recycle(BufferPool.java:909)
	at org.apache.cassandra.utils.memory.BufferPool$Chunk.tryRecycle(BufferPool.java:903)
	at org.apache.cassandra.utils.memory.BufferPool$Chunk.release(BufferPool.java:896)
	at org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks.removeIf(BufferPool.java:465)
	at org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks.access$1500(BufferPool.java:304)
	at org.apache.cassandra.utils.memory.BufferPool$LocalPool.addChunk(BufferPool.java:736)
	at org.apache.cassandra.utils.memory.BufferPool$LocalPool.addChunkFromParent(BufferPool.java:725)
	at org.apache.cassandra.utils.memory.BufferPool$LocalPool.tryGetInternal(BufferPool.java:691)
	at org.apache.cassandra.utils.memory.BufferPool$LocalPool.tryGet(BufferPool.java:679)
	at org.apache.cassandra.utils.memory.BufferPool$LocalPool.access$000(BufferPool.java:518)
	at org.apache.cassandra.utils.memory.BufferPool.tryGet(BufferPool.java:120)                       
	at org.apache.cassandra.utils.memory.LongBufferPoolTest$2.testOne(LongBufferPoolTest.java:497)
	at org.apache.cassandra.utils.memory.LongBufferPoolTest$TestUntil.call(LongBufferPoolTest.java:558)
	at org.apache.cassandra.utils.memory.LongBufferPoolTest$TestUntil.call(LongBufferPoolTest.java:538)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)

The cause is as follows:

1. When evicting a normal chunk from a full MicroQueueOfChunks, the local pool tries to remove the corresponding tiny chunks via MicroQueueOfChunks#removeIf.
2. If a matching tiny chunk is found, its release() is called immediately, before the resulting null chunk has been moved to the back of the queue.
3. Because of concurrent releases from other threads, that release() may cause the tiny chunk's parent normal chunk (the chunk evicted in step 1) to be removed from the local pool, which makes the tiny pool try to remove the corresponding tiny chunks again in LocalPool#remove().
4. This nested MicroQueueOfChunks#removeIf runs while the outer removeIf is still in progress and throws an NPE, because the queue violates MicroQueueOfChunks's assumption that null chunks are always at the back of the queue.
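The sequence above can be illustrated with a minimal, single-threaded sketch. It is not the BufferPool code: `Chunk`, `MicroQueue`, the `onRelease` hook, and the three-slot array are hypothetical stand-ins, and the concurrent release is simulated by having one tiny chunk's release() call back into removeIf, which mirrors the nesting visible in the stack trace.

```java
import java.util.function.Predicate;

// Hypothetical, simplified model of the queue; not the actual BufferPool code.
final class ReentrantRemoveIfHazard {
    /** Stand-in for a chunk; a tiny chunk remembers the normal chunk it was carved from. */
    static final class Chunk {
        final Chunk parent;              // null for a normal chunk
        Runnable onRelease = () -> {};   // hook simulating what release() ends up triggering
        Chunk(Chunk parent) { this.parent = parent; }
        void release() { onRelease.run(); }
    }

    /** Three-slot queue whose scan assumes slots[0..count) are all non-null. */
    static final class MicroQueue {
        final Chunk[] slots = new Chunk[3];
        int count;

        void add(Chunk c) { slots[count++] = c; }

        void removeIf(Predicate<Chunk> matches) {
            int scanned = count;
            for (int i = scanned - 1; i >= 0; i--) {
                if (matches.test(slots[i])) {   // NPE if slots[i] is a hole left by an outer call
                    Chunk c = slots[i];
                    slots[i] = null;
                    count--;
                    c.release();                // too early: the hole at i is not yet at the back
                }
            }
            // Compaction happens only now, after every release() has already run.
            int kept = 0;
            for (int i = 0; i < scanned; i++) {
                Chunk c = slots[i];
                slots[i] = null;
                if (c != null) slots[kept++] = c;
            }
        }
    }

    /** Returns true if the nested removeIf hits a null slot and throws. */
    static boolean reproducesNpe() {
        Chunk evicted = new Chunk(null);        // normal chunk being evicted
        Chunk unrelated = new Chunk(null);      // some other normal chunk
        MicroQueue tinyQueue = new MicroQueue();
        Chunk tinyB = new Chunk(evicted);
        tinyQueue.add(new Chunk(evicted));
        tinyQueue.add(tinyB);
        tinyQueue.add(new Chunk(unrelated));
        Predicate<Chunk> childOfEvicted = c -> c.parent == evicted;
        // Simulates the release described above: it ends up removing the parent's tiny chunks again.
        tinyB.onRelease = () -> tinyQueue.removeIf(childOfEvicted);
        try {
            tinyQueue.removeIf(childOfEvicted);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE reproduced: " + reproducesNpe());
    }
}
```

In this model the outer call clears the middle slot and decrements the count, so the nested call scans a range that still contains the hole and dereferences it in the predicate.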

patch

The fix is to move null chunks to the back of the queue before releasing any chunks.
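A minimal sketch of that ordering, under the same kind of simplifying assumptions: `Chunk`, `MicroQueue`, and the `onRelease` hook are hypothetical names for illustration, not the committed patch. Matching chunks are first collected and the queue is compacted, and only then are the collected chunks released, so a nested removeIf always sees a well-formed queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, simplified model of the queue; not the actual BufferPool code.
final class CompactBeforeRelease {
    /** Stand-in for a chunk; a tiny chunk remembers the normal chunk it was carved from. */
    static final class Chunk {
        final Chunk parent;              // null for a normal chunk
        Runnable onRelease = () -> {};   // hook simulating what release() ends up triggering
        Chunk(Chunk parent) { this.parent = parent; }
        void release() { onRelease.run(); }
    }

    /** Three-slot queue that keeps slots[0..count) non-null and nulls only at the back. */
    static final class MicroQueue {
        final Chunk[] slots = new Chunk[3];
        int count;

        void add(Chunk c) { slots[count++] = c; }

        void removeIf(Predicate<Chunk> matches) {
            List<Chunk> toRelease = new ArrayList<>(slots.length);
            int kept = 0;
            // Pass 1: set matching chunks aside and shift survivors to the front.
            for (int i = 0; i < count; i++) {
                Chunk c = slots[i];
                if (matches.test(c)) toRelease.add(c);
                else slots[kept++] = c;
            }
            // Null out the tail so every empty slot sits at the back.
            for (int i = kept; i < count; i++) slots[i] = null;
            count = kept;
            // Pass 2: release only after the queue is consistent again.
            for (Chunk c : toRelease) c.release();
        }
    }

    /** Runs the re-entrant scenario and returns the number of chunks left in the queue. */
    static int remainingAfterNestedRemove() {
        Chunk evicted = new Chunk(null);        // normal chunk being evicted
        Chunk unrelated = new Chunk(null);      // some other normal chunk
        MicroQueue tinyQueue = new MicroQueue();
        Chunk tinyB = new Chunk(evicted);
        tinyQueue.add(new Chunk(evicted));
        tinyQueue.add(tinyB);
        tinyQueue.add(new Chunk(unrelated));
        Predicate<Chunk> childOfEvicted = c -> c.parent == evicted;
        // Releasing tinyB calls back into removeIf, as in the reported stack trace.
        tinyB.onRelease = () -> tinyQueue.removeIf(childOfEvicted);
        tinyQueue.removeIf(childOfEvicted);
        return tinyQueue.count;
    }

    public static void main(String[] args) {
        System.out.println("chunks remaining: " + remainingAfterNestedRemove());
    }
}
```

The nested call here finds only the unrelated tiny chunk at the front of the queue and returns without touching a null slot.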

People

Assignee: Zhao Yang

Reporter: Zhao Yang

Authors: Zhao Yang

Reviewers: Aleksey Yeschenko

Dates

Created: 14/Apr/20 12:04

Updated: 21/Dec/20 08:07

Resolved: 28/Apr/20 17:56