CASSANDRA-15726

buffer pool may throw NPE with concurrent release due to in-progress tiny pool eviction


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: 4.0, 4.0-beta1
    • Component/s: Legacy/Core
    • Labels:
      None
    • Bug Category:
      Correctness
    • Severity:
      Normal
    • Complexity:
      Normal
    • Discovered By:
      Unit Test
    • Platform:
      All
    • Impacts:
      None
    • Since Version:
    • Test and Documentation Plan:
      With the patch, the failure can no longer be reproduced with LongBufferPoolTest.

      Description

      This can be reproduced by running LongBufferPoolTest; it fails in roughly 1 out of 5 runs.

      java.lang.NullPointerException
      	at org.apache.cassandra.utils.memory.BufferPool$Chunk.access$1300(BufferPool.java:836)
      	at org.apache.cassandra.utils.memory.BufferPool$LocalPool.lambda$remove$1(BufferPool.java:716)
      	at org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks.removeIf(BufferPool.java:460)
      	at org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks.access$1500(BufferPool.java:304)
      	at org.apache.cassandra.utils.memory.BufferPool$LocalPool.remove(BufferPool.java:716)
      	at org.apache.cassandra.utils.memory.BufferPool$LocalPool.put(BufferPool.java:590)
      	at org.apache.cassandra.utils.memory.BufferPool$LocalPool.recycle(BufferPool.java:709)
      	at org.apache.cassandra.utils.memory.BufferPool$Chunk.recycle(BufferPool.java:909)
      	at org.apache.cassandra.utils.memory.BufferPool$Chunk.tryRecycle(BufferPool.java:903)
      	at org.apache.cassandra.utils.memory.BufferPool$Chunk.release(BufferPool.java:896)
      	at org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks.removeIf(BufferPool.java:465)
      	at org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks.access$1500(BufferPool.java:304)
      	at org.apache.cassandra.utils.memory.BufferPool$LocalPool.addChunk(BufferPool.java:736)
      	at org.apache.cassandra.utils.memory.BufferPool$LocalPool.addChunkFromParent(BufferPool.java:725)
      	at org.apache.cassandra.utils.memory.BufferPool$LocalPool.tryGetInternal(BufferPool.java:691)
      	at org.apache.cassandra.utils.memory.BufferPool$LocalPool.tryGet(BufferPool.java:679)
      	at org.apache.cassandra.utils.memory.BufferPool$LocalPool.access$000(BufferPool.java:518)
      	at org.apache.cassandra.utils.memory.BufferPool.tryGet(BufferPool.java:120)                       
      	at org.apache.cassandra.utils.memory.LongBufferPoolTest$2.testOne(LongBufferPoolTest.java:497)
      	at org.apache.cassandra.utils.memory.LongBufferPoolTest$TestUntil.call(LongBufferPoolTest.java:558)
      	at org.apache.cassandra.utils.memory.LongBufferPoolTest$TestUntil.call(LongBufferPoolTest.java:538)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
      	at java.lang.Thread.run(Thread.java:748)
      

      The cause is as follows:

      • When evicting a normal chunk from a full MicroQueueOfChunks, the local pool tries to remove the corresponding tiny chunks via MicroQueueOfChunks#removeIf.
      • If a matching tiny chunk is found, its release() is called immediately, before the resulting null slot has been moved to the back of the queue.
      • Because buffers can be released concurrently from other threads, that release() call may cause the tiny chunk's parent normal chunk (the chunk being evicted in the first step) to be removed from the local pool, which makes the tiny pool try to remove the corresponding tiny chunks again in LocalPool#remove().
      • This nested MicroQueueOfChunks#removeIf call runs while the outer removeIf is still in progress, so it throws an NPE: the queue violates MicroQueueOfChunks's assumption that null chunks are always at the back of the queue.
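
      The re-entrancy described above can be illustrated with a small, self-contained sketch. It is not the Cassandra code: ReentrantRemoveIfDemo, its Chunk class, the owner field and the onRelease callback are hypothetical stand-ins, and the callback models on a single thread what in the real bug is triggered by a concurrent release from another thread. The sketch only shows why releasing a chunk while a null slot sits in the middle of the queue leads to an NPE on re-entry.

```java
import java.util.function.Predicate;

// Hypothetical, simplified stand-in for MicroQueueOfChunks; not the real Cassandra code.
class ReentrantRemoveIfDemo {
    static final class Chunk {
        final String owner;
        Chunk(String owner) { this.owner = owner; }
    }

    final Chunk[] slots = new Chunk[3];
    int count = 0;                    // assumed invariant: slots[0..count) are non-null
    Runnable onRelease = () -> {};    // stands in for the side effects of Chunk.release()

    void add(Chunk c) { slots[count++] = c; }

    // Releases each match immediately, while a null hole is still inside the queue.
    void removeIfReleasingEarly(Predicate<Chunk> match) {
        int n = count;
        for (int i = 0; i < n; i++) {
            if (match.test(slots[i])) {   // dereferences null when re-entered over a hole
                slots[i] = null;
                onRelease.run();          // may call back into this method
            }
        }
        compact();                        // too late: the hole was visible during release
    }

    // Moves all non-null entries to the front and updates count.
    void compact() {
        int w = 0;
        for (int r = 0; r < slots.length; r++) {
            Chunk c = slots[r];
            if (c != null) { slots[r] = null; slots[w++] = c; }
        }
        count = w;
    }

    // Returns true if the nested removeIf call hits the null hole and throws.
    static boolean throwsOnReentry() {
        ReentrantRemoveIfDemo q = new ReentrantRemoveIfDemo();
        q.add(new Chunk("evicted"));
        q.add(new Chunk("other"));
        Predicate<Chunk> ofEvicted = c -> c.owner.equals("evicted");
        q.onRelease = () -> q.removeIfReleasingEarly(ofEvicted);
        try {
            q.removeIfReleasingEarly(ofEvicted);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE on re-entry: " + throwsOnReentry());
    }
}
```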

       

      The fix is to move null chunks to the back of the queue before releasing any chunks.
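
      The ordering used by the fix can be sketched as a two-phase removeIf: first collect the matching chunks and restore the queue invariant, then release them. This is a minimal illustration under assumptions, not the actual patch: CompactBeforeReleaseSketch, its Chunk class, the releases counter and the onRelease callback are hypothetical names, and the callback simulates a release that re-enters removeIf.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, simplified stand-in for MicroQueueOfChunks; not the actual patch.
class CompactBeforeReleaseSketch {
    static final class Chunk {
        final String owner;
        Chunk(String owner) { this.owner = owner; }
    }

    final Chunk[] slots = new Chunk[3];
    int count = 0;                    // assumed invariant: slots[0..count) are non-null
    int releases = 0;                 // number of chunks released so far
    Runnable onRelease = () -> {};    // stands in for the side effects of Chunk.release()

    void add(Chunk c) { slots[count++] = c; }

    void removeIf(Predicate<Chunk> match) {
        // Phase 1: collect matches and close the gaps, so nulls end up at the back.
        List<Chunk> toRelease = new ArrayList<>();
        int kept = 0;
        for (int i = 0; i < count; i++) {
            Chunk c = slots[i];
            slots[i] = null;
            if (match.test(c)) toRelease.add(c);
            else slots[kept++] = c;
        }
        count = kept;

        // Phase 2: release only after the invariant holds again, so re-entry is safe.
        for (Chunk c : toRelease) {
            releases++;
            onRelease.run();
        }
    }

    // Builds a full queue, removes the "evicted" chunks with a re-entrant release.
    static CompactBeforeReleaseSketch run() {
        CompactBeforeReleaseSketch q = new CompactBeforeReleaseSketch();
        q.add(new Chunk("evicted"));
        q.add(new Chunk("other"));
        q.add(new Chunk("evicted"));
        Predicate<Chunk> ofEvicted = c -> c.owner.equals("evicted");
        q.onRelease = () -> q.removeIf(ofEvicted);   // nested call sees a compact queue
        q.removeIf(ofEvicted);
        return q;
    }

    public static void main(String[] args) {
        CompactBeforeReleaseSketch q = run();
        System.out.println("remaining=" + q.count + ", released=" + q.releases);
    }
}
```

      In this sketch the nested call only ever iterates over non-null entries, because the queue is compacted and count is updated before any release happens.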

            People

            • Assignee:
              jasonstack Zhao Yang
              Reporter:
              jasonstack Zhao Yang
              Authors:
              Zhao Yang
              Reviewers:
              Aleksey Yeschenko