KAFKA-15481: Concurrency bug in RemoteIndexCache leads to IOException



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.6.0
    • Fix Version/s: 3.7.0, 3.6.1


      RemoteIndexCache has a concurrency bug that leads to an IOException while fetching data from the remote tier.

      The events below are in timeline order:

      Thread 1 (cache thread): invalidates the entry; the removalListener is invoked asynchronously, so the files have not yet been renamed with the "deleted" suffix.

      Thread 2 (fetch thread): looks up the entry in the cache, misses because Thread 1 removed it, fetches the entry from S3, and writes it to the existing file (using replace-existing).

      Thread 1: the async removalListener runs, acquires a lock on the old entry (which has already been removed from the cache), renames the file with the "deleted" suffix, and starts deleting it.

      Thread 2: tries to create the in-memory/mmapped index, but the file is gone, so the AbstractIndex constructor creates a new file of size 2 GB. The JVM returns an error because it will not allow creation of a 2 GB random-access file.
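
      The shape of this race can be reproduced in miniature with a JDK-only sketch. This is not the actual RemoteIndexCache code: a plain ConcurrentHashMap stands in for the Caffeine cache, a single-thread executor stands in for Caffeine's asynchronous removal listener, and the key and file names are hypothetical. A latch forces the bad interleaving so the outcome is deterministic:

      ```java
      import java.nio.file.*;
      import java.util.concurrent.*;

      public class RemovalRaceDemo {
          public static void main(String[] args) throws Exception {
              Path dir = Files.createTempDirectory("remote-index");
              Path indexFile = dir.resolve("00000001.index");
              Files.writeString(indexFile, "old-entry");

              ConcurrentHashMap<String, Path> cache = new ConcurrentHashMap<>();
              cache.put("seg-1", indexFile);
              // Stands in for Caffeine's async RemovalListener executor.
              ExecutorService cleaner = Executors.newSingleThreadExecutor();
              CountDownLatch fetchDone = new CountDownLatch(1);

              // Thread 1: invalidate; file cleanup is deferred to the executor.
              cache.remove("seg-1");
              cleaner.submit(() -> {
                  try {
                      fetchDone.await();               // force the bad interleaving
                      Files.deleteIfExists(indexFile); // deletes the file Thread 2 just wrote
                  } catch (Exception e) {
                      throw new RuntimeException(e);
                  }
              });

              // Thread 2: cache miss, re-fetch from remote storage, write to the
              // same path (Files.writeString replaces existing content).
              Files.writeString(indexFile, "fresh-entry");
              cache.put("seg-1", indexFile);
              fetchDone.countDown();

              cleaner.shutdown();
              cleaner.awaitTermination(5, TimeUnit.SECONDS);
              // The late removal listener has deleted the freshly fetched file:
              System.out.println("file exists after cleanup: " + Files.exists(indexFile));
          }
      }
      ```

      The fetch thread's freshly written file is gone by the time it would be mmapped, which is exactly the window the ticket describes.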

      Potential Fix
      Use Caffeine.evictionListener instead of Caffeine.removalListener, as per the Caffeine documentation:

       When the operation must be performed synchronously with eviction, use Caffeine.evictionListener(RemovalListener) instead. This listener will only be notified when RemovalCause.wasEvicted() is true. For an explicit removal, Cache.asMap() offers compute methods that are performed atomically.

      This ensures that removing the entry from the cache and renaming the file with the "deleted" suffix happen synchronously, so the race condition above cannot occur.
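
      For the explicit-removal path, the documentation's suggestion of an atomic compute can be sketched with only the JDK (ConcurrentHashMap.computeIfPresent stands in for the atomic compute methods of Cache.asMap(); the key and file names are hypothetical, not the real RemoteIndexCache identifiers):

      ```java
      import java.nio.file.*;
      import java.util.concurrent.ConcurrentHashMap;

      public class AtomicRemovalDemo {
          public static void main(String[] args) throws Exception {
              Path dir = Files.createTempDirectory("remote-index");
              Path indexFile = dir.resolve("00000001.index");
              Files.writeString(indexFile, "old-entry");

              ConcurrentHashMap<String, Path> cache = new ConcurrentHashMap<>();
              cache.put("seg-1", indexFile);

              // Rename with the "deleted" suffix atomically with the map removal:
              // no other thread can observe the cache miss before the rename,
              // because computeIfPresent holds the bucket lock for the whole step.
              cache.computeIfPresent("seg-1", (key, path) -> {
                  try {
                      Files.move(path, path.resolveSibling(path.getFileName() + ".deleted"));
                  } catch (Exception e) {
                      throw new RuntimeException(e);
                  }
                  return null; // returning null removes the mapping
              });

              // A fetch thread that misses now sees the old file already renamed,
              // so its fresh fetch writes a new file the cleaner will not touch.
              System.out.println("renamed: " + Files.exists(dir.resolve("00000001.index.deleted")));
              System.out.println("cache miss: " + (cache.get("seg-1") == null));
          }
      }
      ```

      Because the rename happens inside the compute, a concurrent fetch thread can never find the cache empty while the old, about-to-be-deleted file still occupies the target path.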


              Jeel Jotaniya
              Divij Vaidya