Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-30354

Reducing the number of ThreadPools in LookupFullCache and related cache-loading classes

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.17.0
    • None
    • Table SQL / Runtime
    • None

    Description

      In the course of reviewing FLINK-29405, I came up with a proposal to reduce the complexity of the LookupFullCache implementation and shrinking the amount of threadpools being used from 3 to 2. Here's the proposal I also shared in the FLINK-29405 PR comment:

      About the responsibilities how I see them:

      • LookupFullCache is the composite class for combining the CacheLoader and the CacheReloadTrigger through the ReloadTriggerContext
      • ReloadTriggerContext provides an async call to trigger the reload but also some utility methods for providing processing or event time (where it's not clear to me why this is connected with the reload. It looks like a future task based on the TODO comments)
      • CacheLoader is in charge of loading the data into memory (if possible concurrently).
        CacheReloadTrigger provides different strategies to trigger new reloads.

      About the different executors:

      • The CacheReloadTrigger utilize a SingleThreadScheduledExecutor which triggers ReloadTriggerContext::reload subsequently. If the loading takes longer, subsequently triggered calls pile up. Here, I'm wondering whether that's what we want. thinking
      • CacheLoader utilizes a SingleThreadExecutor in CacheLoader#reloadExecutor which is kind of the "main" thread for reloading the data. It triggers CacheLoader#updateCache with CacheLoader#reloadLock being acquired. (InputFormat)CacheLoader#updateCache is implemented synchronously. The data is loaded concurrently if possible using a FixedThreadPool.

      My proposal is now to reduce the number of used thread pools: Instead of having a SingleThreadExecutor and a FixedThreadPool in the CacheLoader implementation, couldn't we come up with a custom ThreadPoolExecutor where we specify the minimum number of threads being 1 and the maximum being the number of cores (similar to what is already there with ThreadUtils#newThreadPool). That would free the CacheLoader from starting and shutting down thread pools by moving its ownership from CacheLoader to LookupFullCache calling it the cacheLoadingThreadPool (or similar). Additionally, the ScheduledThreadPool currently living in the CacheReloadTrigger implementations could move into LookupFullCache as well calling it something like cacheLoadSchedulingThreadPool. LookupFullCache would be in charge of managing all cache loading-related threads. Additionally, it would manage the current execution through CompletableFutures (one for triggering the reload and one for executing the reload. Triggering a reload would require cancelling the current future (if it's not completed, yet) or ignoring the trigger if we want a reload to finish before triggering a new one.

      CacheLoader#updateCache would become CacheLoader#updateCacheAsync(ExecutorService) returning a CompletableFuture that completes as soon as all subtasks are completed. CacheLoader#reloadAsync would return this CompletableFuture instead of creating its own future. The lifecycle (as already explained in the previous paragraph) would be managed by LookupFullCache. The benefit would be that we wouldn't have to deal interrupts in CacheLoader.

      I see the following benefits:

      • ReloadtriggerContext becomes obsolete (one has to clarify what the event time and processing time functions are for, though).
      • CacheLoader#awaitFirstLoad becomes obsolete as well. We can verify the completion of the cache loading in LookupFullCache through the CompletableFuture instances.
      • CacheReloadTrigger can focus on the strategy implementation without worrying about instantiating threads. This is duplicated code right now in PeriodicCacheReloadTrigger and TimedCacheReloadTrigger.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              mapohl Matthias Pohl
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated: