Spark / SPARK-43300

Cascade failure in Guava cache due to fate-sharing

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.5.0
    • Component/s: Spark Core
    • Labels: None

    Description

      Guava cache is widely used in Spark. However, it exhibits fate-sharing behavior: if multiple requests try to access the same key at the same time while that key is not in the cache, Guava cache blocks all of them and creates the object only once. If the creation fails, all of the blocked requests fail immediately without retrying. As a result, a task can fail because of an unrelated failure in another query.

      This fate-sharing behavior might lead to unexpected results in some situations.

      We can wrap Guava cache with a KeyLock that synchronizes all requests for the same key, so that they run one at a time and each one fails on its own, as if the requests had arrived one after another.
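
      The per-key locking idea can be sketched roughly as follows. This is a minimal illustration and not the actual Spark change: the class name KeyLockedCache and its get method are hypothetical, a ConcurrentHashMap stands in for the Guava cache so that the sketch needs only the standard library, and the lock objects are never evicted.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch, not Spark's implementation: per-key locking around a cache.
// A ConcurrentHashMap stands in for the Guava cache to keep this self-contained.
final class KeyLockedCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    // One lock object per key; never evicted in this sketch.
    private final ConcurrentHashMap<K, Object> locks = new ConcurrentHashMap<>();

    // Returns the cached value, or runs the loader while holding the key's lock.
    // The loader must return a non-null value.
    V get(K key, Supplier<V> loader) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached; // fast path: no locking on a hit
        }
        Object lock = locks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {
            // Re-check: an earlier holder of the lock may have populated the entry.
            cached = cache.get(key);
            if (cached != null) {
                return cached;
            }
            // A failure here propagates only to this caller; requests waiting on
            // the lock then run their own loader instead of sharing the failure.
            V value = loader.get();
            cache.put(key, value);
            return value;
        }
    }
}
```

      With this shape, a request whose loader throws sees its own exception, while the next request for the same key acquires the lock, finds the entry still missing, and runs its own loader.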

          People

            Assignee: Ziqi Liu (liuzq12)
            Reporter: Ziqi Liu (liuzq12)