Spark / SPARK-26851

CachedRDDBuilder only partially implements double-checked locking

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.0, 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels:
      None

      Description

      In CachedRDDBuilder, cachedColumnBuffers uses double-checked locking to lazily initialize _cachedColumnBuffers. clearCache also uses double-checked locking, presumably to avoid synchronization when _cachedColumnBuffers is still null.

      However, the resource (in this case, _cachedColumnBuffers) is not declared volatile, which could cause visibility problems. In particular, the unsynchronized first check in clearCache may read a stale null reference even though another thread has already assigned an RDD.

      From Java Concurrency in Practice by Brian Goetz et al.:

      Subsequent changes in the JMM (Java 5.0 and later) have enabled DCL to work if resource is made volatile, and the performance impact of this is small since volatile reads are usually only slightly more expensive than nonvolatile reads.
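The pattern described in the quote can be sketched in Java as follows. This is a minimal illustration, not Spark's actual code: the class LazyBufferHolder, its Supplier-based builder, and the method names get, clear, and isInitialized are hypothetical stand-ins for CachedRDDBuilder, cachedColumnBuffers, and clearCache.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for CachedRDDBuilder; names are assumptions for illustration.
final class LazyBufferHolder<T> {
    // volatile makes a write performed under the lock visible to the
    // unsynchronized first check running in other threads.
    private volatile T buffers;
    private final Supplier<T> builder;

    LazyBufferHolder(Supplier<T> builder) {
        this.builder = builder;
    }

    // Lazy initialization with double-checked locking.
    T get() {
        T local = buffers;              // first check, no lock
        if (local == null) {
            synchronized (this) {
                local = buffers;        // second check, under the lock
                if (local == null) {
                    local = builder.get();
                    buffers = local;
                }
            }
        }
        return local;
    }

    // Clearing with the same pattern: skip the lock when nothing is cached.
    void clear() {
        if (buffers != null) {          // first check, no lock
            synchronized (this) {
                if (buffers != null) {  // second check, under the lock
                    buffers = null;
                }
            }
        }
    }

    boolean isInitialized() {
        return buffers != null;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger builds = new AtomicInteger();
        LazyBufferHolder<String> holder =
            new LazyBufferHolder<>(() -> "buffers-" + builds.incrementAndGet());

        // Several threads race to initialize; the builder should run only once.
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(holder::get);
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(holder.get() + ", builder calls: " + builds.get());

        holder.clear();
        System.out.println("initialized after clear: " + holder.isInitialized());
    }
}
```

Without the volatile modifier, the first check in either method is a data race, so a thread is not guaranteed to see a reference that another thread assigned under the lock.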

      There are comments in other documentation that volatile is not needed if the resource is immutable. While an RDD is immutable from a Spark user's point of view, it may not be from the JVM's point of view, since not all of its internal fields are final.
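The distinction between "immutable to the user" and "immutable to the JVM" can be illustrated with two small Java classes. Both class names are hypothetical and have nothing to do with Spark's RDD implementation; the sketch only shows which declarations the Java Memory Model's final-field guarantee covers.

```java
// All fields are final: the Java Memory Model guarantees that any thread
// that sees a reference to a fully constructed instance also sees the
// values assigned in the constructor, even without volatile or locks.
final class FullyFinalRange {
    private final int start;
    private final int end;

    FullyFinalRange(int start, int end) {
        this.start = start;
        this.end = end;
    }

    int length() {
        return end - start;
    }
}

// Never modified after construction, so it looks immutable to callers,
// but the fields are not final. If a reference to an instance is published
// through a data race, another thread is not guaranteed to see the
// constructor's writes and could in principle observe default values.
final class EffectivelyImmutableRange {
    private int start;
    private int end;

    EffectivelyImmutableRange(int start, int end) {
        this.start = start;
        this.end = end;
    }

    int length() {
        return end - start;
    }
}
```

In single-threaded use the two classes behave identically. The difference only matters when a reference is shared without synchronization, which is the situation a non-volatile field under double-checked locking creates.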

      I've marked this as minor since the race conditions are highly unlikely.

            People

            • Assignee: Bruce Robbins (bersprockets)
            • Reporter: Bruce Robbins (bersprockets)