SPARK-1866

Closure cleaner does not null shadowed fields when outer scope is referenced


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: Spark Core

    Description

      Take the following example:

      val x = 5
      val instances = new org.apache.hadoop.fs.Path("/") /* non-serializable */
      sc.parallelize(0 until 10).map { _ =>
        val instances = 3
        (instances, x)
      }.collect
      

      This produces a "java.io.NotSerializableException: org.apache.hadoop.fs.Path", even though the outer instances is never actually used within the closure. If the outer variable instances is renamed to something else, the code executes correctly (see the sketch below), indicating that the name collision between the two variables is what causes the issue.
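
      For example, a minimal variation that only renames the outer variable (to "path" here, an arbitrary choice) runs without error:

      val x = 5
      val path = new org.apache.hadoop.fs.Path("/") /* non-serializable, but no longer shadowed */
      sc.parallelize(0 until 10).map { _ =>
        val instances = 3
        (instances, x)
      }.collect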

      Additionally, if the outer scope is not used (i.e., we do not reference "x" in the above example), the issue does not appear.
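
      Likewise, a variant of the snippet that references nothing from the enclosing scope serializes fine, presumably because the closure no longer captures the outer object at all:

      val instances = new org.apache.hadoop.fs.Path("/") /* non-serializable */
      sc.parallelize(0 until 10).map { _ =>
        val instances = 3
        instances * 2 /* no reference to the outer scope */
      }.collect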

      Attachments

        Activity


          People

            Assignee: Unassigned
            Reporter: Aaron Davidson (ilikerps)
            Votes: 1
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved:
