Spark / SPARK-30856

SQLContext retains reference to unusable instance after SparkContext restarted

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.5
    • Fix Version/s: 3.1.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      When the underlying SQLContext is instantiated for a SparkSession, the instance is saved as a class attribute and returned from subsequent calls to SQLContext.getOrCreate(). If the SparkContext is stopped and a new one is started, the class attribute is never cleared, so any code that calls SQLContext.getOrCreate() gets a SQLContext that still references the old, unusable SparkContext.
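
The caching behavior described above can be illustrated without a Spark installation. The following is a minimal sketch, not PySpark's actual code: FakeSparkContext, FakeSQLContext, the _cached attribute, and get_or_create are all hypothetical stand-ins that only mimic the pattern of storing the first instance on the class.

```python
class FakeSparkContext:
    """Hypothetical stand-in for SparkContext; only tracks whether it was stopped."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


class FakeSQLContext:
    """Hypothetical stand-in for SQLContext with a class-level cache."""

    _cached = None  # class attribute shared by every caller

    def __init__(self, sc):
        self.sc = sc

    @classmethod
    def get_or_create(cls, sc):
        # The first call stores the instance on the class; later calls
        # return it without checking whether its context is still alive.
        if cls._cached is None:
            cls._cached = cls(sc)
        return cls._cached


first_sc = FakeSparkContext()
first_ctx = FakeSQLContext.get_or_create(first_sc)

# "Restart": stop the old context and start a new one.
first_sc.stop()
second_sc = FakeSparkContext()

# The cache was never cleared, so the stale instance comes back.
stale_ctx = FakeSQLContext.get_or_create(second_sc)
print(stale_ctx is first_ctx)  # True
print(stale_ctx.sc.stopped)    # True
```

The second call passes a fresh context, yet the returned object still points at the stopped one, which is the failure mode this issue reports.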

      A similar issue was identified and fixed for SparkSession in SPARK-19055, but the fix did not change SQLContext as well. I ran into this because mllib still uses SQLContext.getOrCreate() under the hood.

      I've already written a fix, which I'll share in a PR, that clears the class attribute on SQLContext when the SparkSession is stopped. Another option would be to deprecate SQLContext.getOrCreate() entirely, since the corresponding Scala method is itself deprecated. However, that seems like a larger change for a relatively minor issue.
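
The shape of the proposed fix, clearing the cached instance when the session stops, might look roughly like the sketch below. This is an illustration under assumptions, not the contents of the actual PR: FakeSparkContext, FakeSQLContext, FakeSparkSession, and the _cached attribute are hypothetical names used only to show the idea.

```python
class FakeSparkContext:
    """Hypothetical stand-in for SparkContext."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


class FakeSQLContext:
    """Hypothetical stand-in for SQLContext with a class-level cache."""

    _cached = None

    def __init__(self, sc):
        self.sc = sc

    @classmethod
    def get_or_create(cls, sc):
        if cls._cached is None:
            cls._cached = cls(sc)
        return cls._cached


class FakeSparkSession:
    """Hypothetical stand-in for SparkSession that owns the cached context."""

    def __init__(self, sc):
        self.sc = sc
        self.sql_context = FakeSQLContext.get_or_create(sc)

    def stop(self):
        self.sc.stop()
        # The fix: drop the cached instance so the next get_or_create()
        # builds a new one against the new context.
        FakeSQLContext._cached = None


old_session = FakeSparkSession(FakeSparkContext())
old_ctx = old_session.sql_context
old_session.stop()

new_sc = FakeSparkContext()
new_session = FakeSparkSession(new_sc)
fresh_ctx = FakeSQLContext.get_or_create(new_sc)
print(fresh_ctx is old_ctx)  # False
print(fresh_ctx.sc.stopped)  # False
```

Resetting the cache in stop() keeps get_or_create() available to existing callers such as mllib, which is why it is a smaller change than deprecating the method.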

            People

              Assignee: Alex Favaro (afavaro)
              Reporter: Alex Favaro (afavaro)