  1. Spark
  2. SPARK-32534

Cannot load a Pipeline Model on a stopped Spark Context

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 2.4.6
    • Fix Version/s: None
    • Component/s: Deploy, Kubernetes
    • Labels: None
    • Flags: Important

    Description

      I am running Spark in a Kubernetes cluster that runs Spark NLP, using the PySpark ML PipelineModel class to load the model and then transform a Spark DataFrame. We run this inside a Docker container that starts a Spark context, mounts volumes, spins up executors, etc., then does its transformations, UDFs, etc., and finally stops the Spark context.

      The first time I load the model after my service has just started, everything is fine. If I run my application a second time without restarting the service, even though the context from the previous run is entirely stopped and a new one has been started, the PipelineModel seems to have some attribute in one of its base classes that thinks the context it is running on is closed. As a result I get a "cannot call a function on a stopped spark context" error when I try to load the model in my service again.

      I have to shut down my service each time I want consecutive runs through my Spark pipeline, which is not ideal. Is this a common issue among PySpark users who use PipelineModel? Is there a common workaround for resetting all Spark contexts, or does the PipelineModel cache a Spark context of some sort? Any help would be very useful.

      cls.pipeline = PipelineModel.read().load(NLP_MODEL)
       
      is how I load the model. Our Spark context is very similar to a typical Kubernetes/Spark setup; nothing special there.
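One thing that may be worth ruling out is state that outlives a single run, such as the class-level `cls.pipeline` attribute still pointing at a model that was loaded under the previous, now stopped, context. The sketch below is only an illustration of that idea, not a confirmed fix for this ticket: `PipelineCache` and `run_once` are hypothetical names, and the loader is passed in as a callable so the caching logic does not depend on Spark itself. It reloads the model whenever the session object changes and drops every reference before the session is stopped.

```python
from typing import Any, Callable, Optional


class PipelineCache:
    """Hypothetical helper: keep a loaded model only for the session it was loaded under."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        # loader is assumed to be something like: lambda path: PipelineModel.load(path)
        self._loader = loader
        self._session: Optional[Any] = None
        self._model: Optional[Any] = None

    def get(self, session: Any, path: str) -> Any:
        # Reload when nothing is cached or the cached model belongs to another session.
        if self._model is None or self._session is not session:
            self._model = self._loader(path)
            self._session = session
        return self._model

    def clear(self) -> None:
        # Drop references so nothing tied to a stopped context is reused later.
        self._model = None
        self._session = None


def run_once(cache: PipelineCache, model_path: str) -> None:
    """One service run: start a session, load the model, then tear everything down."""
    # Imported lazily so this module can be imported without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    try:
        model = cache.get(spark, model_path)
        # model.transform(df) and the rest of the job would go here.
    finally:
        cache.clear()
        spark.stop()  # stop the session itself, which also stops its SparkContext
```

In a service this might be wired up as `cache = PipelineCache(lambda p: PipelineModel.load(p))` followed by one `run_once(cache, NLP_MODEL)` per request. Whether this removes the error depends on where the stale context reference actually lives, which the ticket does not establish.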

          People

            Assignee: Unassigned
            Reporter: Kevin Van Lieshout (kvanlieshout)
            Votes: 0
            Watchers: 1

              Time Tracking

                Estimated: 24h
                Remaining: 24h
                Logged: Not Specified