Spark / SPARK-50118

Spark removes working directory while Python UDF runs


    Description

      With Spark Connect + PySpark, we can stage files using `spark.addArtifacts`. When a Python UDF is executed, the working directory is set to a folder with the corresponding artifacts available.

      I have observed on large-scale jobs with long-running tasks (>45 min) that Spark sometimes removes that working directory even though UDF tasks are still running. This can be seen by periodically calling `os.getcwd()` inside the UDF, which then raises `FileNotFoundError`.
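A minimal, Spark-free sketch of the symptom described above. The temporary directory stands in for the artifact staging directory, and the `cwd_alive` helper is illustrative, not a Spark API; it only demonstrates that `os.getcwd()` raises `FileNotFoundError` once the directory backing the process's working directory has been removed:

```python
import os
import tempfile

def cwd_alive() -> bool:
    """Return True if the current working directory still exists on disk.

    os.getcwd() raises FileNotFoundError once the directory backing the
    process's cwd has been deleted, which is how the UDF observes the
    staging directory disappearing mid-task.
    """
    try:
        os.getcwd()
        return True
    except FileNotFoundError:
        return False

# Simulate Spark removing the staging directory out from under a task:
staging = tempfile.mkdtemp()
os.chdir(staging)
assert cwd_alive()

os.rmdir(staging)  # stands in for the session-eviction cleanup
assert not cwd_alive()
```

A real UDF could run the same `os.getcwd()` probe on a timer to log exactly when the directory vanishes relative to the `Session evicted` log records.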

      This seems to coincide with log records indicating `Session evicted: <uuid>` from `isolatedSessionCache`. There is a 30-minute timeout here that might be to blame.

      I have not yet been able to write a simple program that reproduces the issue. I suspect it requires a conjunction of multiple events, such as a task being scheduled on an executor more than 30 minutes after the last task started. https://issues.apache.org/jira/browse/SPARK-44290 might be relevant.

      cc gurwls223 


People

  gurwls223 Hyukjin Kwon
  peay2 Peter Andrew
  Votes: 0
  Watchers: 2
