[SPARK-23015] spark-submit fails when submitting several jobs in parallel - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1
Fix Version/s: 4.0.0
Component/s: Spark Submit
Labels:
- bulk-closed
- pull-request-available
Environment:

Windows 10 (1709/16299.125)
Spark 2.3.0
Java 8, Update 151

Description

Spark Submit's launch script runs the launcher (org.apache.spark.launcher.Main), redirects the command that the launcher prints into a temporary text file, reads that file back into a variable, and then executes the command.

set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt
"%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main %* > %LAUNCHER_OUTPUT%

bin/spark-class2.cmd, L67

The temporary file's name gets a pseudo-random suffix from the %RANDOM% dynamic environment variable, which expands to an integer between 0 and 32767, so only 32768 distinct file names are possible.
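To get a feel for how small that name space is, here is a rough birthday-problem estimate in Python. It is only a sketch: the function name `collision_probability` is made up for illustration, and it assumes each launch draws its suffix independently and uniformly. That assumption is probably optimistic if processes started at nearly the same moment share a seed.

```python
def collision_probability(jobs: int, names: int = 32768) -> float:
    """Chance that at least two of `jobs` launches pick the same suffix.

    Assumes independent, uniform draws from `names` possible values
    (an idealization of %RANDOM%, not a model of its real behavior).
    """
    all_unique = 1.0
    for i in range(jobs):
        # The (i+1)-th launch must avoid the i suffixes already taken.
        all_unique *= (names - i) / names
    return 1.0 - all_unique


if __name__ == "__main__":
    for jobs in (2, 10, 50, 200):
        print(f"{jobs:>4} parallel jobs: {collision_probability(jobs):.4%}")
```

Even under this idealized model the risk grows roughly with the square of the number of parallel jobs, so a name space of 32768 values gives little headroom.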

This appears to be the cause of an error that occurs when several spark-submit jobs are launched simultaneously. The following error is written to stderr:

The process cannot access the file because it is being used by another process.
The system cannot find the file USER/AppData/Local/Temp/spark-class-launcher-output-RANDOM.txt.
The process cannot access the file because it is being used by another process.

My hypothesis is that %RANDOM% is returning the same value for multiple jobs, causing the launcher library to attempt to write to the same file from multiple processes. Another mechanism is needed for reliably generating the names of the temporary files so that the concurrency issue is resolved.
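As an illustration of what "another mechanism" could look like, the Python sketch below asks the operating system to create the file exclusively instead of guessing a name and hoping it is free. This is not the change made to spark-class2.cmd; the helper name `create_launcher_output` is hypothetical, and only the file prefix and suffix are taken from the script above.

```python
import os
import tempfile


def create_launcher_output(directory=None):
    """Create a uniquely named, empty launcher-output file and return its path.

    tempfile.mkstemp creates the file with exclusive-create semantics, so
    two processes calling this at the same time cannot receive the same path.
    """
    fd, path = tempfile.mkstemp(
        prefix="spark-class-launcher-output-",
        suffix=".txt",
        dir=directory,  # None means the platform's default temp directory
    )
    os.close(fd)  # the caller reopens the path when it needs to write
    return path


if __name__ == "__main__":
    path = create_launcher_output()
    print(path)
    os.remove(path)
```

The important property is that name selection and file creation happen in one atomic step, which removes the window in which two jobs can settle on the same name.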

Issue Links

links to

GitHub Pull Request #43706

People

Assignee: Unassigned

Reporter: Hugh Zabriskie

Votes: 1

Watchers: 6

Dates

Created: 09/Jan/18 23:06

Updated: 29/May/24 02:09

Resolved: 29/May/24 02:09