Description
Working with a file in PySpark, these steps work fine:
mydata = sc.textFile("somefile")
myfiltereddata = mydata.filter(lambda line: True)  # any filter reproduces the issue
myfiltereddata.count()
But if I then delete "somefile" from the file system and attempt to run:
myfiltereddata.count()
the call hangs indefinitely. Eventually I hit Ctrl-C, which produces a stack trace. (I will attach the output as a file.)
In Scala, however, the same sequence works as expected: the final count() fails with a clear error message:
14/01/14 08:41:43 ERROR Executor: Exception in task ID 4
java.io.FileNotFoundException: File file:somefile does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)