Description
Working with a file in PySpark, these steps work fine:
mydata = sc.textFile("somefile")
myfiltereddata = mydata.filter(lambda line: True)  # any filter reproduces the issue
myfiltereddata.count()
But if I then delete "somefile" from the file system and attempt to run:
myfiltereddata.count()
the call hangs indefinitely. Eventually I hit Ctrl-C, which produces a stack trace. (I will attach the output as a file.)
In Scala, however, the same sequence works as expected: the final count() fails with a clear error message:
14/01/14 08:41:43 ERROR Executor: Exception in task ID 4
java.io.FileNotFoundException: File file:somefile does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)