Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: 1.6.1
Fix Version/s: None
Component/s: None
Environment: AWS EC2
Description
Downloaded Spark 1.6.1 for Hadoop 2.4.
I used the spark-ec2 script to create a cluster and I'm running into an issue that prevents importing sqlContext. Following prior reports, I looked through the output to find the first error:
java.lang.RuntimeException: java.io.IOException: Filesystem closed
I'm not sure how to diagnose this. After exiting the Spark REPL and re-entering, every subsequent time I get this error:
java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx--x--x
I assume that some env script is specifying this, since /tmp/hive doesn't exist. I thought that this would be taken care of by the spark-ec2 script so you could just go to town.
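As a first check, I assume the scratch dir the error refers to lives in HDFS rather than on the local filesystem, so something like the following should show whether /tmp/hive exists there and what its permissions are (I haven't verified this against the bundled Hadoop version):
bin/hadoop fs -ls /tmp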
I have no experience with HDFS. I have used Spark on Cassandra and on S3, but I've never deployed it myself. I tried this:
[root@ip-172-31-57-109 ephemeral-hdfs]$ bin/hadoop fs -ls
Warning: $HADOOP_HOME is deprecated.
ls: Cannot access .: No such file or directory.
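I'm guessing fs -ls with no path looks for a home directory (something like /user/root) that doesn't exist on a fresh cluster; listing an explicit path should at least sidestep that error:
bin/hadoop fs -ls /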
I did see that under /mnt there is an ephemeral-hdfs folder, which is the path referenced in core-site.xml, but there is no tmp folder under it.
I tried again with the download for Hadoop 1.x.
Same behavior.
It's curious to me that spark-ec2 has an argument for specifying the Hadoop version; is this required? It would seem that you've already specified it when downloading.
I tried creating the path "tmp/hive" under /mnt/ephemeral-hdfs and chmod-ing it to 777. No joy.
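If the permission check is against HDFS rather than the local disk, then creating the directory under /mnt/ephemeral-hdfs presumably never registers in the HDFS namespace at all; I'd guess the equivalent has to go through the HDFS client, roughly like this (again unverified on this Hadoop version):
bin/hadoop fs -mkdir /tmp/hive
bin/hadoop fs -chmod 777 /tmp/hive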
sqlContext is obviously a critical part of the Spark platform. The interesting thing is that I don't need HDFS at all - I'm going to be reading from S3 and writing to MySQL.