Description
The current (Spark 1.0) implementation of PySpark on YARN requires Python to be able to read the Spark assembly JAR. But a Spark assembly JAR compiled with Java 7 can sometimes be unreadable by Python. This is because JARs created by Java 7 containing more than 2^16 files are encoded in Zip64, which Python cannot read.
SPARK-1911 warns users against using Java 7 when creating a Spark distribution.
One way to fix this is to package pyspark in a separate, smaller JAR than the rest of Spark so that it remains readable by Python.
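The mechanism at issue is that Python imports modules shipped inside a JAR by placing the archive on sys.path and relying on the built-in zipimport machinery, which historically lacked Zip64 support. A minimal sketch of that import path, using an illustrative archive and module name (not the real Spark assembly layout):

```python
import os
import sys
import tempfile
import zipfile

# Build a small zip archive containing a Python module, mimicking
# how the pyspark package is shipped inside the Spark assembly JAR.
# "assembly.zip" and "mymod" are illustrative names only.
tmpdir = tempfile.mkdtemp()
jar_path = os.path.join(tmpdir, "assembly.zip")
with zipfile.ZipFile(jar_path, "w") as zf:
    zf.writestr("mymod.py", "ANSWER = 42\n")

# Putting the archive on sys.path makes Python import from it via
# zipimport. With a Zip64-encoded assembly (more than 2^16 entries),
# this step is where the import fails, since zipimport could not
# parse Zip64 archives -- a smaller pyspark-only JAR avoids that.
sys.path.insert(0, jar_path)
import mymod

print(mymod.ANSWER)  # -> 42
```

Note that Python's zipfile module can write Zip64 archives, but that does not help here: the import path goes through zipimport, so the pyspark JAR must stay within the plain zip limits.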
Attachments
Issue Links
- duplicates
  - SPARK-6869 Add pyspark archives path to PYTHONPATH (Resolved)
- relates to
  - SPARK-6869 Add pyspark archives path to PYTHONPATH (Resolved)