Details
Description
Submitting a PySpark batch with pyFiles is not working.
I created a simple program consisting of two files.
a.py contains:
from pyspark import SparkContext
import os
import bubu

sc = SparkContext(os.environ['SPARKMASTER'], 'lll')
rdd = sc.parallelize([1, 2, 3, 4, 5])
res = rdd.map(bubu.func).collect()
print res
I also created bubu.py:
def func(x): return x+1
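For reference, out.zip needs to contain bubu.py at the root of the archive so that `import bubu` can resolve it. A minimal sketch of building such an archive with the standard library (the temp paths here are illustrative stand-ins for /apps/try):

```python
import os
import tempfile
import zipfile

# Hypothetical working directory standing in for /apps/try.
workdir = tempfile.mkdtemp()
module_path = os.path.join(workdir, 'bubu.py')
zip_path = os.path.join(workdir, 'out.zip')

with open(module_path, 'w') as f:
    f.write('def func(x): return x + 1\n')

# The module must sit at the archive root (arcname='bubu.py'),
# not under a subdirectory, for `import bubu` to work.
with zipfile.ZipFile(zip_path, 'w') as zf:
    zf.write(module_path, arcname='bubu.py')

print(zipfile.ZipFile(zip_path).namelist())  # → ['bubu.py']
```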
When I submit it as a batch:
curl -X POST --data '
{"file": "/apps/try/a.py", "pyFiles": ["/apps/try/out.zip"]}' -H "Content-Type: application/json" localhost:8998/batches
the batch fails with:

ImportError: No module named bubu
The root cause is that the zip file is never shipped to the executors. The driver log shows a.py being added:
16/09/29 09:57:06 INFO Utils: Copying /try/a.py to /tmp/spark-eabfc8a2-5f41-4d10-8c81-0d97be08f1ea/userFiles-8dc3cf7e-f122-4f4c-8a6d-4d2672a3e7fe/a.py
16/09/29 09:57:06 INFO SparkContext: Added file file:/try/a.py at http://172.19.0.3:42761/files/a.py with timestamp 1475143026005
but there is no corresponding "Added file" line for the zip, as there is when the same job is submitted through Spark directly.
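For context, distributing pyFiles works by making the shipped archive importable on each executor, i.e. putting it on sys.path. The import behavior at issue can be reproduced with the standard library alone (paths here are hypothetical stand-ins, not the report's actual files):

```python
import os
import sys
import tempfile
import zipfile

# Build a stand-in for out.zip containing bubu.py.
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, 'out.zip')
with zipfile.ZipFile(zip_path, 'w') as zf:
    zf.writestr('bubu.py', 'def func(x): return x + 1\n')

# Without the archive on sys.path, the import fails exactly as reported.
try:
    import bubu
except ImportError as e:
    print(e)  # message like: No module named 'bubu'

# Putting the zip on sys.path -- which is what shipping it via pyFiles
# should accomplish on the executors -- makes the import succeed.
sys.path.insert(0, zip_path)
import bubu
print(bubu.func(1))  # → 2
```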