  1. Livy
  2. LIVY-220

pyspark zipfile uploading is not working

Details

    Description

      Submitting a PySpark batch with pyFiles does not work.

      I created a simple program consisting of two files.
      a.py contains:

      from pyspark import SparkContext
      import os
      import bubu
      
      sc = SparkContext(os.environ['SPARKMASTER'], 'lll')
      rdd = sc.parallelize([1,2,3,4,5])
      res = rdd.map(bubu.func).collect()
      print(res)
      

      I also created bubu.py:

      def func(x):
          return x + 1
      
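The report does not show how out.zip was produced. A minimal sketch, assuming the archive should hold bubu.py at its root so that `import bubu` can resolve once the zip is on the Python path. The helper name `build_pyfiles_zip` and the temporary directory are illustrative and not part of the report:

```python
import os
import tempfile
import zipfile


def build_pyfiles_zip(module_paths, zip_path):
    """Write each module at the root of the archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in module_paths:
            # arcname drops the directory part; a nested entry such as
            # apps/try/bubu.py would not be importable as `bubu`.
            archive.write(path, arcname=os.path.basename(path))
    return zip_path


# Demo in a temporary directory (assumption); the report keeps its files
# under /apps/try.
workdir = tempfile.mkdtemp()
module_path = os.path.join(workdir, "bubu.py")
with open(module_path, "w") as handle:
    handle.write("def func(x):\n    return x + 1\n")

zip_path = build_pyfiles_zip([module_path], os.path.join(workdir, "out.zip"))
with zipfile.ZipFile(zip_path) as archive:
    zip_entries = archive.namelist()
```

Listing the entries is a quick way to rule out a badly laid out archive before suspecting the upload itself.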

      When I submit this as a batch:

      curl -X POST --data '{"file": "/apps/try/a.py", "pyFiles": ["/apps/try/out.zip"]}' -H "Content-Type: application/json" localhost:8998/batches

      the batch log reports:

      "ImportError: No module named bubu"
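For reproducing the submission from a script, the curl request can be sketched in Python 3 with only the standard library. The URL, file path, and pyFiles entry are taken from the command in this report; the function names `build_batch_request` and `submit_batch` are hypothetical helpers, and the actual network call is left commented out because it needs a running Livy server:

```python
import json
from urllib import request

# Host and port as used in the curl command from the report.
LIVY_URL = "http://localhost:8998/batches"


def build_batch_request(file_path, py_files, url=LIVY_URL):
    """Build the POST request that the curl command sends."""
    body = json.dumps({"file": file_path, "pyFiles": list(py_files)})
    return request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_batch(req):
    """Send the request and return the decoded JSON response."""
    with request.urlopen(req) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_batch_request("/apps/try/a.py", ["/apps/try/out.zip"])
payload = json.loads(req.data.decode("utf-8"))
# submit_batch(req)  # uncomment when a Livy server is listening
```

Keeping request construction separate from sending makes it easy to inspect the exact JSON body before it reaches the server.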

      The root cause is that the zip file is not uploaded to the executors. The log only shows a.py being added:

      16/09/29 09:57:06 INFO Utils: Copying /try/a.py to /tmp/spark-eabfc8a2-5f41-4d10-8c81-0d97be08f1ea/userFiles-8dc3cf7e-f122-4f4c-8a6d-4d2672a3e7fe/a.py
      16/09/29 09:57:06 INFO SparkContext: Added file file:/try/a.py at http://172.19.0.3:42761/files/a.py with timestamp 1475143026005

      I don't see a corresponding line for the zip file, although one does appear when submitting through Spark directly.
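The check described above can be automated. A rough sketch, assuming the driver log is available as text and that distributed files show up as "SparkContext: Added file ..." lines like the one quoted in this report. The function names are hypothetical, and files are matched by base name only because the log path (/try) differs from the submitted path (/apps/try):

```python
import posixpath
import re

# Pattern modelled on the "Added file" line quoted in the report.
ADDED_FILE = re.compile(r"SparkContext: Added file (\S+) at ")


def distributed_files(log_text):
    """Return the URIs named in "Added file" log lines."""
    return ADDED_FILE.findall(log_text)


def missing_py_files(log_text, py_files):
    """Return the requested pyFiles that have no "Added file" line."""
    added_names = {posixpath.basename(uri) for uri in distributed_files(log_text)}
    return [path for path in py_files if posixpath.basename(path) not in added_names]


driver_log = (
    "16/09/29 09:57:06 INFO SparkContext: Added file file:/try/a.py "
    "at http://172.19.0.3:42761/files/a.py with timestamp 1475143026005\n"
)

added = distributed_files(driver_log)
missing = missing_py_files(driver_log, ["/apps/try/out.zip"])
```

With the log excerpt from this report, only a.py is found and out.zip is reported as missing, which matches the observed ImportError.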

          People

            ofer Ofer Eliassaf