Spark / SPARK-927

PySpark sample() doesn't work if numpy is installed on master but not on workers


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.8.0, 0.9.1, 1.0.2, 1.1.2
    • Fix Version/s: 1.2.0
    • Component/s: PySpark
    • Labels: None

    Description

      PySpark's sample() method crashes with an ImportError on the workers if numpy is installed on the driver machine but not on the workers. I'm not sure what the best way to fix this is. A general mechanism for automatically shipping libraries from the master to the workers would address this, but it could be complicated to implement.
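
      As a rough illustration of the failure mode and one possible workaround, the sketch below guards the numpy import and falls back to the standard-library random module when numpy is missing on a worker. RDDSamplerLike and its methods are hypothetical stand-ins for illustration only, not PySpark's actual RDDSampler, and the eventual fix may take a different form.

      # Hypothetical sketch, not PySpark's actual sampler: degrade gracefully
      # when numpy is absent on a worker instead of raising ImportError.
      import random

      try:
          import numpy.random as np_random
          HAVE_NUMPY = True
      except ImportError:
          HAVE_NUMPY = False


      class RDDSamplerLike(object):
          """Illustrative Bernoulli sampler with an optional numpy backend."""

          def __init__(self, fraction, seed=None):
              self._fraction = fraction
              self._seed = seed

          def sample(self, iterator):
              if HAVE_NUMPY:
                  rng = np_random.RandomState(self._seed)
                  draw = rng.random_sample   # numpy RNG when available
              else:
                  rng = random.Random(self._seed)
                  draw = rng.random          # pure-Python fallback
              for item in iterator:
                  if draw() < self._fraction:
                      yield item


      # Driver-side usage, purely illustrative:
      if __name__ == "__main__":
          print(len(list(RDDSamplerLike(0.1, seed=42).sample(range(1000)))))

      One caveat with this kind of fallback is that the numpy and pure-Python code paths draw different random sequences for the same seed, so results would differ depending on what is installed on each worker; that is one argument for using a single code path instead.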

    Attachments

    Issue Links

    Activity

    People

      Assignee: Matthew Farrellee (farrellee)
      Reporter: Josh Rosen (joshrosen)
      Votes: 0
      Watchers: 2

    Dates

      Created:
      Updated:
      Resolved: