SPARK-927

PySpark sample() doesn't work if numpy is installed on master but not on workers

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.8.0, 0.9.1, 1.0.2, 1.1.2
    • Fix Version/s: 1.2.0
    • Component/s: PySpark
    • Labels: None

      Description

      PySpark's sample() method crashes with ImportErrors on the workers if numpy is installed on the driver machine but not on the workers. I'm not sure what the best way to fix this is. A general mechanism for automatically shipping libraries from the master to the workers would address this, but it could be complicated to implement.

              People

              • Assignee: Matthew Farrellee (farrellee)
              • Reporter: Josh Rosen (joshrosen)
