Description
PySpark's sample() method crashes with ImportErrors on the workers if numpy is installed on the driver machine but not on the workers. I'm not sure what the best way to fix this is. A general mechanism for automatically shipping libraries from the master to the workers would address this, but it could be complicated to implement.
Issue Links
- is related to SPARK-4477 remove numpy from RDDSampler of PySpark (Resolved)