Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Description
Py4J 0.10.1 hasn't landed yet, but it will likely bring a significant performance improvement for PySpark, and for MLlib in particular. More details are available at https://github.com/bartdag/py4j/issues/201
The syscall overhead was likely also the reason that https://issues.apache.org/jira/browse/SPARK-6728 was reported. Dropping the base64 encoding will help too, but I imagine this fix will help more.
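For a rough sense of what the base64 encoding costs in size alone, here is a minimal sketch using only the Python standard library. It does not touch Py4J or reproduce its wire format; the helper name `base64_overhead` and the payload size are made up for illustration, and it measures only the size inflation, not the CPU time spent encoding or the syscall overhead discussed above.

```python
import base64


def base64_overhead(n_bytes: int) -> float:
    """Return encoded size / raw size for a payload of n_bytes zero bytes.

    Hypothetical helper for illustration only; not part of Py4J or PySpark.
    """
    payload = bytes(n_bytes)              # n_bytes zero bytes
    encoded = base64.b64encode(payload)   # standard base64, no line breaks
    return len(encoded) / len(payload)


if __name__ == "__main__":
    # Base64 maps every 3 input bytes to 4 output characters,
    # so a large payload grows by about one third.
    print(f"size ratio: {base64_overhead(3_000_000):.3f}")
```

The ratio is independent of the payload contents, so any byte array sent this way carries roughly a third more data than the raw bytes would.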