Description
When launching spark on a system with multiple javas installed, there are a few options for choosing which JRE to use, setting `JAVA_HOME` being the most straightforward.
However, when pyspark's internal py4j launches its JavaGateway, it always invokes `java` directly, without qualification. This means you get whatever java's first on your path, which is not necessarily the same one in spark's JAVA_HOME.
This could be seen as a py4j issue, but from their point of view, the fix is easy: make sure the java you want is first on your path. I can't figure out a way to make that reliably happen through the pyspark executor launch path, and it seems like something that would ideally happen automatically. If I set JAVA_HOME when launching spark, I would expect that to be the only java used throughout the stack.
Attachments
Issue Links
- is duplicated by
-
SPARK-17220 Upgrade Py4J to 0.10.3
- Resolved
- links to