Description
I am working on a project where gapply() is used on a large, extremely skewed dataset. On the skewed partition, the user function takes more than 2 hours to return, which exceeds the timeout that SparkR hardcodes for the backend connection:
connectBackend <- function(hostname, port, timeout = 6000)
Ideally, users should be able to increase this timeout through Spark configuration instead of being limited by the hardcoded default. This should be a small fix.
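One possible shape for the fix, as a minimal sketch rather than the actual patch: resolve the timeout from an environment variable and fall back to the current default of 6000. The helper name `resolveBackendTimeout` and the variable name `EXAMPLE_BACKEND_TIMEOUT` are hypothetical and not existing SparkR settings. The sketch also assumes the value is in seconds and is handed to base R's `socketConnection`.

```r
# Hypothetical helper (not part of SparkR): read the backend timeout from an
# environment variable, falling back to the current hardcoded default.
resolveBackendTimeout <- function(envVar = "EXAMPLE_BACKEND_TIMEOUT",
                                  default = 6000) {
  raw <- Sys.getenv(envVar, unset = NA)
  # Variable not set, or set to an empty string: keep the default.
  if (is.na(raw) || !nzchar(raw)) {
    return(default)
  }
  value <- suppressWarnings(as.integer(raw))
  # Non-numeric or non-positive values are ignored in favor of the default.
  if (is.na(value) || value <= 0) {
    return(default)
  }
  value
}

# Sketch of connectBackend with a configurable default. The timeout argument
# is passed to base R's socketConnection, which interprets it in seconds.
connectBackend <- function(hostname, port, timeout = resolveBackendTimeout()) {
  socketConnection(host = hostname, port = port, server = FALSE,
                   blocking = TRUE, open = "wb", timeout = timeout)
}
```

With this in place, a long-running gapply() job could raise the limit before starting, for example with `Sys.setenv(EXAMPLE_BACKEND_TIMEOUT = "10800")` for three hours, while callers that set nothing keep the existing behavior.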
Issue Links
- is duplicated by SPARK-12609 Make R to JVM timeout configurable (Resolved)