I am working on a project where gapply() is used with a large, extremely skewed dataset. On the skewed partition, the user function takes more than 2 hours to return, which exceeds the timeout that is hardcoded in SparkR for the backend connection:
connectBackend <- function(hostname, port, timeout = 6000)
Ideally, the user should be able to configure this timeout through Spark and increase it when needed. It should be a small fix.
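One possible shape for the fix, as a minimal sketch rather than the actual patch: resolve the timeout from a setting the user controls and fall back to the current default of 6000 when it is absent or invalid. The environment variable name SPARKR_EXAMPLE_BACKEND_TIMEOUT and the helper backendTimeout() are hypothetical names made up for this illustration, and the sketch assumes the value is in seconds, which is how R's socketConnection() interprets its timeout argument (6000 seconds is 100 minutes, so a 2-hour task exceeds it).

```r
# Hypothetical helper (not part of SparkR): read the backend connection
# timeout, in seconds, from an environment variable. The variable name is
# an assumption chosen for this example only.
backendTimeout <- function(default = 6000) {
  raw <- Sys.getenv("SPARKR_EXAMPLE_BACKEND_TIMEOUT", unset = NA)
  if (is.na(raw)) {
    return(default)
  }
  # Non-numeric or non-positive values fall back to the default.
  value <- suppressWarnings(as.integer(raw))
  if (is.na(value) || value <= 0) default else value
}

# Same signature as the existing function, but the default timeout is now
# resolved at call time instead of being hardcoded.
connectBackend <- function(hostname, port, timeout = backendTimeout()) {
  socketConnection(host = hostname, port = port, server = FALSE,
                   blocking = TRUE, open = "wb", timeout = timeout)
}
```

With something like this in place, a job with a 3-hour user function could set the variable to 10800 before starting SparkR; callers that do not set it keep today's behavior.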
- is duplicated by SPARK-12609 Make R to JVM timeout configurable