This issue is a more specific version of
Supporting arguments larger than 2GB is more general and arguably harder, because the limit exists in both R and the JVM (we receive the data as a ByteArray). However, to support parallelizing R data.frames larger than 2GB, we can do what PySpark does.
PySpark uses files to transfer bulk data between Python and the JVM. It has worked well for the large community of Spark Python users.
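
To illustrate the general idea, here is a minimal sketch of file-based transfer with length-prefixed chunks. It is not PySpark's or SparkR's actual code: the function names, the use of pickle, and the 4-byte big-endian length header are all assumptions chosen for illustration. The point is that each chunk stays well under the 2GB limit of a single byte array, while the file as a whole can be arbitrarily large.

```python
import os
import pickle
import struct
import tempfile


def write_chunks_to_file(rows, chunk_size):
    """Serialize rows in chunks to a temp file (hypothetical helper).

    Each chunk is written as a 4-byte big-endian length followed by the
    serialized payload, so the reader never needs one huge buffer.
    """
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        for start in range(0, len(rows), chunk_size):
            payload = pickle.dumps(rows[start:start + chunk_size])
            f.write(struct.pack(">i", len(payload)))
            f.write(payload)
    return path


def read_chunks_from_file(path):
    """Yield the chunks back one at a time (hypothetical helper)."""
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break  # end of file
            (length,) = struct.unpack(">i", header)
            yield pickle.loads(f.read(length))


# Usage: the writer stands in for the R side, the reader for the JVM side.
rows = [(i, str(i)) for i in range(10)]
path = write_chunks_to_file(rows, chunk_size=4)
chunks = list(read_chunks_from_file(path))
os.remove(path)
```

In a real implementation the writer would be the R process and the reader would be the JVM, with only the file path passed over the existing socket connection.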