Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version/s: 2.2.0
- Fix Version/s: None
Description
When gapply() (or another member of the apply() family) is called with a schema, Spark tries to parse the data returned from the R process on each worker as Spark DataFrame Rows based on that schema. In this case the provided schema declares six columns. When an R worker returns results to the JVM, SparkSQL accesses the columns one by one and casts each to its declared type. If the R worker returns nothing, there are no columns to access and the JVM throws an ArrayIndexOutOfBoundsException.
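The failure mode described above can be illustrated with a small model in plain Python. This is only a sketch, not Spark's actual code: `SCHEMA`, `cast_row`, and `cast_row_checked` are hypothetical names, the six column names and types are made up, and Python's `IndexError` stands in for the JVM's ArrayIndexOutOfBoundsException.

```python
# Plain-Python model of the failure mode; NOT Spark's implementation.
# SCHEMA, cast_row, and cast_row_checked are hypothetical names.

# A made-up six-column schema: (column name, cast function).
SCHEMA = [
    ("a", int), ("b", int), ("c", float),
    ("d", float), ("e", str), ("f", str),
]


def cast_row(raw_row, schema=SCHEMA):
    """Access columns one by one by position and cast each to its declared type."""
    return [caster(raw_row[i]) for i, (_name, caster) in enumerate(schema)]


def cast_row_checked(raw_row, schema=SCHEMA):
    """Same as cast_row, but validate the column count before indexing."""
    if len(raw_row) != len(schema):
        raise ValueError(
            f"worker returned {len(raw_row)} column(s), "
            f"schema declares {len(schema)}"
        )
    return cast_row(raw_row, schema)


if __name__ == "__main__":
    # A well-formed row casts cleanly.
    print(cast_row(["1", "2", "3.5", "4.5", "x", "y"]))

    # A worker that returned nothing fails on the first positional access.
    try:
        cast_row([])
    except IndexError as err:
        print("unchecked:", type(err).__name__)

    # Checking the column count first yields a more descriptive error.
    try:
        cast_row_checked([])
    except ValueError as err:
        print("checked:", err)
```

The unchecked version fails on the very first column because it trusts the schema rather than the data. Comparing the column count up front, as in the checked version, is one way such a mismatch could be reported more clearly.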