I originally posted this as a Stack Overflow question, but it appears to be a bug.
If I have an R data.frame in which one of the columns is a list of integers – i.e., each element of the column embeds an entire R list of integers – then converting this data.frame to a SparkR DataFrame appears to succeed: SparkR treats the column as ArrayType(DoubleType).
However, any subsequent operation on this SparkR DataFrame appears to throw an error.
Create an example R data.frame:
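A minimal reproduction might look like the following (the names `indices`, `data`, and `myDf` are my own illustrative choices):

```r
# Build a data.frame whose "data" column holds an entire
# list of numbers in each row. R recycles the length-1 list
# so every row gets the same 20-element list.
indices <- 1:4
myDf <- data.frame(indices)
myDf$data <- list(rep(0, 20))
```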
Examine it to make sure it looks okay:
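For instance:

```r
str(myDf)   # shows "data" as a list column
head(myDf)  # each row of "data" embeds a full list
```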
Convert it to a SparkR DataFrame:
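Assuming a local Spark session:

```r
library(SparkR)
sparkR.session()  # local session for the reproduction

mySparkDf <- as.DataFrame(myDf)  # conversion itself succeeds
```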
Examine the SparkR DataFrame schema; notice that the list column was successfully converted to ArrayType:
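For example:

```r
schema(mySparkDf)
# In my case the list column is reported as array<double>,
# i.e. ArrayType(DoubleType)
```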
However, operating on the SparkR DataFrame throws an error:
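Even trivial operations fail for me (I have omitted the Java stack trace here):

```r
collect(mySparkDf)  # throws an error
head(mySparkDf)     # also throws
```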
Environment: Spark 2.2.0, R 3.4.0, Java 1.8.0_131, Windows 10.