Description
I am using SparkR from RStudio, and I ran into an error with the join function that I recreated with a smaller example:
Sys.setenv(SPARK_HOME="/Users/liumo1/Applications/spark/")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init("local[4]")
sqlContext <- sparkRSQL.init(sc)

n = c(2, 3, 5)
s = c("aa", "bb", "cc")
b = c(TRUE, FALSE, TRUE)
df = data.frame(n, s, b)
df1= createDataFrame(sqlContext, df)
showDF(df1)

x = c(2, 3, 10)
t = c("dd", "ee", "ff")
c = c(FALSE, FALSE, TRUE)
dff = data.frame(x, t, c)
df2 = createDataFrame(sqlContext, dff)
showDF(df2)

res = join(df1, df2, df1$n == df2$x, "semijoin")
showDF(res)
Running this code, I encountered the error:
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
java.lang.IllegalArgumentException: Unsupported join type 'semijoin'. Supported join types include: 'inner', 'outer', 'full', 'fullouter', 'leftouter', 'left', 'rightouter', 'right', 'leftsemi'.
However, if I changed the joinType to "leftsemi",
res = join(df1, df2, df1$n == df2$x, "leftsemi")
I would get the error:
Error in .local(x, y, ...) :
joinType must be one of the following types: 'inner', 'outer', 'left_outer', 'right_outer', 'semijoin'
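To make the mismatch concrete, the two accepted lists can be compared directly. The following is only a sketch in plain R (no Spark needed): both vectors are copied from the two error messages above, and `normalize` is a hypothetical helper that encodes an assumption, namely that the JVM side lowercases the value and ignores underscores before matching.

```r
# Values accepted by the R-side check (copied from the second error message).
r_types <- c("inner", "outer", "left_outer", "right_outer", "semijoin")

# Values accepted by the JVM side (copied from the first error message).
jvm_types <- c("inner", "outer", "full", "fullouter", "leftouter", "left",
               "rightouter", "right", "leftsemi")

# ASSUMPTION (not stated in this report): the JVM side lowercases the
# value and drops underscores before matching it against its list.
normalize <- function(x) gsub("_", "", tolower(x), fixed = TRUE)

# R-side values that would also be accepted by the JVM side.
passes_both <- r_types[normalize(r_types) %in% jvm_types]

# R-side values that pass the R check but are rejected by the JVM side.
rejected_by_jvm <- setdiff(r_types, passes_both)

# JVM-side values that can never get past the R check.
blocked_by_r <- setdiff(jvm_types, normalize(r_types))

print(passes_both)
print(rejected_by_jvm)
print(blocked_by_r)
```

Under that assumption, "semijoin" is the only R-side value the JVM side rejects, and "leftsemi" is among the JVM-side values the R check blocks, so no spelling of a semi join gets through both checks.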
Since the join function in R appears to invoke a Java method, I went into DataFrame.R and replaced "semijoin" with "leftsemi" on lines 1374 and 1378, so that the value accepted by the R check matches what the Java method expects. This change also makes the joinType values accepted by R match those accepted by Scala.
Before (semijoin):
if (joinType %in% c("inner", "outer", "left_outer", "right_outer", "semijoin")) {
  sdf <- callJMethod(x@sdf, "join", y@sdf, joinExpr@jc, joinType)
} else {
  stop("joinType must be one of the following types: ",
       "'inner', 'outer', 'left_outer', 'right_outer', 'semijoin'")
}
After (leftsemi):
if (joinType %in% c("inner", "outer", "left_outer", "right_outer", "leftsemi")) {
  sdf <- callJMethod(x@sdf, "join", y@sdf, joinExpr@jc, joinType)
} else {
  stop("joinType must be one of the following types: ",
       "'inner', 'outer', 'left_outer', 'right_outer', 'leftsemi'")
}
This fixed the issue. I'm not sure whether the change breaks Hive compatibility or causes other problems, but I can submit a pull request for it.