Description
Base R 4.2.0 introduced the following change ([Rd] R 4.2.0 is released): "Calling if() or while() with a condition of length greater than one gives an error rather than a warning."
The code below is a reproducible example of the issue. In R >= 4.2.0 it raises an error; in earlier versions it only produces a warning. `Sys.time()` returns an object with more than one class (`POSIXct` and `POSIXt`), and throughout the SparkR repository `if` statements use checks of the form `if (class(x) == "Column")`. For a multi-class object this comparison yields a condition of length greater than one, which is an error in R >= 4.2.0. Note that R allows an object to carry multiple class names as a character vector (R: Object Classes), so this kind of check was fragile from the start.
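The root cause can be seen without Spark. The following is a minimal base R sketch, not code from the SparkR repository; the variable names `cond` and `outcome` are chosen here for illustration only.

```r
# Base R only; no Spark session is needed for this illustration.
t <- Sys.time()
class(t)                      # two class names: "POSIXct" "POSIXt"

# Comparing a length-2 class vector with a string gives a length-2 logical.
cond <- class(t) == "Column"
length(cond)                  # 2, so `cond` is not a scalar condition

# tryCatch() records whichever condition this R version signals:
# a warning before R 4.2.0 (by default), an error from R 4.2.0 onwards.
outcome <- tryCatch(
  {
    if (cond) "Column" else "not a Column"
  },
  warning = function(w) paste("warning:", conditionMessage(w)),
  error = function(e) paste("error:", conditionMessage(e))
)
print(outcome)
```

On R >= 4.2.0 the printed string should start with `error:`; on earlier versions with default settings it should start with `warning:`.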
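The two SparkR chunks below were executed on R version 4.1.3. The first shows the warning; the second sets `_R_CHECK_LENGTH_1_CONDITION_` to reproduce the error that R >= 4.2.0 raises by default.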
    {
      SparkR::sparkR.session()
      t <- Sys.time()
      sdf <- SparkR::createDataFrame(data.frame(x = t + c(-1, 1, -1, 1, -1)))
      SparkR::collect(SparkR::filter(sdf, SparkR::column('x') > t))
    }
    #> Warning in if (class(e2) == 'Column') {: the condition has length > 1
    #> and only the first element will be used
    #>                     x
    #> 1 2023-01-07 20:40:20
    #> 2 2023-01-07 20:40:20
    {
      Sys.setenv(`_R_CHECK_LENGTH_1_CONDITION_` = "true")
      SparkR::sparkR.session()
      t <- Sys.time()
      sdf <- SparkR::createDataFrame(data.frame(x = t + c(-1, 1, -1, 1, -1)))
      SparkR::collect(SparkR::filter(sdf, SparkR::column('x') > t))
    }
    #> Error in h(simpleError(msg, call)): error in evaluating the argument 'x'
    #> in selecting a method for function 'collect': error in evaluating the
    #> argument 'condition' in selecting a method for function 'filter': the
    #> condition has length > 1
A similar issue was noted in these SparkR functions, where multi-class values such as the result of `Sys.time()` may be passed: `lit`, `fillna`, `when`, `otherwise`, `contains`, `ifelse`.
The suggested change is to wrap the comparison in `all` (or `any`, as appropriate) when checking whether `class(.)` is `Column`: `if (all(class(.) == "Column"))`. Better still, use `base::inherits` for this check: `if (inherits(., "Column"))`.
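The difference between the suggested checks can be sketched in base R. This is an illustration under stated assumptions, not the actual SparkR patch: `fake_column` and `sub_column` are plain lists tagged with S3 class names standing in for real `Column` objects, and `is_column` is a hypothetical helper that does not exist in SparkR.

```r
# Hypothetical stand-ins: plain lists with S3 class attributes,
# NOT real SparkR Column objects.
fake_column <- structure(list(), class = "Column")
sub_column <- structure(list(), class = c("SubColumn", "Column"))
timestamp <- Sys.time()       # classes: "POSIXct" "POSIXt"

# Hypothetical helper (not part of SparkR): always returns one TRUE/FALSE.
is_column <- function(x) inherits(x, "Column")

is_column(fake_column)        # TRUE
is_column(timestamp)          # FALSE, with no warning or error

# all() and any() also reduce the comparison to a single logical value,
# but they disagree once an object carries more than one class name.
all(class(timestamp) == "Column")    # FALSE
all(class(sub_column) == "Column")   # FALSE: "SubColumn" does not match
any(class(sub_column) == "Column")   # TRUE
inherits(sub_column, "Column")       # TRUE

# Safe to use as an if() condition in any R version.
if (is_column(timestamp)) "Column" else "not a Column"
```

`inherits` states the intent directly and always yields a scalar, which is one reason to prefer it over choosing between `all` and `any` at each call site.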