Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
(This is the flip side of ARROW-12959.)
Currently the Arrow compute kernel is_nan always treats null as a missing value, returning null at positions of the input datum with null (missing) values.
It would be helpful to be able to control this behavior with an option. The option could be named value_for_null or something similar and it would take a nullable boolean scalar. It would default to null, consistent with current behavior. When set to false or true, it would return false or true at positions of the input datum with null values.
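To make the proposed semantics concrete, here is a rough pure-Python model of what such an option might do. It does not use Arrow at all: `None` stands in for an Arrow null, and both the function name `is_nan_with_null_option` and the parameter name `value_for_null` are hypothetical, taken from the suggestion above rather than from any existing API.

```python
import math
from typing import List, Optional


def is_nan_with_null_option(
    values: List[Optional[float]],
    value_for_null: Optional[bool] = None,
) -> List[Optional[bool]]:
    """Model of the proposed behavior; None stands in for an Arrow null."""
    result: List[Optional[bool]] = []
    for v in values:
        if v is None:
            # Null input: emit whatever the (hypothetical) option specifies.
            # The default of None reproduces the current null-in, null-out behavior.
            result.append(value_for_null)
        else:
            result.append(math.isnan(v))
    return result


data = [3.14, None, float("nan")]
print(is_nan_with_null_option(data))         # [False, None, True]
print(is_nan_with_null_option(data, False))  # [False, False, True]
print(is_nan_with_null_option(data, True))   # [False, True, True]
```

With `value_for_null=False`, the output matches what base R's is.nan() returns for the same input.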
Among other things, this would enable the arrow R package to evaluate is.nan() consistently with the way base R does. In base R, is.nan() returns FALSE on NA. But in the arrow R package, it returns NA:
> is.nan(c(3.14, NA, NaN))
##[1] FALSE FALSE TRUE
> as.vector(is.nan(Array$create(c(3.14, NA, NaN))))
##[1] FALSE NA TRUE
I think solving this with an option in the C++ kernel is the best solution, because I suspect there are other cases in which users would want the ability to return all non-missing values in the output from is_nan without needing to call another kernel to fill the missing values in. However, it would also be possible to solve this just in the R package, by changing the mapping of is.nan in the R package. If we choose to go that route, we should change this Jira issue summary to "[R] Make is.nan(NA) consistent with base R".
Issue Links
- is duplicated by: ARROW-13366 [C++] Add option to is_nan kernel to return true on null (Open)
- relates to: ARROW-12959 [C++][R] Option for is_null(NaN) to evaluate to true (Resolved)