[ARROW-12959] [C++][R] Option for is_null(NaN) to evaluate to true - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 6.0.0
Component/s: C++, R
Labels:
- kernel
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/28680

Description

(This is the flip side of ARROW-12960.)

Currently the Arrow compute kernel is_null always treats NaN as a non-missing value, returning false at positions of the input datum with value NaN.

It would be helpful to be able to control this behavior with an option. The option could be named nan_is_null or something similar. It would default to false, consistent with current behavior. When set to true, it should check whether the input datum has a floating-point data type and, if so, return true at positions where the input is NaN. If the input datum has some other type, the option should be silently ignored.
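
The proposed semantics can be sketched in standard C++ without any Arrow types. This is only an illustration of the behavior described above, not Arrow's kernel code: the names `NullOptions` and `IsNull` are hypothetical stand-ins, and `std::optional` is assumed here as a simple model of a nullable slot.

```cpp
#include <cmath>
#include <optional>
#include <type_traits>
#include <vector>

// Hypothetical options struct; the field name follows the suggestion in this issue.
struct NullOptions {
  bool nan_is_null = false;  // default keeps the current behavior
};

// Sketch of is_null over a nullable column, where std::nullopt models a null slot.
template <typename T>
std::vector<bool> IsNull(const std::vector<std::optional<T>>& values,
                         [[maybe_unused]] NullOptions options = {}) {
  std::vector<bool> out;
  out.reserve(values.size());
  for (const auto& v : values) {
    bool is_null = !v.has_value();
    // The option only applies to floating-point inputs; for any other
    // element type this branch is discarded and the option is ignored.
    if constexpr (std::is_floating_point_v<T>) {
      if (!is_null && options.nan_is_null) {
        is_null = std::isnan(*v);
      }
    }
    out.push_back(is_null);
  }
  return out;
}
```

With the option left at its default, a NaN slot is reported as non-null. With `nan_is_null` set to true, the same slot is reported as null, while an integer column gives the same result either way.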

Among other things, this would enable the arrow R package to evaluate is.na() consistently with the way base R does. In base R, is.na() returns TRUE on NaN. But in the arrow R package, it returns FALSE:

is.na(c(3.14, NA, NaN))
## [1] FALSE TRUE TRUE

as.vector(is.na(Array$create(c(3.14, NA, NaN))))
## [1] FALSE TRUE FALSE

I think solving this with an option in the C++ kernel is the best solution, because I suspect there are other cases in which users might want to treat NaN as a missing value. However, it would also be possible to solve this just in the R package, by defining a mapping of is.na in the R package that checks whether the input x has a floating-point data type and, if so, evaluates is.na(x) | is.nan(x). If we choose to go that route, we should change this Jira issue summary to "[R] Make is.na(NaN) consistent with base R".
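
For comparison, the binding-level alternative amounts to combining two existing checks rather than adding an option. The sketch below shows that composition in standard C++. The function name `IsNaLikeBaseR` is hypothetical, `std::optional` again models a nullable slot, and the NaN test is assumed to be false on missing slots for the purposes of this illustration.

```cpp
#include <cmath>
#include <optional>
#include <vector>

// Sketch of is.na(x) | is.nan(x) for a floating-point column.
// std::nullopt models a missing (NA) slot.
std::vector<bool> IsNaLikeBaseR(const std::vector<std::optional<double>>& x) {
  std::vector<bool> out;
  out.reserve(x.size());
  for (const auto& v : x) {
    const bool null_here = !v.has_value();                   // what is_null reports today
    const bool nan_here = v.has_value() && std::isnan(*v);   // NaN test, false on missing slots
    out.push_back(null_here || nan_here);
  }
  return out;
}
```

The result matches what base R gives for is.na(), but every binding that wants this behavior would need to repeat the composition, which is the argument for placing the option in the kernel instead.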

Issue Links

is duplicated by

ARROW-13367 [C++] Add option to is_null kernel to return true on NaN

Closed

is related to

ARROW-12960 [C++][R] Option for is_nan(null) to evaluate to false or true

Open

links to

GitHub Pull Request #10896

People

Assignee: Christian Cordova

Reporter: Ian Cook

Votes: 0

Watchers: 5

Dates

Created: 03/Jun/21 17:01

Updated: 11/Jan/23 08:29

Resolved: 26/Aug/21 17:09
