Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Information Provided
Affects Version/s: 6.0.0
Fix Version/s: None
Component/s: None
Environment: Ubuntu focal, R 4.1.1, RStudio 1.4.1772
Description
When we write a dataframe column of type `<dttm>` to parquet using the `arrow` package, reading the parquet file back into a dataframe returns a slightly different value.
This behaviour does not replicate with columns of type `<double>`.
Reprex:
```r
library(arrow)
library(dplyr)

# Create sample dataframe
n <- 1631494810.376999855041503906250000000000000000000000000000000000
df <- data.frame(x = "a", n = n, t = as.POSIXct(n, origin = "1970-01-01"))

# Write to disk
df %>% write_parquet("/tmp/tmp.parquet")

# Extract time-based cols
dft <- df %>% filter(x == "a") %>% pull(t) %>% as.numeric
pqt <- read_parquet("/tmp/tmp.parquet") %>% filter(x == "a") %>% pull(t) %>% as.numeric
dft == pqt
sprintf("%.54f", dft)
sprintf("%.54f", pqt)

# Extract numeric cols
dfn <- df %>% filter(x == "a") %>% pull(n) %>% as.numeric
pqn <- read_parquet("/tmp/tmp.parquet") %>% filter(x == "a") %>% pull(n) %>% as.numeric
dfn == pqn
sprintf("%.54f", dfn)
sprintf("%.54f", pqn)
```
The critical issue is that `dft == pqt` returns `FALSE` while `dfn == pqn` returns `TRUE`.
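The gap is consistent with `arrow` representing timestamps as an integer count of microseconds, so sub-microsecond detail in the underlying double is dropped on the round trip. A minimal sketch of this hypothesis, assuming the reprex above has been run (truncation to whole microseconds is an assumption; the exact rounding mode is not confirmed here):

```r
# If arrow keeps only whole microseconds, truncating the original value
# to microsecond resolution should reproduce the round-tripped value.
trunc(dft * 1e6) / 1e6 == pqt
```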
Why is this a problem? We use `arrow` to store dataframes to disk. When we want to update these parquet files, we first check whether any data has actually changed, and we have tripwires in place so that if a significant proportion of the data has changed, the pipeline fails and is flagged for manual review.
With the current behaviour described above, all of the dataframes that contain `<dttm>`-type columns fail this check.
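A possible workaround for the tripwire (our suggestion, not part of the ticket's resolution) is to compare `<dttm>` columns with a sub-microsecond tolerance rather than exact equality. A minimal sketch using the reprex values:

```r
# Treat timestamps as unchanged when they agree to within one
# microsecond, i.e. below the resolution the timestamps appear
# to be stored at.
abs(dft - pqt) < 1e-6
```

Base R's `all.equal()` offers a similar tolerance-based comparison if an element-wise check is not required.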