Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
7.0.0, 8.0.0
Description
unique() on a character column of a tibble is much slower after the tibble has been written to and read back from a Parquet file.
Here is a reprex:
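library(arrow)  # provides write_parquet() and read_parquet()
# One million strings drawn from 20 distinct values
df1 <- tibble::tibble(x = as.character(floor(runif(1000000) * 20)))
write_parquet(df1, "/tmp/test.parquet")
df2 <- read_parquet("/tmp/test.parquet")
# Original in-memory tibble
system.time(unique(df1$x))
# Result on my late 2020 MacBook Pro with M1 processor:
#    user  system elapsed
#   0.020   0.000   0.021
# Same column after the Parquet round trip
system.time(unique(df2$x))
#    user  system elapsed
#   5.230   0.419   5.649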
Issue Links
- is related to ARROW-16188 [R] Fix excess "Handling string data with embedded nuls" warning in tests (Open)
- links to