Details
- Improvement
- Status: Resolved
- Major
- Resolution: Fixed
- 0.14.0
- Windows 10 Pro and Ubuntu
Description
Problem
Reading any of the datasets linked below from Parquet with arrow is about 20x slower than reading the same data from the fst format in R.
How to get the data
https://loanperformancedata.fanniemae.com/lppub/index.html
Register and download any of the files. I can't redistribute the data, so it's best to register and download it yourself.
Code
```r
path = "data/Performance_2016Q4.txt"
library(data.table)
library(arrow)
# read the raw file (no header row)
a = data.table::fread(path, header = FALSE)
# write the same data in both formats
fst::write_fst(a, "data/a.fst")
arrow::write_parquet(a, "data/a.parquet")
rm(a); gc()
# read in test: fst
system.time(a <- fst::read_fst("data/a.fst")) # 4.61 seconds
rm(a); gc()
# read in test: parquet
system.time(a <- arrow::read_parquet("data/a.parquet")) # 99.19 seconds
```
Attachments
Issue Links
- relates to ARROW-6060 [Python] too large memory cost using pyarrow.parquet.read_table with use_threads=True (Resolved)