Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.0
Environment: R 4.0.3, Ubuntu 20.04.
sessionInfo():
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS
Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] rstudioapi_0.13 magrittr_2.0.1 hms_0.5.3 tidyselect_1.1.0 bit_4.0.4 R6_2.5.0 rlang_0.4.9
[8] dplyr_1.0.2 tools_4.0.3 R.oo_1.24.0 arrow_2.0.0 DBI_1.1.0 ellipsis_0.3.1 bit64_4.0.5
[15] assertthat_0.2.1 tibble_3.0.4 lifecycle_0.2.0 crayon_1.3.4 readr_1.4.0 purrr_0.3.4 arkdb_0.0.8
[22] duckdb_0.2.4 fs_1.5.0 vctrs_0.3.5 R.utils_2.10.1 curl_4.3 glue_1.4.2 compiler_4.0.3
[29] pillar_1.4.7 generics_0.1.0 R.methodsS3_1.8.1 pkgconfig_2.0.3
Description
I'd like to use the R package interface to access data stored in a tab-separated text file that is much larger than available RAM. I understand that in principle this is possible by opening the file with `open_dataset()` in text mode (`format = "text"`) and then streaming the data out to Parquet with `write_dataset()`. However, this strategy fails with an unexpected error even on small text files.
Here's a minimal reproducible example:
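# Write mtcars to a tab-separated file in its own directory
fs::dir_create("import_dir")
readr::write_tsv(mtcars, "import_dir/mtcars.tsv")
# Open the directory as a text-format (TSV) dataset
ds <- arrow::open_dataset("import_dir", format="text", delim="\t")
# Intended to write the dataset out as Parquet; this is the call that fails
arrow::write_dataset(ds, "parquet_dir")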
The error I get occurs only on the last line (`write_dataset()`), saying:
Error in options$update(...) : attempt to apply non-function
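A possible workaround is to name the output format explicitly. This assumes the failure happens because `write_dataset()` picks its output format from the input dataset, which here is text, rather than from Parquet. That assumption has not been confirmed in this report, and the sketch is untested against arrow 2.0.0. It reuses the paths from the example above and relies only on the `format` argument of `write_dataset()` and `open_dataset()`:

```r
library(arrow)

# Recreate the small TSV input used in the reproducible example
fs::dir_create("import_dir")
readr::write_tsv(mtcars, "import_dir/mtcars.tsv")

# Open the tab-separated file as a text-format dataset (scanned lazily)
ds <- open_dataset("import_dir", format = "text", delim = "\t")

# Request Parquet output explicitly instead of relying on the default format.
# Assumption: the default follows the source dataset's (text) format.
write_dataset(ds, "parquet_dir", format = "parquet")

# Read the Parquet output back to check that all rows were written
out <- open_dataset("parquet_dir", format = "parquet")
result <- dplyr::collect(out)
```

If this succeeds, `result` should hold the same 32 rows as `mtcars`.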