Details
- Wish
- Status: Resolved
- Major
- Resolution: Fixed
- 4.0.1
- Linux
Description
Hello,
I would like to install Arrow on Linux using only CRAN, without downloading additional files from GitHub, Apache, or Ursa Labs. I understand this is a big ask and might not be a priority for you all. Feel free to close this if you feel it is out of scope.
Why is a CRAN-only installation useful?
- It's common for organizations to set up firewalls that prevent arbitrary downloads, but allow access to their own internal CRAN mirror.
- Some of these firewalls also allow requests to GitHub, but many do not.
- On a broader level, my favorite thing about R is CRAN, the CRAN maintainers, and their policy that "Source packages may not contain any form of binary executable code." When most of the Arrow code is distributed separately (either as source C++ or as a compiled library), automated code archives and other source-based tools become much less useful.
Of course, arrow isn't the only R package to depend on external libraries or distribute code separately. If a CRAN-only approach isn't viable, it would still be useful to have an all-offline method. I'm also having trouble getting an offline install to work, even with a local copy of the Arrow repo. (See the bottom of the script below.)
What does installing offline look like now?
Here's a bash script that approximates installing behind a firewall.
git clone --depth 1 git@github.com:apache/arrow.git test_arrow
cd test_arrow
wget 'https://cran.r-project.org/src/contrib/arrow_4.0.1.tar.gz'

# Set up a temporary R library (optional)
mkdir test_r_lib
export R_LIBS_USER=test_r_lib

export ARROW_R_DEV=true
export LIBARROW_MINIMAL=false
export LIBARROW_DOWNLOAD=false
export LIBARROW_BINARY=false
export LIBARROW_BUILD=true

# These are all of the direct dependencies, including Suggests
# This isn't required if the packages are already installed
Rscript -e "install.packages(c('assertthat', 'bit64', 'purrr', 'R6', 'rlang', 'tidyselect', 'vctrs', 'cpp11', 'decor', 'distro', 'dplyr', 'hms', 'knitr', 'lubridate', 'pkgload', 'reticulate', 'rmarkdown', 'stringr', 'testthat', 'tibble', 'withr'))"

# Disable your internet connection here.

# Now try to install the R package we downloaded with wget.
# This is an approximation of being behind a firewall.
Rscript -e 'install.packages("arrow_4.0.1.tar.gz", repos=NULL)'

# It successfully installs the R component, but not the C++ library,
# even with LIBARROW_BUILD=true
Rscript -e "arrow::arrow_available()"
# [1] FALSE

# As mentioned in the installation vignette,
# we can R CMD INSTALL in the git repo.
R CMD INSTALL r
# This will try to build the C++ library, but fails when mimalloc and
# jemalloc can't be downloaded from Github.
# (Seems not to be affected by LIBARROW_DOWNLOAD=false).
# When C++ compilation fails, the R component still installs.
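For anyone who wants to reproduce the "Disable your internet connection here" step without actually going offline, one rough approximation is to point the standard proxy variables at a local port where nothing is listening. This is only a sketch: the function names `simulate_offline` and `restore_online` are made up for illustration, and it assumes the tools involved honor `http_proxy`/`https_proxy`. Anything that ignores proxy settings (for example a `git clone` over SSH) would still reach the network.

```shell
#!/usr/bin/env bash
# Sketch: approximate "no internet" for proxy-aware tools by routing them
# through a proxy address where nothing is listening.
# Assumption: the downloads in question honor these variables.

# Port 9 on localhost is normally closed, so connection attempts fail fast.
DEAD_PROXY="http://127.0.0.1:9"

# Hypothetical helper: point all common proxy variables at the dead proxy.
simulate_offline() {
  export http_proxy="$DEAD_PROXY" https_proxy="$DEAD_PROXY"
  export HTTP_PROXY="$DEAD_PROXY" HTTPS_PROXY="$DEAD_PROXY"
  # Clear exception lists so no host can bypass the dead proxy.
  unset no_proxy NO_PROXY
}

# Hypothetical helper: undo simulate_offline.
# Note: this clears the variables; it does not restore earlier proxy values.
restore_online() {
  unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY
}

simulate_offline
echo "proxy in effect: $https_proxy"
```

With `simulate_offline` called in the same shell, the two install commands from the script above can then be run as written, and `restore_online` called afterwards.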
Issue Links
- is duplicated by ARROW-12049 [R] [Documentation] Documentation for making an offline build (Closed)
- links to