Apache Arrow / ARROW-12981

[R] Install source package from CRAN alone



    • Type: Wish
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.1
    • Fix Version/s: 6.0.0
    • Component/s: Packaging, R
    • Labels: Linux



      I would like to install Arrow on Linux using only CRAN, without downloading additional files from GitHub, Apache, or Ursa Labs. I understand this is a big ask and might not be a priority for you all. Feel free to close this if you feel it is out of scope.

      Why is a CRAN-only installation useful?

      1. It's common for organizations to set up firewalls that prevent arbitrary downloads, but allow access to their own internal CRAN mirror.
        • Sometimes these firewalls also allow requests to GitHub, but often not.
      2. On a broader level, my favorite thing about R is CRAN, the CRAN maintainers, and their policy that "Source packages may not contain any form of binary executable code." By distributing most of the Arrow code separately (either as source C++ or a compiled library), automated code archives and other source-based tools become much less useful.

      Of course, arrow isn't the only R package to depend on external libraries or distribute code separately. If a CRAN-only approach isn't viable, it would still be useful to have an all-offline method. I'm also having trouble getting an offline install to work, even with a local copy of the Arrow repo. (See the bottom of the script below.)


      What does installing offline look like now?
      Here's a bash script that approximates installing behind a firewall.

      git clone --depth 1 git@github.com:apache/arrow.git test_arrow
      cd test_arrow
      wget 'https://cran.r-project.org/src/contrib/arrow_4.0.1.tar.gz'
      # Set up a temporary R library (optional)
      mkdir test_r_lib
      export R_LIBS_USER=test_r_lib
      export ARROW_R_DEV=true
      export LIBARROW_MINIMAL=false
      export LIBARROW_DOWNLOAD=false
      export LIBARROW_BINARY=false
      export LIBARROW_BUILD=true
      # These are all of the direct dependencies, including Suggests
      # This isn't required if the packages are already installed
      Rscript -e "install.packages(c('assertthat', 'bit64', 'purrr', 'R6', 'rlang', 'tidyselect', 'vctrs', 'cpp11', 'decor', 'distro', 'dplyr', 'hms', 'knitr', 'lubridate', 'pkgload', 'reticulate', 'rmarkdown', 'stringr', 'testthat', 'tibble', 'withr'))"
      # Disable your internet connection here.
      # Now try to install the R package we downloaded with wget.
      # This is an approximation of being behind a firewall.
      Rscript -e 'install.packages("arrow_4.0.1.tar.gz", repos=NULL)'
      # It successfully installs the R component, but not the C++ library, 
      # even with LIBARROW_BUILD=true
      Rscript -e "arrow::arrow_available()"
      # [1] FALSE
      # As mentioned in the installation vignette, 
      # we can R CMD INSTALL in the git repo.
      # This will try to build the C++ library, but fails when mimalloc and 
      # jemalloc can't be downloaded from GitHub.
      # (Seems not to be affected by LIBARROW_DOWNLOAD=false).
      # When C++ compilation fails, the R component still installs.
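
The "Disable your internet connection here" step in the script above is manual. One possible way to approximate it in a repeatable form is sketched below: it points the standard proxy variables at a local port that is assumed to have no listener, so that tools honoring those variables (curl and libcurl-based downloaders generally do) fail quickly instead of reaching the network. Both function names, `simulate_offline` and `show_libarrow_env`, are hypothetical helpers for illustration and are not part of arrow. Tools that ignore proxy variables will still reach the network, so this is only an approximation of a real firewall.

```shell
# Hypothetical helper (not part of arrow): send HTTP(S) traffic to a closed
# local port so that downloads fail fast, approximating a firewall.
# Assumption: nothing is listening on 127.0.0.1:9.
simulate_offline() {
  dead_proxy='http://127.0.0.1:9'
  export http_proxy="$dead_proxy" https_proxy="$dead_proxy"
  export HTTP_PROXY="$dead_proxy" HTTPS_PROXY="$dead_proxy"
  # Remove proxy exemptions so that no host bypasses the dead proxy.
  unset no_proxy NO_PROXY
}

# Hypothetical helper: print the build-related variables used in the script
# above, so the settings in effect can be recorded next to the install log.
show_libarrow_env() {
  for var in ARROW_R_DEV LIBARROW_MINIMAL LIBARROW_DOWNLOAD \
             LIBARROW_BINARY LIBARROW_BUILD; do
    # printenv exits non-zero when the variable is not exported.
    value=$(printenv "$var") || value='<unset>'
    printf '%s=%s\n' "$var" "$value"
  done
}
```

Calling `simulate_offline` followed by `show_libarrow_env` just before the `install.packages("arrow_4.0.1.tar.gz", repos=NULL)` line would stand in for unplugging the network, and would show which `LIBARROW_*` values the install actually sees.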




            Assignee: Neal Richardson (npr)
            Reporter: Karl Dunkle Werner (karldw)



              Time Tracking

              Original Estimate: Not Specified
              Remaining Estimate: 0h
              Time Spent: 28.5h
