  1. Apache Arrow
  2. ARROW-12981

[R] Install source package from CRAN alone


Details

    • Wish
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 4.0.1
    • 6.0.0
    • Packaging, R
    • Linux

    Description

      Hello,

      I would like to install Arrow on Linux using only CRAN, without downloading additional files from Github, Apache, or Ursa Labs. I understand this is a big ask, and might not be a priority for you all. Feel free to close if you feel that this is out of scope.

      Why is a CRAN-only installation useful?

      1. It's common for organizations to set up firewalls that prevent arbitrary downloads, but allow access to their own internal CRAN mirror.
        • Sometimes these firewalls also allow requests to GitHub, but often not.
      2. On a broader level, my favorite thing about R is CRAN, the CRAN maintainers, and their policy that "Source packages may not contain any form of binary executable code." By distributing most of the Arrow code separately (either as source C++ or a compiled library), automated code archives and other source-based tools become much less useful.

      Of course, arrow isn't the only R package to depend on external libraries or distribute code separately. If a CRAN-only approach isn't viable, it would still be useful to have an all-offline method. I'm also having trouble getting an offline install to work, even with a local copy of the Arrow repo. (See the bottom of the script below.)

       

      What does installing offline look like now?
      Here's a bash script that approximates installing behind a firewall.

      git clone --depth 1 git@github.com:apache/arrow.git test_arrow
      
      cd test_arrow
      wget 'https://cran.r-project.org/src/contrib/arrow_4.0.1.tar.gz'
      
      # Set up a temporary R library (optional)
      mkdir test_r_lib
      export R_LIBS_USER=test_r_lib
      
      export ARROW_R_DEV=true
      export LIBARROW_MINIMAL=false
      export LIBARROW_DOWNLOAD=false
      export LIBARROW_BINARY=false
      export LIBARROW_BUILD=true
      
      # These are all of the direct dependencies, including Suggests
      # This isn't required if the packages are already installed
      Rscript -e "install.packages(c('assertthat', 'bit64', 'purrr', 'R6', 'rlang', 'tidyselect', 'vctrs', 'cpp11', 'decor', 'distro', 'dplyr', 'hms', 'knitr', 'lubridate', 'pkgload', 'reticulate', 'rmarkdown', 'stringr', 'testthat', 'tibble', 'withr'))"
      
      
      
      # Disable your internet connection here.
      
      
      
      # Now try to install the R package we downloaded with wget.
      # This is an approximation of being behind a firewall.
      Rscript -e 'install.packages("arrow_4.0.1.tar.gz", repos=NULL)'
      
      # It successfully installs the R component, but not the C++ library, 
      # even with LIBARROW_BUILD=true
      Rscript -e "arrow::arrow_available()"
      # [1] FALSE
      
      
      # As mentioned in the installation vignette, 
      # we can R CMD INSTALL in the git repo.
      
      R CMD INSTALL r
      
      # This will try to build the C++ library, but fails when mimalloc and 
      # jemalloc can't be downloaded from GitHub.
      # (Seems not to be affected by LIBARROW_DOWNLOAD=false).
      # When C++ compilation fails, the R component still installs.
      
      


          People

            Assignee: Neal Richardson (npr)
            Reporter: Karl Dunkle Werner (karldw)
            Votes: 0
            Watchers: 5


              Time Tracking

              Original Estimate: Not Specified
              Remaining Estimate: 0h
              Time Spent: 28.5h