ARROW-15072

[R] Error: This build of the arrow package does not support Datasets


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 6.0.1
    • Fix Version/s: 6.0.1
    • Component/s: Parquet, R
    • Labels: None
    • Environment: x86_64-pc-linux-gnu (64-bit) via Docker (rocker/r-base:4.1.2)

    Description

      Hello,

      I would like to report a possible issue (or perhaps I have misunderstood the documentation, in which case I apologize in advance).

      I'm trying to use R with arrow in Docker to read Parquet files from S3:

       

      FROM rocker/r-base:4.1.2
      
      # TO READ FROM S3
      RUN apt update -qq \
          && apt install -t unstable -y --no-install-recommends \
             libcurl4-openssl-dev

      ENV LIBARROW_MINIMAL false

      RUN apt update && \
          apt install -y -V ca-certificates lsb-release wget && \
          wget "https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb" && \
          apt install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb

      RUN apt update && \
           apt install -y -V -f \
           libarrow-dev \
           libarrow-dataset-dev \
           libarrow-glib-dev \
           libarrow-flight-dev \
           libparquet-dev \
           libparquet-glib-dev

      RUN install2.r --error \
            arrow

      This is the output of sessionInfo() from the container running R:

       

      > sessionInfo()
      R version 4.1.2 (2021-11-01)
      Platform: x86_64-pc-linux-gnu (64-bit)
      Running under: Debian GNU/Linux 11 (bullseye)

      Matrix products: default
      BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
      LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.18.so

      locale:
       [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C
       [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8
       [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8
       [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8
       [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8
      [11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

      attached base packages:
      [1] stats     graphics  grDevices utils     datasets  methods   base

      other attached packages:
      [1] arrow_6.0.1 DBI_1.1.1

      loaded via a namespace (and not attached):
       [1] tidyselect_1.1.1   bit_4.0.4          compiler_4.1.2     magrittr_2.0.1
       [5] assertthat_0.2.1   R6_2.5.1           tools_4.1.2        glue_1.5.1
       [9] bit64_4.0.5        vctrs_0.3.8        RJDBC_0.2-8        rlang_0.4.12
      [13] rJava_1.0-5        AWR.Athena_2.0.7-0 purrr_0.3.4

      As far as I understand, all requirements for using Datasets are fulfilled:

      R version 4.1.2
      Platform: x86_64-pc-linux-gnu (64-bit)
      arrow_6.0.1

       

      > .Machine$sizeof.pointer < 8
      [1] FALSE
      > getRversion() < "4.0.0"
      [1] FALSE
      > tolower(Sys.info()[["sysname"]]) == "windows"
      [1] FALSE
      >  

      Nevertheless, I get

      Error: This build of the arrow package does not support Datasets

      when I call

      arrow::open_dataset(sources = path)
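As far as I can tell from the arrow R sources, open_dataset() stops with exactly this message when arrow::arrow_with_dataset() returns FALSE, i.e. when the package was compiled without the Dataset component. The platform checks above do not test that, so a more direct check is to ask the installed package what it was built with. This is a minimal sketch assuming arrow 6.0.1, which exports arrow_with_dataset(), arrow_with_parquet() and arrow_with_s3(); the wrapper check_arrow_features() is a hypothetical helper, not part of arrow.

```r
# Hypothetical helper (not part of arrow): report which optional components
# the installed arrow R package was built with. Each arrow_with_*() function
# returns TRUE or FALSE.
check_arrow_features <- function() {
  if (!requireNamespace("arrow", quietly = TRUE)) {
    stop("The arrow package is not installed", call. = FALSE)
  }
  c(
    dataset = arrow::arrow_with_dataset(),
    parquet = arrow::arrow_with_parquet(),
    s3      = arrow::arrow_with_s3()
  )
}

# Example: inspect the build before calling open_dataset().
if (requireNamespace("arrow", quietly = TRUE)) {
  features <- check_arrow_features()
  print(features)
  if (!features[["dataset"]]) {
    message("Dataset support is missing; see arrow::arrow_info() for details.")
  }
}
```

arrow::arrow_info() prints a fuller report that includes the same capability list. As I understand it, these flags reflect how the R package itself was compiled, so a FALSE here points at the package build rather than at the system libraries installed in the image.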

      Appreciate any help!

          People

            Assignee: Unassigned
            Reporter: hu geme (hugeme)
            Votes: 0
            Watchers: 4
