  Apache Arrow / ARROW-4316

Reusing arrow.so for both Python and R


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.12.0
    • Fix Version/s: None
    • Component/s: Python, R
    • Labels: None
    • Environment: Ubuntu 16.04, R 3.4.4, pyarrow 0.12, cmake 3.12

    Description

      My team uses both pyarrow and the R arrow package, and we'd like both to link to the same libarrow.so for consistency. pyarrow ships both libarrow.so and libparquet.so; if I can reuse those shared libraries to link the R package, that would guarantee consistency.
      Under arrow v0.11.1 I was able to link R against the libarrow.so shipped inside the pyarrow package by passing LIB_DIR to the R package's configure script. In v0.12.0 this no longer works. On Ubuntu 16.04, the reproducible example below fails with the following error:

       

      sh: line 1: 5404 Segmentation fault (core dumped) '/usr/lib/R/bin/R' --no-save --slave 2>&1 < '/tmp/RtmpyOuz4g/file14716feda8fc'
      *** caught segfault ***
      address 0x7f160f026250, cause 'invalid permissions'
      An irrecoverable exception occurred. R is aborting now ...
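
      Since the goal is for R and Python to load the same libarrow, one quick check is which libarrow and libparquet the R package's compiled shared object actually resolves. A minimal sketch, assuming GNU ldd output format; arrow_deps is a hypothetical helper, not part of Arrow or R, and the src/arrow.so path in the usage comment assumes the package was installed from the source directory without --clean, so the built object is left in src/.

```shell
# arrow_deps: hypothetical helper. Reads `ldd`-style output on stdin and prints
# "<library> <resolved path>" for every libarrow*/libparquet* dependency.
# Libraries the loader cannot find are reported as "<library> NOT_FOUND".
arrow_deps() {
  awk '$1 ~ /^lib(arrow|parquet)/ {
         if ($3 == "not") print $1, "NOT_FOUND"   # "libX.so => not found"
         else print $1, $3                        # "libX.so => /path (0x...)"
       }'
}

# Example usage, on the object built by the reproducible example below:
#   ldd /tmp/arrow/r/src/arrow.so | arrow_deps
```

      If the printed paths point into the pyarrow directory, R is picking up pyarrow's copies; a path elsewhere (or NOT_FOUND) means a different or missing library.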
      

       

      Reproducible example:

      # Get the parquet headers, which are not shipped with pyarrow
      # (run as root: this writes to /etc/apt/sources.list.d)
      tee /etc/apt/sources.list.d/apache-arrow.list <<APT_LINE
      deb [arch=amd64] https://dl.bintray.com/apache/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/ $(lsb_release --codename --short) main
      deb-src [] https://dl.bintray.com/apache/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/ $(lsb_release --codename --short) main
      APT_LINE
      apt-get update
      mkdir -p /tmp/arrow_headers && cd /tmp/arrow_headers
      # Pin the version so the .deb filename below matches what is downloaded
      apt-get download --allow-unauthenticated libparquet-dev=0.12.0-1
      ar -x libparquet-dev_0.12.0-1_amd64.deb
      tar -xJvf data.tar.xz

      # Get pyarrow 0.12.0 (pinned to match the 0.12.0 headers above)
      pip3 install --upgrade pyarrow==0.12.0

      # Figure out where pyarrow is installed
      PY_ARROW_PATH=$(python3 -c "import pyarrow, os; print(os.path.dirname(pyarrow.__file__))")
      PY_ARROW_VERSION=$(python3 -c "import pyarrow; print(pyarrow.__version__)")
      PYTHON_LIBDIR=$(python3 -c "import sysconfig; print(sysconfig.get_config_var('LIBDIR'))")

      # pyarrow doesn't ship parquet headers; copy the ones from the .deb into the pyarrow dir
      mkdir -p "$PY_ARROW_PATH/include/parquet"
      cp -r /tmp/arrow_headers/usr/include/parquet/* "$PY_ARROW_PATH/include/parquet/"

      # Install R arrow, linking against pyarrow's shared libraries.
      # First make R's runtime loader search the Python and pyarrow library dirs.
      echo "export LD_LIBRARY_PATH=\"\${LD_LIBRARY_PATH}:${PYTHON_LIBDIR}:${PY_ARROW_PATH}\"" | tee -a /usr/lib/R/etc/ldpaths
      git clone https://github.com/apache/arrow.git /tmp/arrow
      cd /tmp/arrow/r
      git checkout "apache-arrow-${PY_ARROW_VERSION}"
      # Relax the R version requirement so R 3.4 is accepted
      sed -i "/Depends: R/c\Depends: R (>= 3.4)" DESCRIPTION
      # Build with the pre-C++11 libstdc++ ABI to match the manylinux1 pyarrow wheels
      sed -i "s/PKG_CXXFLAGS=/PKG_CXXFLAGS=-D_GLIBCXX_USE_CXX11_ABI=0 /g" src/Makevars.in
      R CMD INSTALL ./ --configure-vars="INCLUDE_DIR=$PY_ARROW_PATH/include LIB_DIR=$PY_ARROW_PATH"
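
      The two sed edits in the script can be checked in isolation before touching the real checkout. A minimal sketch that runs the same expressions on scratch copies; the sample Depends and PKG_CXXFLAGS lines are made-up stand-ins, not the actual contents of the 0.12.0 DESCRIPTION or src/Makevars.in, and the one-line c\ form assumes GNU sed.

```shell
# Run the script's two sed edits against throwaway files to see their effect.
# The input lines below are illustrative stand-ins, not the real file contents.
tmp="$(mktemp -d)"
printf 'Package: arrow\nDepends: R (>= 3.5)\n' > "$tmp/DESCRIPTION"
printf 'PKG_CPPFLAGS=@cflags@\nPKG_CXXFLAGS=@cflags@\n' > "$tmp/Makevars.in"

# Replace the whole "Depends: R ..." line (GNU sed one-line form of `c\`).
sed -i "/Depends: R/c\Depends: R (>= 3.4)" "$tmp/DESCRIPTION"
# Prepend the old-ABI define to the C++ flags; PKG_CPPFLAGS is left alone.
sed -i "s/PKG_CXXFLAGS=/PKG_CXXFLAGS=-D_GLIBCXX_USE_CXX11_ABI=0 /g" "$tmp/Makevars.in"

cat "$tmp/DESCRIPTION" "$tmp/Makevars.in"
```

      Note that the substitution only touches lines that literally contain PKG_CXXFLAGS=; if the real Makevars.in sets C++ flags under a different variable, the define is silently not added.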

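      The `tee -a` step appends to R's ldpaths file every time the script runs, so repeated attempts keep adding the same LD_LIBRARY_PATH line. A minimal sketch of an idempotent alternative, assuming bash; append_line_once is a hypothetical helper name, not an R or Arrow tool.

```shell
# append_line_once: hypothetical helper. Appends LINE to FILE only if FILE
# does not already contain exactly that line (fixed-string, whole-line match),
# so re-running the script does not keep growing R's ldpaths file.
append_line_once() {
  local line="$1" file="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage in place of the echo | tee -a step (same variables, run as root):
#   append_line_once "export LD_LIBRARY_PATH=\"\${LD_LIBRARY_PATH}:${PYTHON_LIBDIR}:${PY_ARROW_PATH}\"" /usr/lib/R/etc/ldpaths
```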

            People

              Assignee: Unassigned
              Reporter: Jeffrey Wong (jeffreyw)
              Votes: 0
              Watchers: 6
