  Apache Arrow / ARROW-955

[Docs] Guide for building Python from source on Ubuntu 16.04/18.04 LTS without conda


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Python
    • Labels: None
    • Environment: Ubuntu - 3.19.0-80-generic #88~14.04.1-Ubuntu
      Python 2.7.6

    Description

      I built pyarrow, Arrow, and parquet-cpp from source so that I could use the new read_row_group() interface and, in general, have access to the latest versions. I ran into many issues during the build but was ultimately successful (notes below). However, I cannot import pyarrow.parquet because of the following error:

      >>> import pyarrow.parquet
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "pyarrow/__init__.py", line 28, in <module>
          import pyarrow._config
      ImportError: No module named _config
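One possible cause of this error (an assumption; nothing in this issue confirms it) is that Python imports the unbuilt pyarrow package from the source checkout instead of the installed one, which happens when the interpreter is started inside arrow/python. The relative path pyarrow/__init__.py in the traceback is consistent with that. The sketch below defines a hypothetical helper, check_pyarrow_location, that approximates Python's lookup by scanning sys.path in order (it ignores zipped eggs and other import hooks) and reports whether a compiled _config extension sits next to the package it finds.

```shell
# Hypothetical helper (not part of Arrow): find the first sys.path entry that
# contains pyarrow/__init__.py, then check for a compiled _config extension there.
check_pyarrow_location() {
    py="${1:-python}"
    # One sys.path entry per line; the empty entry (current directory) becomes ".".
    paths=$("$py" -c 'import sys; print("\n".join(p or "." for p in sys.path))') || return 1
    old_ifs=$IFS
    # Split $paths on newlines only.
    IFS='
'
    for dir in $paths; do
        if [ -f "$dir/pyarrow/__init__.py" ]; then
            IFS=$old_ifs
            echo "pyarrow would be imported from: $dir/pyarrow"
            # Built extensions are named _config.so (Python 2) or _config.<tag>.so (Python 3).
            if ls "$dir"/pyarrow/_config*.so >/dev/null 2>&1; then
                echo "compiled _config extension: found"
            else
                echo "compiled _config extension: MISSING"
            fi
            return 0
        fi
    done
    IFS=$old_ifs
    echo "no pyarrow package found on sys.path of $py"
    return 1
}
```

Usage: check_pyarrow_location python, run from the same directory in which the failing import was attempted. A MISSING result next to a source-tree path would point at the shadowing explanation; a found result would rule it out.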

      This is similar to an issue reported in the conda-forge/pyarrow-feedstock GitHub repository, where I also posted this, but I think this tracker is more direct and appropriate, so I am re-posting here.

      I followed the instructions at https://arrow.apache.org/docs/python/install.html to build arrow/cpp, parquet-cpp, and then pyarrow, with the following deviations (which I suspect may be bugs in the instructions):

      arrow/cpp build:
      export ARROW_HOME=$HOME/local
      I had to pass -DARROW_PYTHON=on and -DPARQUET_ARROW=ON to the cmake command (in addition to -DCMAKE_INSTALL_PREFIX=$ARROW_HOME).
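The arrow/cpp step above, written out as a shell sketch. The function name build_arrow_cpp, the checkout path passed to it, and the out-of-source cpp/build directory are assumptions for illustration; the install prefix and cmake flags are the ones listed in this report.

```shell
# Install prefix used for Arrow in the report.
export ARROW_HOME="$HOME/local"

# Hypothetical helper; $1 is the path to an Arrow source checkout (assumption).
# The body runs in a subshell, so the cd does not leak into the caller.
build_arrow_cpp() (
    src="${1:?usage: build_arrow_cpp /path/to/arrow}"
    # Out-of-source build directory (assumption; not stated in the report).
    mkdir -p "$src/cpp/build" && cd "$src/cpp/build" || exit 1
    # The install prefix plus the two extra flags the reporter needed.
    cmake -DCMAKE_INSTALL_PREFIX="$ARROW_HOME" \
          -DARROW_PYTHON=on \
          -DPARQUET_ARROW=ON \
          .. || exit 1
    # $ARROW_HOME is user-writable, so no sudo is needed for this install.
    make && make install
)
```

For example: build_arrow_cpp ~/arrow, where ~/arrow is a hypothetical checkout location.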

      parquet-cpp build:

      export ARROW_HOME=$HOME/local

      cmake -DARROW_HOME=$HOME/local -DPARQUET_ARROW_LINKAGE=static -DPARQUET_ARROW=ON .
      make

      sudo make install    (this installs the parquet libraries in the standard system location, /usr/local/lib, so that the pyarrow build below can find them)
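The parquet-cpp step as a shell sketch, under the same kind of assumptions (hypothetical function name build_parquet_cpp and checkout path); the cmake flags and the in-source build follow the commands above.

```shell
export ARROW_HOME="$HOME/local"

# Hypothetical helper; $1 is the path to a parquet-cpp checkout (assumption).
build_parquet_cpp() (
    src="${1:?usage: build_parquet_cpp /path/to/parquet-cpp}"
    # In-source build, as in the report ("cmake ... .").
    cd "$src" || exit 1
    cmake -DARROW_HOME="$ARROW_HOME" \
          -DPARQUET_ARROW_LINKAGE=static \
          -DPARQUET_ARROW=ON \
          . || exit 1
    make || exit 1
    # No install prefix is given, so CMake's default (/usr/local) applies and root is needed.
    sudo make install
)
```

Installing under /usr/local mirrors the report. Passing the standard CMake variable -DCMAKE_INSTALL_PREFIX="$ARROW_HOME" instead would keep everything in one user-writable prefix, although whether the pyarrow build then finds the Parquet libraries without further settings is not confirmed here.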

      pyarrow build:

      export ARROW_HOME=$HOME/local (not a deviation; just repeating here)

      export LD_LIBRARY_PATH=$HOME/local/lib:$HOME/parquet4/parquet-cpp/build/latest

      sudo python setup.py build_ext --with-parquet --with-jemalloc --build-type=release install

      sudo python setup.py install

      (sudo is needed to install into /usr/local/lib/python2.7/dist-packages)
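A sketch of the pyarrow step that deviates from the report in one respect, labeled here as an assumption: it builds and installs without sudo. Ubuntu's default sudoers configuration resets the environment (Defaults env_reset), so a setup.py launched through sudo may not see ARROW_HOME or LD_LIBRARY_PATH. The install --user option, a standard distutils option, puts the package under ~/.local instead of /usr/local/lib/python2.7/dist-packages. The function name build_pyarrow and the checkout path are hypothetical; the setup.py options are the ones used above.

```shell
export ARROW_HOME="$HOME/local"
# Library search path from the report; any existing value is appended rather than overwritten.
export LD_LIBRARY_PATH="$HOME/local/lib:$HOME/parquet4/parquet-cpp/build/latest${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Hypothetical helper; $1 is the path to the Arrow checkout containing python/ (assumption).
build_pyarrow() (
    src="${1:?usage: build_pyarrow /path/to/arrow}"
    cd "$src/python" || exit 1
    # Runs as the regular user so setup.py inherits ARROW_HOME and LD_LIBRARY_PATH.
    # build_ext and install share one invocation, as in the report, so the
    # build_ext options apply to the build that gets installed.
    python setup.py build_ext --with-parquet --with-jemalloc --build-type=release \
        install --user
)
```

After installing, running python -c 'import pyarrow.parquet' from a directory outside the Arrow checkout (for example after cd ~) avoids picking up the unbuilt pyarrow package that lives in arrow/python.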

      These are the steps, and the modifications to the instructions, that I needed to build the pyarrow.parquet package. However, when I now try to import the package, I get the error shown above.

      Maybe I did something wrong in these steps, which I pieced together by searching for each issue, but I really can't tell what. It took me almost a whole day to get to the point where I could build pyarrow and parquet, and now I can't use what I built.

      Any comments or help appreciated! Thanks in advance.

          People

            Assignee: Unassigned
            Reporter: Devang Shah (derringdo)
            Votes: 0
            Watchers: 4
