I tried following the instructions for installing pyarrow for developers on macOS, and I ran into quite a bit of difficulty. I'm hoping we can improve our documentation and/or tooling to make this a smoother process.
I know we can't anticipate every quirk of everyone's dev environment, but in my case I was setting up a new machine, so this was from a clean slate. I'm also new to contributing to the project, so I'm a "clean slate" in that regard too, and my ignorance may be exposing other assumptions in the docs.
- The instructions recommend using conda, but as this Stack Overflow question notes, cmake fails. Uwe helpfully suggested installing an older macOS SDK from here. That may work, but I'm personally wary of installing binaries from an unofficial GitHub account, let alone recording that in our docs as an official recommendation. Either way, we should update the docs, either to note this necessity or to recommend against installing with conda on macOS.
- After that, I tried the Homebrew path. Ultimately this did succeed, but it was rough. I had to `brew install` a lot of packages that weren't included in arrow/python/Brewfile, by trial and error: run `cmake`, see which missing dependency it failed on, `brew install` it, rerun `cmake`, and repeat. Among the libs I installed this way were double-conversion, snappy, brotli, protobuf, gtest, rapidjson, flatbuffers, lz4, zstd, c-ares, and boost. It's not clear how many of these extra installs were needed only because I had the Xcode command-line tools rather than the full Xcode from the App Store; regardless, the Brewfile should be complete if we want to use it.
- In searching Jira for the double-conversion issue (the first one I hit), I found this issue/PR, which added double-conversion to a different Brewfile, in c_glib. So I tried installing that Brewfile with `brew bundle`. It would probably be good to have a common Brewfile for the C++ setup, which the python and glib ones could load before adding any extra dependencies of their own. That way, there's one place to add common dependencies.
- I got close here but still had issues with `BOOST_HOME` not being found, even though I had brew-installed it. From the console output, it appeared that even though I was not using conda and did not have an active conda environment (I'd even done `conda env remove --name pyarrow-dev`), the cmake configuration script detected that conda existed and decided to use conda to resolve dependencies. I tried setting lots of different environment variables to tell cmake not to use conda, but ultimately I was only able to get past this by deleting conda from my system entirely.
- This let me get to the point of being able to `import pyarrow`. But then running tests failed because the `hypothesis` package was not installed. I see that it is included in requirements-test.txt and setup.py under tests_require, but I followed the installation instructions and this package did not end up in my virtualenv. `pip install hypothesis` resolved it.
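
To illustrate the Homebrew point above, here is a rough sketch of checking for the missing formulas up front instead of discovering them one `cmake` failure at a time. The function name `missing_deps` is made up for this example, and the formula names are copied as-is from the list in this report, so they may not match current Homebrew formula names and are probably not a complete set.

```shell
#!/usr/bin/env bash
# Formulas named in this report as missing from arrow/python/Brewfile.
# Assumption: names are taken verbatim from the report and may differ
# from what Homebrew currently calls them.
deps="double-conversion snappy brotli protobuf gtest rapidjson flatbuffers lz4 zstd c-ares boost"

# missing_deps (hypothetical helper): reads a newline-separated list of
# installed formulas on stdin and prints every entry of $deps that is
# not in that list, one per line.
missing_deps() {
  installed_formulas="$(cat)"
  for dep in $deps; do
    # -x: match the whole line, -F: treat the name as a fixed string
    printf '%s\n' "$installed_formulas" | grep -qxF -- "$dep" || printf '%s\n' "$dep"
  done
}

# Usage, assuming Homebrew is on PATH:
#   brew list --formula -1 | missing_deps
```

Anything this prints would still have to be installed with `brew install`. A complete Brewfile would make a check like this unnecessary.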