Apache Arrow / ARROW-12585

[Packaging][C++][Python] Published apt packages incompatible with pip binary wheels


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 4.0.0
    • Fix Version/s: None
    • Component/s: C++, Packaging, Python
    • Labels: None

    Description

      We have a shared library that links against the shared libarrow and libplasma libraries. This library is eventually loaded by a Python process that also uses pyarrow. To avoid compiling Arrow/Plasma ourselves, we install the libarrow-dev and libplasma-dev apt packages (as per the official instructions) and the pyarrow binary wheel.

      Each method brings its own copy of libarrow.so.400, and the two copies are not identical: the library bundled with pyarrow was most probably compiled with an older gcc version, while the one installed via apt is compiled with the newer C++11 ABI of libstdc++. On its own this would go unnoticed, but std::string (and possibly other affected types) appears in some Arrow API signatures, so the two copies of libarrow.so.400 end up exporting differently named (mangled) symbols for those functions.
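      To see the difference, one can inspect the mangled names exported by each copy of libarrow.so.400. With libstdc++'s dual ABI, the C++11 std::string lives in the inline namespace std::__cxx11, so symbols that take or return it contain `__cxx11` in their mangled names, whereas the old-ABI std::string is mangled as `Ss`. The following is a minimal sketch, assuming the binutils `nm` tool is on the PATH; the helper names are hypothetical and not part of Arrow.

```python
import subprocess
from typing import Dict, Iterable, List

# libstdc++ places its C++11-ABI types (std::string, std::list, ...) in the
# inline namespace std::__cxx11, which shows up verbatim in mangled names.
CXX11_ABI_MARKER = "__cxx11"


def uses_cxx11_abi(mangled: str) -> bool:
    """Return True if a mangled C++ symbol refers to a std::__cxx11 type."""
    return CXX11_ABI_MARKER in mangled


def dynamic_symbols(library_path: str) -> List[str]:
    """List the defined dynamic symbols of a shared library via `nm -D`."""
    out = subprocess.run(
        ["nm", "-D", "--defined-only", library_path],
        check=True, capture_output=True, text=True,
    ).stdout
    # Each output line looks like "<address> <type> <name>"; keep the name.
    return [line.split()[-1] for line in out.splitlines() if line.strip()]


def abi_summary(symbols: Iterable[str]) -> Dict[str, int]:
    """Count symbols carrying the C++11 ABI marker vs. those without it.

    Symbols without the marker are not necessarily old-ABI: most simply do
    not involve any std::__cxx11 type.
    """
    names = list(symbols)
    tagged = sum(1 for s in names if uses_cxx11_abi(s))
    return {"cxx11": tagged, "no_cxx11_marker": len(names) - tagged}
```

      For example, `abi_summary(dynamic_symbols(path))` could be run once against the apt copy (on Ubuntu typically under `/usr/lib/x86_64-linux-gnu/`, an assumption about the multiarch layout) and once against the copy bundled inside the installed `pyarrow` package. A copy built with the old ABI would be expected to report no `cxx11` symbols.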

      Back to our shared library: when it is loaded into a Python process in which pyarrow has already been imported, pyarrow's copy of libarrow.so.400 is already in memory, so the dynamic loader does not load the "apt" copy. Our library's references are therefore resolved against a libarrow.so.400 other than the one it was compiled against, and if it refers to any of the symbols whose names differ, it fails to load because that symbol is missing.
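      This behaviour can be observed from inside the Python process. On Linux, `/proc/self/maps` lists every file mapped into the process, so collecting the distinct paths whose file name starts with `libarrow.so` shows which copies are actually loaded. A minimal, Linux-only sketch (the helper name is hypothetical):

```python
import os
from typing import Iterable, Optional, Set


def mapped_copies(soname_prefix: str,
                  maps_lines: Optional[Iterable[str]] = None) -> Set[str]:
    """Return the paths of mapped files whose basename starts with soname_prefix.

    Reads /proc/self/maps (Linux only) unless lines are supplied, e.g. for testing.
    """
    if maps_lines is None:
        with open("/proc/self/maps") as f:
            maps_lines = f.read().splitlines()
    paths = set()
    for line in maps_lines:
        # Fields: address perms offset dev inode [pathname]; the pathname may
        # contain spaces, so split at most 5 times.
        fields = line.split(maxsplit=5)
        if len(fields) == 6:  # anonymous mappings have no pathname field
            path = fields[5].strip()
            if os.path.basename(path).startswith(soname_prefix):
                paths.add(path)
    return paths
```

      After `import pyarrow`, calling `mapped_copies("libarrow.so")` would be expected to list only the copy under the pyarrow package directory; per the description above, a second copy never appears, even when loading the shared library is attempted afterwards.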

      I've attached a fairly minimal example: a Dockerfile prepares a system with libarrow-dev from apt and a pyarrow binary wheel from PyPI, then compiles a shared library against libarrow-dev. The container's default command is a small test that starts Python and loads the example shared library, both with and without importing pyarrow first. When pyarrow is imported first, the shared library fails to load with a missing-symbol error.
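      The attached example is not reproduced here, but the test it describes can be approximated with `ctypes`. This sketch assumes the example library is called `libexample.so` (a hypothetical name) and returns the loader's error message instead of raising.

```python
import ctypes
import importlib
from typing import Optional, Tuple


def try_load(library_path: str,
             preload_module: Optional[str] = None) -> Tuple[bool, str]:
    """Try to dlopen library_path, optionally importing a Python module first.

    Returns (True, "") on success, or (False, <loader error message>) on failure.
    """
    if preload_module is not None:
        # e.g. "pyarrow": importing it maps the wheel's bundled libarrow.so.400
        importlib.import_module(preload_module)
    try:
        ctypes.CDLL(library_path)
    except OSError as exc:
        # ctypes reports loader failures (missing file, undefined symbol) as OSError.
        return False, str(exc)
    return True, ""
```

      Because an imported module and the libraries it maps cannot be unloaded, each case should run in a fresh interpreter: `try_load("./libexample.so")` in one process and `try_load("./libexample.so", preload_module="pyarrow")` in another. In the second case the failure would be expected to surface as an `undefined symbol` message from the dynamic loader.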

      I've observed this on an Ubuntu-based Linux distribution with Arrow 4.0.0, but I'd expect it to happen on other distributions and versions as well.

      Our current workaround is simple: we install a pyarrow version that differs from the Arrow version installed via apt. Since the two copies of libarrow then carry different sonames, both are loaded side by side. We are lucky that we can run in this mixed-version scenario, with multiple copies of the Arrow libraries loaded, but it will not work for everyone.

      Attachments

        1. example.tar.gz
          0.9 kB
          Rodrigo Tobar

          People

            Assignee: Unassigned
            Reporter: Rodrigo Tobar (rtobar)
            Votes: 0
            Watchers: 4
