Apache Arrow / ARROW-2247

[Python] Statically-linking boost_regex in both libarrow and libparquet results in segfault

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: Python
    • Labels:
      None

      Description

      This is a backtrace from loading libparquet.so on Ubuntu 14.04 using boost 1.66.0 from conda-forge. Both libarrow and libparquet have boost_regex statically linked.

      In [1]: import ctypes
      
      In [2]: ctypes.CDLL('libparquet.so')
      
      Program received signal SIGSEGV, Segmentation fault.
      0x00007fffed4ad3fb in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
      (gdb) bt
      #0  0x00007fffed4ad3fb in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
      #1  0x00007fffed74c1fc in boost::re_detail_106600::cpp_regex_traits_char_layer<char>::init() ()
         from /home/wesm/cpp-toolchain/lib/libboost_regex.so.1.66.0
      #2  0x00007fffed794803 in boost::object_cache<boost::re_detail_106600::cpp_regex_traits_base<char>, boost::re_detail_106600::cpp_regex_traits_implementation<char> >::do_get(boost::re_detail_106600::cpp_regex_traits_base<char> const&, unsigned long) () from /home/wesm/cpp-toolchain/lib/libboost_regex.so.1.66.0
      #3  0x00007fffed79e62b in boost::basic_regex<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > >::do_assign(char const*, char const*, unsigned int) () from /home/wesm/cpp-toolchain/lib/libboost_regex.so.1.66.0
      #4  0x00007fffee58561b in boost::basic_regex<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > >::assign (this=0x7fffffff3780, 
          p1=0x7fffee600602 "(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)", 
          p2=0x7fffee60064a "", f=0) at /home/wesm/cpp-toolchain/include/boost/regex/v4/basic_regex.hpp:381
      #5  0x00007fffee5855a7 in boost::basic_regex<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > >::assign (this=0x7fffffff3780, 
          p=0x7fffee600602 "(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)", f=0)
          at /home/wesm/cpp-toolchain/include/boost/regex/v4/basic_regex.hpp:366
      #6  0x00007fffee5683f3 in boost::basic_regex<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > >::basic_regex (this=0x7fffffff3780, 
          p=0x7fffee600602 "(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)", f=0)
          at /home/wesm/cpp-toolchain/include/boost/regex/v4/basic_regex.hpp:335
      #7  0x00007fffee5656d0 in parquet::ApplicationVersion::ApplicationVersion (
      Python Exception <class 'gdb.error'> There is no member named _M_dataplus.: 
          this=0x7fffee8f1fb8 <parquet::ApplicationVersion::PARQUET_251_FIXED_VERSION>, created_by=)
          at ../src/parquet/metadata.cc:452
      #8  0x00007fffee41c271 in __cxx_global_var_init.1(void) () at ../src/parquet/metadata.cc:35
      #9  0x00007fffee41c44e in _GLOBAL__sub_I_metadata.tmp.wesm_desktop.4838.ii ()
         from /home/wesm/local/lib/libparquet.so
      #10 0x00007ffff7dea1da in call_init (l=<optimized out>, argc=argc@entry=2, argv=argv@entry=0x7fffffff5d88, 
          env=env@entry=0x7fffffff5da0) at dl-init.c:78
      #11 0x00007ffff7dea2c3 in call_init (env=<optimized out>, argv=<optimized out>, argc=<optimized out>, 
          l=<optimized out>) at dl-init.c:36
      #12 _dl_init (main_map=main_map@entry=0x13fb220, argc=2, argv=0x7fffffff5d88, env=0x7fffffff5da0)
          at dl-init.c:126
      

      This seems to be caused by static initializations in libparquet:

      https://github.com/apache/parquet-cpp/blob/master/src/parquet/metadata.cc#L34

      We should check whether removing these static initializations makes the problem go away. If not, then statically linking boost_regex into both libraries is not advisable.
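One way to remove a static initialization of this kind is to defer construction until first use. The following is only a minimal sketch under stated assumptions, not the actual parquet-cpp change: `AppVersion`, `FixedVersion`, the simplified pattern, and the string "example-app version 1.2.3" are all hypothetical, and `std::regex` stands in for `boost::regex` so the example is self-contained.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for a version-parsing type; this is NOT the real
// parquet::ApplicationVersion.
struct AppVersion {
  std::string application;
  std::string version;

  explicit AppVersion(const std::string& created_by) {
    // Simplified illustrative pattern; built on first use, not at load time.
    static const std::regex pattern(R"((.*?)\s+version\s+(\S+).*)");
    std::smatch m;
    if (std::regex_match(created_by, m, pattern)) {
      application = m[1].str();
      version = m[2].str();
    }
  }
};

// Problematic shape: a namespace-scope object whose constructor builds a
// regex while the dynamic loader is still running library initializers.
//   static const AppVersion kFixedVersion("example-app version 1.2.3");

// Deferred shape: a function-local static is constructed on the first call,
// which happens after the library has finished loading.
const AppVersion& FixedVersion() {
  static const AppVersion instance("example-app version 1.2.3");
  return instance;
}
```

Callers would use `FixedVersion()` wherever they previously referred to the global object. Whether this alone avoids the crash with two statically linked copies of boost_regex is exactly what the check proposed above would need to confirm.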

      For this reason and more, I really wish that Arrow and Parquet shared a common build system and monorepo structure – it would make handling these toolchain and build-related issues much simpler.

              People

              • Assignee: Antoine Pitrou (pitrou)
              • Reporter: Wes McKinney (wesmckinn)