  1. Kudu
  2. KUDU-1397

Allow building safely with custom toolchains

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: build
    • Labels: None

      Description

      Casey uncovered several issues when building Kudu with the Impala toolchain; this report attempts to capture them.

      The first and most important issue was a random SIGSEGV during a flush:

      (gdb) bt
      #0  0x0000000000e82540 in kudu::CopyCellData<kudu::ColumnBlockCell, kudu::ColumnBlockCell, kudu::Arena> (src=..., dst=0x7ff9c637d5e0, dst_arena=0x0)
          at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/common/row.h:79
      #1  0x0000000000e80e33 in kudu::CopyCell<kudu::ColumnBlockCell, kudu::ColumnBlockCell, kudu::Arena> (src=..., dst=0x7ff9c637d5e0, dst_arena=0x0)
          at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/common/row.h:103
      #2  0x0000000000e7f647 in kudu::CopyRow<kudu::RowBlockRow, kudu::RowBlockRow, kudu::Arena> (src_row=..., dst_row=0x7ff9c637d870, dst_arena=0x0)
          at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/common/row.h:119
      #3  0x0000000000e76773 in kudu::tablet::FlushCompactionInput (input=0x3894f00, snap=..., out=0x7ff9c637dbf0)
          at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/compaction.cc:768
      #4  0x0000000000e23f5a in kudu::tablet::Tablet::DoCompactionOrFlush (this=0x395a840, input=..., mrs_being_flushed=0)
          at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet.cc:1221
      #5  0x0000000000e202b2 in kudu::tablet::Tablet::FlushInternal (this=0x395a840, input=..., old_ms=...) at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet.cc:744
      #6  0x0000000000e1f8f6 in kudu::tablet::Tablet::FlushUnlocked (this=0x395a840) at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet.cc:678
      #7  0x0000000000f1b3a3 in kudu::tablet::FlushMRSOp::Perform (this=0x38b9340) at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/tablet_peer_mm_ops.cc:127
      #8  0x0000000000ea19d7 in kudu::MaintenanceManager::LaunchOp (this=0x3904360, op=0x38b9340) at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/tablet/maintenance_manager.cc:360
      #9  0x0000000000ea6502 in boost::_mfi::mf1<void, kudu::MaintenanceManager, kudu::MaintenanceOp*>::operator() (this=0x3d492a0, p=0x3904360, a1=0x38b9340)
          at /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165
      #10 0x0000000000ea6163 in boost::_bi::list2<boost::_bi::value<kudu::MaintenanceManager*>, boost::_bi::value<kudu::MaintenanceOp*> >::operator()<boost::_mfi::mf1<void, kudu::MaintenanceManager, kudu::MaintenanceOp*>, boost::_bi::list0> (this=0x3d492b0, f=..., a=...) at /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind.hpp:313
      #11 0x0000000000ea5bed in boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::MaintenanceManager, kudu::MaintenanceOp*>, boost::_bi::list2<boost::_bi::value<kudu::MaintenanceManager*>, boost::_bi::value<kudu::MaintenanceOp*> > >::operator() (this=0x3d492a0) at /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind_template.hpp:20
      #12 0x0000000000ea57ec in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::MaintenanceManager, kudu::MaintenanceOp*>, boost::_bi::list2<boost::_bi::value<kudu::MaintenanceManager*>, boost::_bi::value<kudu::MaintenanceOp*> > >, void>::invoke (function_obj_ptr=...) at /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:153
      #13 0x0000000001c4205e in boost::function0<void>::operator() (this=0x3c01838) at /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:767
      #14 0x0000000001d73aa4 in kudu::FunctionRunnable::Run (this=0x3c01830) at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/util/threadpool.cc:47
      #15 0x0000000001d73062 in kudu::ThreadPool::DispatchThread (this=0x38c8340, permanent=true) at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/util/threadpool.cc:321
      #16 0x0000000001d76740 in boost::_mfi::mf1<void, kudu::ThreadPool, bool>::operator() (this=0x38f2d60, p=0x38c8340, a1=true)
          at /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165
      #17 0x0000000001d76375 in boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> >::operator()<boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list0> (this=0x38f2d70, f=...,
          a=...) at /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind.hpp:313
      #18 0x0000000001d75eb7 in boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >::operator() (this=0x38f2d60)
          at /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/bind/bind_template.hpp:20
      #19 0x0000000001d759e9 in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf1<void, kudu::ThreadPool, bool>, boost::_bi::list2<boost::_bi::value<kudu::ThreadPool*>, boost::_bi::value<bool> > >, void>::invoke (function_obj_ptr=...) at /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:153
      #20 0x0000000001c4205e in boost::function0<void>::operator() (this=0x3918028) at /home/casey/Code/native-toolchain/build/boost-1.57.0/include/boost/function/function_template.hpp:767
      #21 0x0000000001d6ba4d in kudu::Thread::SuperviseThread (arg=0x3918000) at /home/casey/Code/native-toolchain/source/kudu/incubator-kudu-0.8.0-RC1/src/kudu/util/thread.cc:580
      #22 0x00007ff9c7bfadc5 in start_thread () from /lib64/libpthread.so.0
      #23 0x00007ff9c6aca21d in clone () from /lib64/libc.so.6
      

      Todd traced this to a build issue with codegen. Specifically, when our thirdparty clang converts precompiled.cc into LLVM IR, we expect it to use the same libstdc++ as the rest of the Kudu build. It turns out there's no such guarantee. Depending on the version discrepancy, a variety of issues can result, including at least one alignment change that could cause the kind of corruption that Casey is seeing.

      Let's walk through the various scenarios at play:

      1. When building Kudu on a platform whose system libstdc++ supports C++11, libstdc++ is expected to be found in /usr regardless of the chosen compiler, be it the system's gcc, clang, or thirdparty's clang.
      2. On el6, we call scl enable devtoolset-3 before building Kudu. This puts a special build of gcc 4.9.2 on the PATH whose libstdc++ comes from /opt/rh/devtoolset-3/usr rather than from the system itself. To avoid discrepancies, we patch thirdparty clang to use that same path when searching for headers and libraries, so we end up with the same libstdc++ for Kudu as for emitted LLVM IR.
      3. On OSX, C++ support comes by way of libc++, which lives deep within Xcode. This location is built into the system clang, which is also the compiler used to build Kudu. We don't patch thirdparty clang as on el6, so it can't find libc++ by default. However, Kudu adds -cxx-isystem <this Xcode path> during the codegen build. In this way, the libc++ used in emitting LLVM IR is the same as what's used in the rest of Kudu.
      4. Building with the Impala toolchain is similar to the el6 case except without the patch to thirdparty's clang. Nor can it be patched in the same way; the toolchain location varies from system to system. Without the patch, thirdparty's clang ends up using the system's libstdc++, which isn't guaranteed to be the same as the version in the toolchain, and can lead to the issues described above. This needs to be addressed.
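
      One way to spot the mismatch described in scenario 4 is to compare the C++ header search paths that the two compilers report. The following is a minimal sketch, not part of the Kudu build. It assumes both compilers are gcc- or clang-style drivers that print their search list on stderr when run with -E -x c++ - -v. The function names are hypothetical, and the "/c++/" substring test is only a heuristic for picking out standard library header directories.

```python
import os
import subprocess


def parse_include_dirs(verbose_output):
    """Extract the '#include <...>' search directories from `-E -v` output."""
    dirs = []
    in_list = False
    for line in verbose_output.splitlines():
        if line.startswith("#include <...> search starts here:"):
            in_list = True
        elif line.startswith("End of search list."):
            break
        elif in_list and line.startswith(" "):
            # Some compilers append " (framework directory)"; drop that suffix.
            path = line.strip().split(" (")[0]
            # Collapse "lib/../include" style segments so paths compare cleanly.
            dirs.append(os.path.normpath(path))
    return dirs


def cxx_include_dirs(compiler):
    """Ask a gcc- or clang-style driver for its C++ header search path."""
    result = subprocess.run(
        [compiler, "-E", "-x", "c++", "-", "-v"],
        input="", capture_output=True, text=True, check=True)
    # Both gcc and clang print the search list on stderr.
    return parse_include_dirs(result.stderr)


def stdlib_dirs(dirs):
    """Heuristic: keep directories that look like C++ standard library headers."""
    return [d for d in dirs if "/c++/" in d]


def stdlib_mismatch(compiler_a, compiler_b):
    """Return None if both compilers agree, else the two differing lists."""
    a = stdlib_dirs(cxx_include_dirs(compiler_a))
    b = stdlib_dirs(cxx_include_dirs(compiler_b))
    return None if a == b else (a, b)
```

      Calling stdlib_mismatch() with the path to the toolchain's g++ and the path to thirdparty's clang++ would return the two differing directory lists if the compilers pick up different standard library headers. Matching header paths do not prove that the same runtime library is linked, so this is a first check rather than a guarantee.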

      Separately, Casey ran into a build-time issue when building Kudu with the Impala toolchain on a platform that doesn't provide Python 2.7 (I think it was an el6 VM). On these platforms, Kudu builds its own Python 2.7 before building LLVM, as the latter depends on the former to build. The Python build failed with the following:

      17:22:35 /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/build/gcc-4.9.2/bin/gcc -pthread -mno-avx2 -Wl,-rpath,/data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/build/gcc-4.9.2/lib64,-rpath,'RIGIN/../lib64',-rpath,'RIGIN/../lib' -L/data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/build/gcc-4.9.2/lib64 -Xlinker -export-dynamic -o python \
      17:22:35 			Modules/python.o \
      17:22:35 			libpython2.7.a -lpthread -ldl  -lutil   -lm  
      17:22:35 libpython2.7.a(posixmodule.o): In function `posix_tmpnam':
      17:22:35 /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/source/kudu/incubator-kudu-0.8.0-RC1/thirdparty/python-2.7.10/./Modules/posixmodule.c:7631: warning: the use of `tmpnam_r' is dangerous, better use `mkstemp'
      17:22:35 libpython2.7.a(posixmodule.o): In function `posix_tempnam':
      17:22:35 /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-centos-6/toolchain/source/kudu/incubator-kudu-0.8.0-RC1/thirdparty/python-2.7.10/./Modules/posixmodule.c:7578: warning: the use of `tempnam' is dangerous, better use `mkstemp'
      17:22:35 ./python -E -S -m sysconfig --generate-posix-vars ;\
      17:22:35 	if test $? -ne 0 ; then \
      17:22:35 		echo "generate-posix-vars failed" ; \
      17:22:35 		rm -f ./pybuilddir.txt ; \
      17:22:35 		exit 1 ; \
      17:22:35 	fi
      17:22:35 Traceback (most recent call last):
      17:22:35   File "./setup.py", line 33, in <module>
      17:22:35     COMPILED_WITH_PYDEBUG = ('--with-pydebug' in sysconfig.get_config_var("CONFIG_ARGS"))
      17:22:35 TypeError: argument of type 'NoneType' is not iterable
      17:22:35 make: *** [sharedmods] Error 1
      

      I investigated this briefly; there's something about the combination of the Python build logic and the environment variables emitted by the toolchain that causes CONFIG_ARGS to not be stored properly by sysconfig.

      For now Casey has worked around this second issue by forcing the build of Kudu to use Python 2.7 from the Impala toolchain, but we should get to the bottom of this second issue as well.
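
      The TypeError in the log above can be reproduced in isolation: sysconfig.get_config_var() returns None for a variable it does not know, and the "in" test on line 33 of setup.py then fails. The sketch below illustrates that failure mode and a defensive form of the check. The helper name compiled_with_pydebug is hypothetical. Guarding the check would only hide the symptom. It would not explain why CONFIG_ARGS is missing, so it is not a proposed fix.

```python
import sysconfig


def compiled_with_pydebug(config_args):
    """Return True if '--with-pydebug' appears in the CONFIG_ARGS string.

    config_args may be None when sysconfig has no CONFIG_ARGS entry,
    which is the situation behind the TypeError in the build log.
    """
    return "--with-pydebug" in (config_args or "")


# Unguarded form, as on setup.py line 33: raises when the lookup yields None.
try:
    "--with-pydebug" in None
except TypeError as exc:
    print("unguarded check fails:", exc)

# Guarded form: a missing CONFIG_ARGS is treated as "no configure arguments".
print(compiled_with_pydebug(None))
print(compiled_with_pydebug(sysconfig.get_config_var("CONFIG_ARGS")))
```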

            People

            • Assignee: Unassigned
            • Reporter: Adar Dembo (adar)
            • Votes: 0
            • Watchers: 3