Casey uncovered several issues when building Kudu with the Impala toolchain; this report attempts to capture them.
The first and most important issue was a random SIGSEGV during a flush:
Todd traced this to a build issue with codegen. Specifically, when we use our thirdparty clang to convert precompiled.cc into LLVM IR, we assume it uses the same libstdc++ as the rest of the Kudu build. It turns out there's no such guarantee, and depending on the version discrepancy, there may be a variety of issues, including at least one alignment change that could result in the kind of corruption Casey is seeing.
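A quick way to test that assumption on a given machine is to preprocess a standard header with each compiler and compare the library version macros it defines. libstdc++ defines __GLIBCXX__ (a release date stamp), and libc++ defines _LIBCPP_VERSION. The following is a minimal Python 3.7+ sketch, not part of Kudu's build. It assumes GCC-style driver flags (-x c++ -E -dM -), which both gcc and clang accept. The script name and invocation in the comment are hypothetical. Matching stamps suggest the same library release, but they don't prove the two compilers read the very same header files.

```python
import re
import subprocess

# Macros that identify the C++ standard library pulled in by <cstddef>:
# libstdc++ defines __GLIBCXX__, libc++ defines _LIBCPP_VERSION.
_STDLIB_MACROS = ("__GLIBCXX__", "_LIBCPP_VERSION")


def stdlib_version(define_dump):
    """Parse `-dM -E` output; return (macro, value) for the first stdlib macro found, else None."""
    for line in define_dump.splitlines():
        m = re.match(r"#define (\w+) (\S+)", line)
        if m and m.group(1) in _STDLIB_MACROS:
            return (m.group(1), m.group(2))
    return None


def probe(compiler, extra_flags=()):
    """Preprocess '#include <cstddef>' with `compiler`, dump its macros, and identify the stdlib."""
    result = subprocess.run(
        [compiler, *extra_flags, "-x", "c++", "-E", "-dM", "-"],
        input="#include <cstddef>\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return stdlib_version(result.stdout)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_stdlib.py g++ /path/to/thirdparty/clang++
    versions = {cc: probe(cc) for cc in sys.argv[1:]}
    for cc, version in versions.items():
        print(cc, "->", version)
    if len(set(versions.values())) > 1:
        print("MISMATCH: compilers see different C++ standard libraries")
        sys.exit(1)
```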
Let's walk through the various scenarios at play:
- When building Kudu on a platform whose system libstdc++ supports C++11, libstdc++ is expected to be found in /usr regardless of the chosen compiler, be it the system's gcc, clang, or thirdparty's clang.
- On el6, we call scl enable devtoolset-3 before building Kudu. This puts a special build of gcc 4.9.2 on the PATH whose libstdc++ comes from /opt/rh/devtoolset-3/usr rather than from the system itself. To avoid discrepancies, we patch thirdparty clang to use that same path when searching for headers and libraries, so we end up with the same libstdc++ for Kudu as for emitted LLVM IR.
- On OSX, C++ support comes by way of libc++, located deep within Xcode. This location is built into the system clang, which is also the compiler used to build Kudu. We don't patch thirdparty clang as we do on el6, so it can't find libc++ by default. Instead, Kudu adds -cxx-isystem <this Xcode path> during the codegen build. This way, the libc++ used when emitting LLVM IR is the same as the one used by the rest of Kudu.
- Building with the Impala toolchain is similar to the el6 case, except without the patch to thirdparty's clang. Nor can it be patched in the same way, since the toolchain location varies from system to system. Without the patch, thirdparty's clang ends up using the system's libstdc++, which isn't guaranteed to match the toolchain's version and can lead to the issues described above. This needs to be addressed.
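To see which libstdc++ a given clang actually resolves in the Impala toolchain case, one can ask the driver for its include search path and check whether the C++ headers live under the toolchain's prefix. That search path is the -v output between "#include <...> search starts here:" and "End of search list.". The following is a minimal Python 3.7+ sketch under that assumption. The function names and the toolchain prefix are hypothetical.

```python
import os
import subprocess


def include_search_dirs(verbose_stderr):
    """Extract the '#include <...>' search directories from `cc -v -E` stderr output."""
    dirs = []
    in_list = False
    for line in verbose_stderr.splitlines():
        if line.startswith("#include <...> search starts here:"):
            in_list = True
        elif line.startswith("End of search list."):
            break
        elif in_list:
            # Entries are indented; clang on macOS may append " (framework directory)".
            # normpath collapses gcc-style "lib/gcc/.../../../../include/c++/X" paths.
            dirs.append(os.path.normpath(line.strip().split(" (")[0]))
    return dirs


def cxx_include_dirs(compiler, extra_flags=()):
    """Run `compiler -v -E` on empty C++ input and return its <...> search path."""
    result = subprocess.run(
        [compiler, *extra_flags, "-x", "c++", "-v", "-E", "-"],
        input="",
        capture_output=True,
        text=True,
        check=True,
    )
    return include_search_dirs(result.stderr)


def libstdcxx_under(dirs, prefix):
    """True if any C++ standard library header directory (.../c++/...) lies under prefix."""
    prefix = os.path.normpath(prefix)
    return any(
        "/c++/" in d + "/" and (d == prefix or d.startswith(prefix + os.sep))
        for d in dirs
    )


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python which_libstdcxx.py /path/to/thirdparty/clang++ /path/to/toolchain
    if len(sys.argv) == 3:
        found = cxx_include_dirs(sys.argv[1])
        print("\n".join(found))
        print("C++ headers under toolchain:", libstdcxx_under(found, sys.argv[2]))
```

The extra_flags parameter makes it possible to try a candidate remedy, such as clang's --gcc-toolchain=<dir> driver option, and confirm that the C++ headers move under the toolchain prefix before touching the codegen build. Whether that flag is the right fix here hasn't been verified.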
Separately, Casey ran into a build-time issue when building Kudu with the Impala toolchain on a platform that doesn't provide Python 2.7 (I think it was an el6 VM). On these platforms, Kudu builds its own Python 2.7 before building LLVM, as the latter depends on the former to build. The Python build failed with the following:
I investigated this briefly; something about the combination of the Python build logic and the environment variables set by the toolchain causes CONFIG_ARGS not to be stored properly by sysconfig.
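When digging into this, it may help to check which CONFIG_ARGS an interpreter actually recorded, assuming the freshly built Python gets far enough to run. The following is a minimal sketch. It sticks to features available in both Python 2.7 and 3, so it can be run with the interpreter under investigation. The function names are hypothetical. It only reports what sysconfig recorded and doesn't explain why the value is wrong.

```python
import shlex
import sysconfig


def parse_config_args(raw):
    """Split a CONFIG_ARGS string (shell-quoted configure flags) into a list of flags."""
    # Guard against None/empty: on Python 2, shlex.split(None) would read from stdin.
    return shlex.split(raw) if raw else []


def recorded_configure_flags():
    """Return the configure flags this interpreter's sysconfig data says it was built with."""
    # get_config_var returns None when the variable isn't recorded (e.g. on Windows).
    return parse_config_args(sysconfig.get_config_var("CONFIG_ARGS"))


if __name__ == "__main__":
    flags = recorded_configure_flags()
    if not flags:
        print("CONFIG_ARGS is empty or missing from sysconfig")
    for flag in flags:
        print(flag)
```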
For now, Casey has worked around this second issue by forcing the Kudu build to use Python 2.7 from the Impala toolchain, but we should still get to the bottom of it.