Impala is currently built against native libraries from the native-toolchain. The native-toolchain is built for each supported operating system, and the resulting binaries are published to S3 so that they can be downloaded when building Impala. S3 hosts the compiled binaries of every supported library, for all versions (including patch versions) and all supported OSes.
However, there are a few issues with the way this works today:
- Our Jenkins job that builds and publishes the toolchain overwrites existing published binaries with the most recent build. In theory, recompiling the same version of the same library shouldn't produce different bits, but it can. Once we build a library at a specific version and patch level, we should keep that artifact forever. We can introduce a build version (e.g. library.version.patch_version.build_version) if necessary. We should be able to know exactly which bits were used for any given Impala build.
- S3 (the native-toolchain bucket) is the only place the artifacts live, i.e. they aren't backed up anywhere. A mistake could rm all published binaries, and we would have to rebuild everything, possibly ending up with slightly different binaries (see the first issue above). We should have a strategy for backing up or checking in the artifacts.
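
One way to address both points is sketched below in Python. This is only an illustration, not the existing Jenkins job: the key layout, the function names (artifact_key, publish, verify), and the use of plain dicts as stand-ins for the bucket and a checked-in checksum manifest are all assumptions. The idea is that a publish never overwrites an existing key (a rebuild has to bump build_version), and that a SHA-256 recorded at publish time lets us confirm later which bits a given Impala build used, or that a restored backup matches the original.

```python
import hashlib


class ArtifactExistsError(Exception):
    """Raised when a publish would overwrite an already-published artifact."""


def artifact_key(library, version, patch_level, build_version):
    # Hypothetical key layout modeled on library.version.patch_version.build_version.
    return f"{library}-{version}-p{patch_level}-b{build_version}.tar.gz"


def publish(store, manifest, library, version, patch_level, build_version, data):
    """Publish bytes under an immutable key and record their SHA-256.

    `store` and `manifest` are plain dicts standing in for the bucket and a
    checksum manifest that would be kept outside the bucket.
    """
    key = artifact_key(library, version, patch_level, build_version)
    if key in store:
        # Never replace published bits; a rebuild must use a new build_version.
        raise ArtifactExistsError(f"{key} is already published; bump build_version")
    store[key] = data
    manifest[key] = hashlib.sha256(data).hexdigest()
    return key


def verify(store, manifest, key):
    """Return True if the stored bytes still match the recorded checksum."""
    return hashlib.sha256(store[key]).hexdigest() == manifest[key]
```

Keeping the manifest somewhere other than the bucket (for example, checked in next to the Impala source) means an accidental delete or a rebuild that differs from the original is detectable, even if the binaries themselves have to be restored from a backup.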