Currently, our binary tarball bundles our own jars, scripts, etc., along with some extra dependencies which we assume are not available in either the Hadoop lib directory or the ZooKeeper lib directory.
These assumptions are tenuous: we do not know what environment a user will be running in, which jars they already have installed on their system (on their classpath or otherwise), or whether the specific versions of the dependencies we're bundling are compatible with what they already have.
In effect, we are trying to make things convenient for the user by performing integration, packaging, and dependency-convergence tasks on their behalf... all based on poorly defined assumptions.
This bundling also places an extra burden on us, as the upstream project, to maintain complex LICENSE/NOTICE files for the bundled tarball artifact we produce, and it's very easy for these legal files to unintentionally fall out of sync whenever we change the version of a dependency we are bundling.
We should not bundle any dependencies inside our binary tarball. For convenience, we can instead provide a script which lets the user easily download the dependencies we currently assume they will need (the same ones we package for them today). This provides nearly the same convenience as we do now, but without the burdensome maintenance of our LICENSE/NOTICE files, and the user can easily customize the script to download the dependencies they actually need if our assumptions aren't valid for their environment.
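As a rough illustration, such a script could be little more than a list of Maven coordinates and a loop that fetches each jar from a Maven repository. Everything below is a hypothetical sketch: the coordinates, the `lib` target directory, and the `FETCH` flag are illustrative placeholders, not our actual dependency list or a committed design.

```shell
#!/usr/bin/env bash
# Hypothetical fetch-dependencies sketch. By default it only prints the
# URLs it would retrieve; set FETCH=1 to actually download into LIB_DIR.
set -euo pipefail

LIB_DIR="${LIB_DIR:-lib}"
MAVEN_REPO="${MAVEN_REPO:-https://repo1.maven.org/maven2}"

# groupId:artifactId:version triples (placeholders); a user whose
# environment differs from our assumptions simply edits this list.
DEPS=(
  "commons-lang:commons-lang:2.6"
  "com.google.guava:guava:14.0.1"
)

for dep in "${DEPS[@]}"; do
  IFS=':' read -r group artifact version <<< "$dep"
  jar="${artifact}-${version}.jar"
  # Maven repo layout: group dots become path separators.
  url="${MAVEN_REPO}/${group//.//}/${artifact}/${version}/${jar}"
  if [ "${FETCH:-0}" = "1" ]; then
    mkdir -p "$LIB_DIR"
    curl -fsSL -o "${LIB_DIR}/${jar}" "$url"
  else
    echo "$url"
  fi
done
```

Because the dependency list is just data at the top of the script, keeping it current is a one-line change per dependency, with no LICENSE/NOTICE implications for the tarball we ship.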