debhelper can automate many common tasks in Debian package creation.
The current packages use an old style of debhelper that is often unnecessarily complicated, which makes it harder to fix things.
For example, current Hadoop (0.23.3) does not compile on Debian because of the new GCC version. The fix is a simple "#include <unistd.h>" in the HadoopPipes.cc file.
Modern Debian packaging with "quilt" has an excellent mechanism for managing such patches. However, in order to use this with the current Bigtop packaging, one has to:
1. create debian/source/format to use "3.0 (quilt)",
2. manually add quilt patching to the debian/rules targets,
3. make sure the .debian.tar.gz is also copied instead of the old .diff.gz.
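The first step, plus registering a patch, can be sketched in shell. This is only a rough illustration: the helper names setup_quilt_format and register_patch are made up for this example, and the patch file name in the usage note is hypothetical. It assumes it is run from the top of the unpacked source tree.

```shell
#!/bin/sh
set -e

# Hypothetical helper: mark the source tree as a "3.0 (quilt)" package.
setup_quilt_format() {
    mkdir -p debian/source debian/patches
    # dpkg-source reads the source format from this one-line file.
    echo "3.0 (quilt)" > debian/source/format
}

# Hypothetical helper: list a patch in debian/patches/series.
# The series file names the patches to apply, one per line, in order.
register_patch() {
    patch_name="$1"
    mkdir -p debian/patches
    touch debian/patches/series
    # Append the name only if it is not listed yet.
    grep -qxF "$patch_name" debian/patches/series ||
        echo "$patch_name" >> debian/patches/series
}
```

Usage would be something like "setup_quilt_format; register_patch add-unistd-include.patch", with the patch file itself placed in debian/patches/. With this format, dpkg-source applies the listed patches when it unpacks the source package.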
You will be surprised how many things debhelper does well on its own, with a rules file consisting of little more than the automagic catch-all "%:" target that calls "dh $@".
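For reference, the minimal dh-style rules file can be generated as below. This is a sketch under assumptions: the helper name write_minimal_rules is made up, and the resulting file needs a debhelper version that ships the "dh" sequencer (7 or later). Real packages usually add a few override_dh_* targets on top.

```shell
#!/bin/sh
set -e

# Hypothetical helper: write a minimal dh-style debian/rules
# into the current source tree.
write_minimal_rules() {
    mkdir -p debian
    # printf turns \t into the tab that make requires before a recipe line
    # and %% into a literal %; the single quotes keep $@ from being expanded.
    printf '#!/usr/bin/make -f\n%%:\n\tdh $@\n' > debian/rules
    # debian/rules has to be an executable makefile.
    chmod +x debian/rules
}
```

After calling write_minimal_rules, debian/rules contains three lines: the "#!/usr/bin/make -f" interpreter line, the catch-all target "%:", and the tab-indented recipe "dh $@".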
Furthermore, "java-wrappers" is a Debian and Ubuntu package that helps with setting up classpaths and choosing the JVM. It can do everything bigtop-utils does and more, and it is used by other Java packages. IMHO it should be preferred over bigtop-utils.
If the packaging were more Debian-standard, it would be a lot easier to get the packages accepted into Debian mainline at some point. It may even be desirable to build the various Hadoop components (-common, -yarn, etc.) independently if they are isolated well enough upstream.
Don't get me wrong: I think the packages are pretty good already. In particular, I like the split into namenode and datanode packages and the use of update-alternatives. I just found it rather hard to get a grip on the process and to get my fixes into the package. For example, I had to set JAVA_HOME manually before building, some build dependencies were missing (cmake, though that is probably a new requirement), and some paths have changed (probably because of the promotion of YARN to a top-level project?).
I understand that you want to have as much common code for all distributions as possible, as opposed to having per-distribution packaging. However, if every project uses its own equivalent of java-wrappers and its own build process, things are not really better than with packaging that is at least consistent within each distribution.
But ideally, very little packaging code should be needed anyway, and most things should be done by an appropriate installation process upstream.
And seriously, /usr/lib/hadoop/lib is a mess. There is even a package in there with a "" in the file name. Plus, a lot of these jars are available in Debian and could be shared across packages if the packages accepted having them managed by the distribution instead of shipping their own...
Even within the bigtop packages this leads to a totally unnecessary overlap:
995720 Sep 25 14:18 /usr/lib/hadoop-hdfs/lib/snappy-java-18.104.22.168.jar
995720 Sep 25 14:18 /usr/lib/hadoop-mapreduce/lib/snappy-java-22.214.171.124.jar
995720 Sep 25 14:18 /usr/lib/hadoop-yarn/lib/snappy-java-126.96.36.199.jar
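Identical sizes and timestamps only suggest that these are copies of the same file. A rough way to confirm this kind of overlap is to group the jars by checksum, as sketched below. The function name list_duplicate_jars is made up for this example, and the sketch assumes GNU find, md5sum, sort and awk are available.

```shell
#!/bin/sh

# Hypothetical helper: print every .jar under the given directories whose
# MD5 checksum is shared with at least one other .jar in the same search.
list_duplicate_jars() {
    find "$@" -type f -name '*.jar' -exec md5sum {} + |
        sort |
        awk '{ count[$1]++; lines[$1] = lines[$1] $0 "\n" }
             END { for (h in count) if (count[h] > 1) printf "%s", lines[h] }'
}
```

It could be called as "list_duplicate_jars /usr/lib/hadoop-hdfs/lib /usr/lib/hadoop-mapreduce/lib /usr/lib/hadoop-yarn/lib". Each output line is a checksum followed by a path, and lines with the same checksum are byte-identical copies that one shared package could provide.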