Description
This JIRA tracks 3 things:
1. Something in our assembly generation logic is causing two assembly jars to be produced.
Something like:
spark-assembly_2.10-1.0.0-SNAPSHOT.jar
and
spark-assembly_2.10-1.0.0-SNAPSHOT-hadoop2.0.5-alpha.jar
The former is bogus: it doesn't contain any class files and should be gotten rid of. The latter contains all the good stuff. It is essentially the uber jar generated by the maven-shade-plugin.
2. The current bigtop-dist profile, which builds the maven assembly (a .tar.gz file) using the maven-assembly-plugin, includes the bogus jar and not the legit spark-assembly jar. We should drop the bogus jar from this assembly (which would happen when we fix #1) and put the legit uber jar in it.
3. Also, the bigtop-dist profile is meant to exclude the Hadoop-related jars from the distribution. It does a good job of doing so for org.apache.hadoop jars but misses the avro and zookeeper jars, which are also provided by Hadoop.
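
A rough way to verify the points above against the build output is sketched below in Python. This is only an illustration, not part of the Spark build: the function names are hypothetical, and the list of jar-name prefixes treated as "provided by Hadoop" is an assumption that would need adjusting to the actual artifact names.

```python
import posixpath
import tarfile
import zipfile

# ASSUMPTION: jar-name prefixes for artifacts expected to come from Hadoop.
HADOOP_PROVIDED_PREFIXES = ("hadoop-", "avro-", "zookeeper-")


def count_class_files(jar_path):
    """Return the number of .class entries in a jar (a jar is a zip file)."""
    with zipfile.ZipFile(jar_path) as jar:
        return sum(1 for name in jar.namelist() if name.endswith(".class"))


def find_hadoop_provided_jars(tarball_path, prefixes=HADOOP_PROVIDED_PREFIXES):
    """Return the sorted file names of jars inside a .tar.gz whose names
    start with one of the given prefixes."""
    with tarfile.open(tarball_path, "r:gz") as tar:
        # Tar member names always use "/" as the separator.
        names = [posixpath.basename(m.name) for m in tar.getmembers() if m.isfile()]
    return sorted(
        name for name in names
        if name.endswith(".jar") and name.startswith(prefixes)
    )
```

Calling count_class_files on the bogus assembly jar should return 0, while the shaded uber jar should return a large number. Once #3 is fixed, find_hadoop_provided_jars on the bigtop-dist .tar.gz should return an empty list.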