Description
Currently, the examples pull in a large number of external dependencies, which inflates the size of the Spark distribution packages.
I'd like to propose two things to slim down these dependencies:
- make all non-Spark dependencies, as well as the Spark Streaming dependencies, "provided". Especially for the streaming connectors, this means that launching the examples becomes more like launching a real application, where you have to figure out how to supply those dependencies yourself (e.g. using --packages).
- audit the examples and remove those that don't provide much value. For example, HBase is working on full-featured Spark bindings, based on code that was already in use for some time before being merged into HBase. By comparison, the HBase example in Spark is bare-bones, not particularly useful, and even somewhat misleading.
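As a rough illustration of the first proposal, this is what marking dependencies "provided" could look like in sbt syntax (the Maven equivalent is `<scope>provided</scope>`). It is a sketch only: the `sparkVersion` value is a placeholder, and the two artifacts were picked purely as examples. It is not a copy of Spark's actual examples build.

```scala
// build.sbt fragment (sketch). The version string is a hypothetical placeholder.
val sparkVersion = "x.y.z"

libraryDependencies ++= Seq(
  // "provided": on the compile classpath, but not bundled into the packaged
  // examples, so it no longer adds to the size of the distribution.
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
  // An external streaming connector, chosen as an example; same treatment.
  "org.apache.spark" %% "spark-streaming-kafka" % sparkVersion % "provided"
)
```

With this layout the examples still compile. At launch time, however, the connector would have to be supplied the same way a real application would do it, for example through spark-submit's --packages option.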