[SPARK-4048] Enhance and extend hadoop-provided profile - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.2.0
Fix Version/s: 1.3.0
Component/s: Build
Labels: None

Description

The hadoop-provided profile keeps Hadoop dependencies out of the Spark assembly. It works, sort of, but it could use some enhancements. A quick list:

- It doesn't exclude everything that could be removed from the assembly.
- It doesn't work well when you're publishing artifacts based on it (~~SPARK-3812~~ fixes this).
- Other dependencies could use similar treatment: Hive, HBase (for the examples), Flume, Parquet, and maybe others I'm missing at the moment.
- Unit tests, specifically those that use local-cluster mode, do not work when the assembly is built with this profile enabled.
- The scripts that launch Spark jobs do not add the needed "provided" jars to the classpath when this profile is enabled, leaving people to figure that out for themselves.
- The examples assembly duplicates a lot of what is already in the main assembly.

Part of this task is selfish since we build internally with this profile and we'd like to make it easier for us to merge changes without having to keep too many patches on top of upstream. But those feel like good improvements to me, regardless.
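
To illustrate the launch-script item in the list above: the following is only a rough Python sketch of how a launcher could append externally provided jars to the classpath. It is not the code from the linked pull request. The variable name `PROVIDED_CLASSPATH`, both function names, and the example paths are hypothetical.

```python
import os


def provided_entries_from_env(env, var="PROVIDED_CLASSPATH", sep=os.pathsep):
    """Split a (hypothetical) environment variable into classpath entries.

    Empty segments, e.g. from a doubled separator, are dropped.
    """
    raw = env.get(var, "")
    return [entry for entry in raw.split(sep) if entry]


def build_classpath(assembly_jar, provided_entries, sep=os.pathsep):
    """Join the assembly jar and the provided entries into one classpath.

    The assembly comes first; duplicates and empty entries are skipped
    while the original order is preserved.
    """
    seen = set()
    ordered = []
    for entry in [assembly_jar, *provided_entries]:
        if entry and entry not in seen:
            seen.add(entry)
            ordered.append(entry)
    return sep.join(ordered)


if __name__ == "__main__":
    # Example: the provided jars are described by the environment, not by
    # the assembly itself.
    extras = provided_entries_from_env(os.environ)
    print(build_classpath("spark-assembly.jar", extras))
```

Putting the assembly first means classes packaged in it take precedence over anything on the provided path. Whether that is the right order depends on the deployment.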

Issue Links

breaks: SPARK-5696 HiveThriftServer2Suite fails because of extra log4j.properties in the driver classpath (Resolved)
depends upon: SPARK-3812 Adapt maven build to publish effective pom. (Resolved)
depends upon: SPARK-2706 Enable Spark to support Hive 0.13 (Resolved)
relates to: SPARK-5289 Backport publishing of repl, yarn into branch-1.2 (Resolved)
links to: [Github] Pull Request #2982 (vanzin)

People

Assignee: Marcelo Masiero Vanzin
Reporter: Marcelo Masiero Vanzin
Votes: 0
Watchers: 5

Dates

Created: 22/Oct/14 18:12
Updated: 01/Jun/15 21:22
Resolved: 09/Jan/15 01:15