[SPARK-23807] Add Hadoop 3 profile with relevant POM fix ups - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.4.0
Fix Version/s: 2.4.0
Component/s: Build
Labels: None
Target Version/s: 2.4.0


Description

Hadoop 3, and in particular Hadoop 3.1, adds:

Java 8 as the minimum (and currently sole) supported Java version
A new "hadoop-cloud-storage" module, intended to be a minimal dependency POM for all the cloud connectors in the Hadoop version being built against
The ability to declare a committer for any FileOutputFormat that supersedes the classic FileOutputCommitter, both for a job and for a specific filesystem URI
A shaded client JAR, though not yet one complete enough for Spark
Lots of other features and fixes

Building Spark against Hadoop 3 is basically a matter of running the build with -Dhadoop.version=3.x.y; however, that approach:

Fails on SBT (dependency resolution of the ZooKeeper JAR)
Misses the new cloud features

The ZooKeeper (ZK) dependency can be fixed everywhere by explicitly declaring the ZK artifact instead of relying on Curator to pull it in; this needs a profile to declare the right ZK version.

To use the new cloud features, Spark's hadoop-3 profile should declare that the spark-hadoop-cloud module depends on the hadoop/hadoop-cloud-storage module, and only on that module, for its transitive dependencies on cloud storage. The profile should also add a source package that is only built and tested when building against Hadoop 3.1+.
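The "only built against Hadoop 3.1+" gate on that source package boils down to a version comparison on the value passed as -Dhadoop.version. The following is a minimal sketch in Python, assuming only the leading major.minor numbers matter. The function cloud_sources_enabled is hypothetical and not part of Spark's Maven or SBT build, which would express the same gate through a build profile.

```python
import re

# Hypothetical helper, not part of Spark's build: decides whether the
# Hadoop-3.1-only source package should be compiled for a given
# -Dhadoop.version value.
MIN_VERSION = (3, 1)


def cloud_sources_enabled(hadoop_version: str) -> bool:
    """Return True if hadoop_version is 3.1 or later.

    Only the leading major.minor numbers are compared, so suffixes such
    as "-SNAPSHOT" or "-beta1" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", hadoop_version)
    if match is None:
        raise ValueError(f"unrecognised Hadoop version: {hadoop_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison orders (major, minor) lexicographically.
    return (major, minor) >= MIN_VERSION


if __name__ == "__main__":
    for v in ["2.7.3", "3.0.0", "3.1.0", "3.1.0-SNAPSHOT", "3.2.1"]:
        print(v, cloud_sources_enabled(v))
```

Comparing parsed integer tuples rather than raw strings avoids the trap where "3.10" would sort before "3.2" lexically.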

Issue Links

is depended upon by

SPARK-18673 Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop version

Resolved

links to

[Github] Pull Request #20923 (steveloughran)

GitHub Pull Request #24045

People

Assignee: Steve Loughran

Reporter: Steve Loughran

Votes: 0

Watchers: 6

Dates

Created: 28/Mar/18 16:25

Updated: 14/Apr/20 16:46

Resolved: 24/Apr/18 16:58