Flink / FLINK-11086

Add support for Hadoop 3


    Details

    • Release Note:
      Flink now supports Hadoop versions above Hadoop 3.0.0.

      Note that the Flink project does not provide any updated "flink-shaded-hadoop-*" jars. Users need to provide Hadoop dependencies through the HADOOP_CLASSPATH environment variable (recommended) or the lib/ folder. Also, the "include-hadoop" Maven profile has been removed.
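
      For reference, a minimal sketch of the recommended approach, assuming an existing Hadoop installation with the "hadoop" launcher script on the PATH:

      export HADOOP_CLASSPATH=`hadoop classpath`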

      Description

      All builds used Maven 3.2.5 on commit hash ed8ff14ed39d08cd319efe75b40b9742a2ae7558.

      Attempted builds:

      • mvn clean install -Dhadoop.version=3.0.3
      • mvn clean install -Dhadoop.version=3.1.1

      Integration tests with a Hadoop input format data source fail. Example stack trace, taken from the hadoop.version=3.1.1 build:

      testJobCollectionExecution(org.apache.flink.test.hadoopcompatibility.mapred.WordCountMapredITCase)  Time elapsed: 0.275 sec  <<< ERROR!
      java.lang.NoClassDefFoundError: org/apache/flink/hadoop/shaded/com/google/re2j/PatternSyntaxException
              at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
              at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
              at org.apache.hadoop.fs.Globber.doGlob(Globber.java:210)
              at org.apache.hadoop.fs.Globber.glob(Globber.java:149)
              at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:2085)
              at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:269)
              at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:239)
              at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
              at org.apache.flink.api.java.hadoop.mapred.HadoopInputFormatBase.createInputSplits(HadoopInputFormatBase.java:150)
              at org.apache.flink.api.java.hadoop.mapred.HadoopInputFormatBase.createInputSplits(HadoopInputFormatBase.java:58)
              at org.apache.flink.api.common.operators.GenericDataSourceBase.executeOnCollections(GenericDataSourceBase.java:225)
              at org.apache.flink.api.common.operators.CollectionExecutor.executeDataSource(CollectionExecutor.java:219)
              at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:155)
              at org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:229)
              at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:149)
              at org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:229)
              at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:149)
              at org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:229)
              at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:149)
              at org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:229)
              at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:149)
              at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:131)
              at org.apache.flink.api.common.operators.CollectionExecutor.executeDataSink(CollectionExecutor.java:182)
              at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:158)
              at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:131)
              at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:115)
              at org.apache.flink.api.java.CollectionEnvironment.execute(CollectionEnvironment.java:38)
              at org.apache.flink.test.util.CollectionTestEnvironment.execute(CollectionTestEnvironment.java:52)
              at org.apache.flink.test.hadoopcompatibility.mapred.WordCountMapredITCase.internalRun(WordCountMapredITCase.java:121)
              at org.apache.flink.test.hadoopcompatibility.mapred.WordCountMapredITCase.testProgram(WordCountMapredITCase.java:71)
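
      For context, a minimal sketch (not the actual ITCase code, and with a hypothetical input path) of the failing code path: reading a file through the flink-hadoop-compatibility mapred API, whose createInputSplits() delegates to org.apache.hadoop.mapred.FileInputFormat.getSplits() and from there to the Globber seen in the trace above:

      import org.apache.flink.api.java.DataSet;
      import org.apache.flink.api.java.ExecutionEnvironment;
      import org.apache.flink.api.java.tuple.Tuple2;
      import org.apache.flink.hadoopcompatibility.HadoopInputs;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapred.TextInputFormat;

      public class HadoopInputSketch {
          public static void main(String[] args) throws Exception {
              // Collection-based execution, as used by testJobCollectionExecution above.
              ExecutionEnvironment env = ExecutionEnvironment.createCollectionsEnvironment();

              // createInputSplits() on this input format calls
              // FileInputFormat.getSplits(), which globs the input path; on
              // Hadoop 3.x the Globber requires com.google.re2j on the classpath,
              // which the shaded Hadoop jar does not relocate correctly.
              DataSet<Tuple2<LongWritable, Text>> input = env.createInput(
                      HadoopInputs.readHadoopFile(
                              new TextInputFormat(), LongWritable.class, Text.class,
                              "file:///tmp/words.txt")); // hypothetical input path

              input.print();
          }
      }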
      

      Maybe Hadoop 3.x versions could be added to the test matrix as well?

    People

    • Assignee: Robert Metzger (rmetzger)
    • Reporter: Sebastian Klemke (packet)
    • Votes: 5
    • Watchers: 14
