In certain Hadoop deployments, the HADOOP_CONF_DIR environment variable can contain multiple paths. For example, when running spark-submit, Cloudera 6.3 sets it to a colon-separated list of configuration directories.
Currently, the HadoopFileSystemOptions class reads the content of the variable but treats it as a single path. When the variable contains multiple paths, Beam cannot properly configure Hadoop, so HDFS cannot be accessed. At the moment, the only workarounds I'm aware of are:
- Override the HADOOP_CONF_DIR that Cloudera sets for the Spark service. However, I think this could cause problems with other tools (for example when using Hive from Spark, because Spark might no longer find the Hive configuration).
- Pass the HDFS configuration using the --hdfsConfiguration option. This is inconvenient when there are many properties to set, and they are not updated automatically when the cluster is reconfigured in Cloudera Manager.
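For reference, the second workaround looks roughly like the following. The class name, jar name, and namenode address are placeholders, and the exact JSON value depends on the cluster; the option takes a JSON list of maps of Hadoop property names to values:

```shell
# Pass HDFS settings inline instead of relying on HADOOP_CONF_DIR.
# "org.example.MyBeamPipeline", "my-pipeline.jar", and the namenode
# host/port are illustrative, not a real deployment.
spark-submit \
  --class org.example.MyBeamPipeline \
  my-pipeline.jar \
  --hdfsConfiguration='[{"fs.defaultFS": "hdfs://namenode:8020"}]'
```

Every property that would otherwise come from core-site.xml/hdfs-site.xml has to be repeated here by hand, which is what makes this workaround impractical for larger configurations.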
In my opinion, the fix is for the HadoopFileSystemOptions class to split the content of the HADOOP_CONF_DIR environment variable on the colon (":") separator and treat each resulting entry as a configuration directory.
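The core of the proposed change can be sketched as below. This is a minimal illustration, not the actual Beam patch: `ConfDirSplitter` and `splitConfDirs` are hypothetical names, and the real fix would feed each resulting directory into Hadoop's configuration loading.

```java
import java.util.Arrays;
import java.util.List;

public class ConfDirSplitter {

    /**
     * Hypothetical helper showing the proposed behavior: split the
     * HADOOP_CONF_DIR value on ":" so that each directory can be
     * scanned for *-site.xml files, instead of treating the whole
     * value as one path. A value with no colon still yields a
     * single-element list, so the existing single-path case keeps
     * working unchanged.
     */
    static List<String> splitConfDirs(String hadoopConfDir) {
        return Arrays.asList(hadoopConfDir.split(":"));
    }

    public static void main(String[] args) {
        // Illustrative multi-path value; the paths are placeholders,
        // not the exact layout Cloudera uses.
        String value = "/etc/hadoop/conf:/etc/spark/conf/yarn-conf";
        System.out.println(splitConfDirs(value));
    }
}
```

With this in place, Beam could load core-site.xml and hdfs-site.xml from whichever of the listed directories contains them, matching how Hadoop tools interpret a multi-path HADOOP_CONF_DIR.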
I have already implemented this fix, and all tests for the HadoopFileSystemOptions class pass. I'm preparing a pull request.