[HUDI-539] RO Path filter does not pick up hadoop configs from the spark context - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.5.1
Fix Version/s: 0.6.0
Component/s: Common Core
Labels:
- bug-bash-0.6.0
- pull-request-available
Environment:
Spark version: 2.4.4
Hadoop version: 2.7.3
Databricks Runtime: 6.1

Description

Hi,
I'm trying to use Hudi to write to one of the Azure storage container file systems, ADLS Gen 2 (abfs://). abfs:// is one of the whitelisted file schemes. The issue I'm facing is that HoodieROTablePathFilter resolves the file system for a path by passing in a blank Hadoop configuration, so none of the configuration from the environment (the Spark context) is available to it. This manifests as java.io.IOException: No FileSystem for scheme: abfss.

The problematic line is

https://github.com/apache/incubator-hudi/blob/2bb0c21a3dd29687e49d362ed34f050380ff47ae/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java#L96

 Stacktrace
 java.io.IOException: No FileSystem for scheme: abfss
 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
 at org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:96)
 at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$16.apply(InMemoryFileIndex.scala:349)
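
To illustrate why a blank configuration produces this error, here is a minimal, self-contained sketch. It is a toy model, not Hadoop or Hudi code: `SchemeLookupSketch`, `Conf`, `resolveFileSystemClass` and `describe` are hypothetical names, and the lookup only mimics the `fs.<scheme>.impl` property check, ignoring any other way a real Hadoop installation may discover file systems. It also does not reproduce the fix from the linked pull requests.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical stand-ins, not Hadoop or Hudi APIs.
class SchemeLookupSketch {

    // Stand-in for a Hadoop-style configuration: plain key/value settings.
    static final class Conf {
        private final Map<String, String> props = new HashMap<>();

        Conf set(String key, String value) {
            props.put(key, value);
            return this;
        }

        String get(String key) {
            return props.get(key);
        }
    }

    // Look up the implementation class name registered under "fs.<scheme>.impl".
    static String resolveFileSystemClass(String scheme, Conf conf) throws IOException {
        String impl = conf.get("fs." + scheme + ".impl");
        if (impl == null) {
            // Same message shape as the exception in the stack trace above.
            throw new IOException("No FileSystem for scheme: " + scheme);
        }
        return impl;
    }

    // Returns the resolved class name, or the error message if resolution fails.
    static String describe(String scheme, Conf conf) {
        try {
            return resolveFileSystemClass(scheme, conf);
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A freshly created configuration knows nothing about abfss.
        Conf blank = new Conf();

        // A configuration carrying the environment's settings does.
        Conf fromEnvironment = new Conf()
                .set("fs.abfss.impl", "org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem");

        System.out.println(describe("abfss", blank));
        System.out.println(describe("abfss", fromEnvironment));
    }
}
```

The point of the sketch is that the outcome depends entirely on which configuration object reaches the lookup: the same path resolves with the environment's settings and fails with a blank one.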

Issue Links

links to

GitHub Pull Request #1413

GitHub Pull Request #1415

GitHub Pull Request #1784

People

Assignee: Vinoth Chandar

Reporter: Sam Somuah

Votes: 0

Watchers: 6

Dates

Created: 16/Jan/20 15:47

Updated: 04/Jul/20 14:17

Resolved: 03/Jul/20 20:45

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 40m