Apache Hudi / HUDI-539

RO Path filter does not pick up hadoop configs from the spark context

Details

    Description

      Hi,
      I'm trying to use Hudi to write to ADLS Gen2 (abfs://, accessed here through its secure variant abfss://), one of the Azure storage container file systems. abfs:// is one of the whitelisted file schemes. The issue I'm facing is that HoodieROTablePathFilter looks up the file system for a path by passing in a blank Hadoop configuration instead of the one from the Spark context. Because that blank configuration has none of the settings from the environment, the lookup fails with java.io.IOException: No FileSystem for scheme: abfss.

      The problematic line is

      https://github.com/apache/incubator-hudi/blob/2bb0c21a3dd29687e49d362ed34f050380ff47ae/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java#L96

       

       Stacktrace
       java.io.IOException: No FileSystem for scheme: abfss
       at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
       at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
       at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
       at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
       at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
       at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
       at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
       at org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:96)
       at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$16.apply(InMemoryFileIndex.scala:349)
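
       The failure mode in the trace can be illustrated with a small, self-contained sketch. This is a toy model, not Hadoop's or Hudi's code: the class SchemeLookupSketch and its methods are hypothetical, and a plain Map stands in for a Hadoop Configuration. It only assumes that a scheme is resolved through an "fs.<scheme>.impl" style key, so a blank configuration has nothing to resolve abfss with, while a configuration carrying the job's settings does.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Toy model only: these are NOT Hadoop's classes. A Map stands in for a
// Hadoop Configuration and mimics an "fs.<scheme>.impl" lookup.
class SchemeLookupSketch {

    // Return the implementation class name registered for a scheme, or fail
    // with the same message as the stack trace when nothing is registered.
    static String resolve(String scheme, Map<String, String> conf) throws IOException {
        String impl = conf.get("fs." + scheme + ".impl");
        if (impl == null) {
            throw new IOException("No FileSystem for scheme: " + scheme);
        }
        return impl;
    }

    // Hypothetical stand-in for the settings carried by the Spark context.
    static Map<String, String> sparkContextConf() {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.abfss.impl", "org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem");
        return conf;
    }

    public static void main(String[] args) throws IOException {
        // Blank configuration, as described in the report: the lookup fails.
        try {
            resolve("abfss", new HashMap<>());
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
        // Configuration taken from the running job: the lookup succeeds.
        System.out.println(resolve("abfss", sparkContextConf()));
    }
}
```

       In the real code path, the analogous change would presumably be for the path filter to receive the Hadoop configuration held by the Spark context rather than constructing a blank one; how that configuration is best handed to the filter is left open here.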

       

          People

            vinoth Vinoth Chandar
            ssomuah Sam Somuah
              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 40m