  Spark / SPARK-31935

Hadoop file system config should be effective in data source options

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.6, 3.0.0
    • Fix Version/s: 2.4.7, 3.0.1, 3.1.0
    • Component/s: SQL
    • Labels: None

    Description

      Data source options should be propagated into the Hadoop configuration used by the method `checkAndGlobPathIfNecessary`.

      From org.apache.hadoop.fs.FileSystem.java:

        public static FileSystem get(URI uri, Configuration conf) throws IOException {
          String scheme = uri.getScheme();
          String authority = uri.getAuthority();
      
          if (scheme == null && authority == null) {     // use default FS
            return get(conf);
          }
      
          if (scheme != null && authority == null) {     // no authority
            URI defaultUri = getDefaultUri(conf);
            if (scheme.equals(defaultUri.getScheme())    // if scheme matches default
                && defaultUri.getAuthority() != null) {  // & default has authority
              return get(defaultUri, conf);              // return default
            }
          }
          
          String disableCacheName = String.format("fs.%s.impl.disable.cache", scheme);
          if (conf.getBoolean(disableCacheName, false)) {
            return createFileSystem(uri, conf);
          }
      
          return CACHE.get(uri, conf);
        }
      

      With this change, configurations related to the URI scheme and authority can be specified through data source options when scanning file systems.
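
The effect can be illustrated with a small, self-contained sketch. It is not Spark or Hadoop code: a plain `Map` stands in for Hadoop's `Configuration`, and the class `OptionsOverlaySketch` with its methods `withOptions` and `resolve` are hypothetical names introduced only for this illustration. `resolve` follows the branch order of the `FileSystem.get` excerpt above and returns a description of the branch taken instead of a real `FileSystem`. The `file:///` fallback for `fs.defaultFS` is assumed here to match Hadoop's default.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for illustration; a Map replaces Hadoop's Configuration. */
public class OptionsOverlaySketch {

    /** Copies the base settings and lets data source options override them. */
    static Map<String, String> withOptions(Map<String, String> base,
                                           Map<String, String> options) {
        Map<String, String> merged = new HashMap<>(base);
        merged.putAll(options);
        return merged;
    }

    /** Follows the branch order of the FileSystem.get excerpt and names the branch taken. */
    static String resolve(URI uri, Map<String, String> conf) {
        String scheme = uri.getScheme();
        String authority = uri.getAuthority();
        // Assumption: fall back to the local file system when fs.defaultFS is unset.
        URI defaultUri = URI.create(conf.getOrDefault("fs.defaultFS", "file:///"));

        if (scheme == null && authority == null) {        // use default FS
            return "default:" + defaultUri;
        }
        if (scheme != null && authority == null           // no authority
                && scheme.equals(defaultUri.getScheme())  // scheme matches default
                && defaultUri.getAuthority() != null) {   // default has authority
            return "default:" + defaultUri;
        }
        String disableCacheName = String.format("fs.%s.impl.disable.cache", scheme);
        if (Boolean.parseBoolean(conf.getOrDefault(disableCacheName, "false"))) {
            return "new:" + uri;                          // cache bypassed
        }
        return "cached:" + uri;                           // served from the cache
    }

    public static void main(String[] args) {
        Map<String, String> base = new HashMap<>();       // session-level settings
        Map<String, String> options = new HashMap<>();    // per-read data source options
        options.put("fs.defaultFS", "hdfs://nn1:8020");
        options.put("fs.s3a.impl.disable.cache", "true");

        Map<String, String> merged = withOptions(base, options);
        URI relative = URI.create("/data/events");
        System.out.println(resolve(relative, base));      // options ignored
        System.out.println(resolve(relative, merged));    // options applied
        System.out.println(resolve(URI.create("s3a://bucket/path"), merged));
    }
}
```

When the options are not merged, the path without a scheme resolves against the fallback default file system. Once they are merged, the same path resolves against the `fs.defaultFS` value supplied as an option, and the `fs.s3a.impl.disable.cache` option decides whether the cache is consulted. The host `nn1:8020` and the bucket name are placeholders.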

          People

            Assignee: Gengliang Wang
            Reporter: Gengliang Wang