[SPARK-47008] Spark to support S3 Express One Zone Storage - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 4.0.0
Fix Version/s: None
Component/s: Spark Core
Labels:
- pull-request-available

Description

Hadoop 3.4.0 adds support for AWS S3 Express One Zone Storage.

Most of this is transparent. However, one aspect which can surface as an issue is that these stores report prefixes in a listing when there are pending uploads, even when there are no files underneath.

This leads to a situation where a listStatus of a path returns a list of file status entries which appears to contain one or more directories, but a listStatus on one of those directories raises a FileNotFoundException: there is nothing there.

HADOOP-18996 handles this throughout the Hadoop code, including FileInputFormat.

A filesystem can now be probed for inconsistent directory listings through fs.hasPathCapability(path, "fs.capability.directory.listing.inconsistent").

If true, then treewalking code SHOULD NOT report a failure if, when walking into a subdirectory, a list/getFileStatus on that directory raises a FileNotFoundException.

Although most of this is handled in the Hadoop code, there are some places where treewalking is done inside Spark. These need to be identified and made resilient to failure when recursing down the tree:

- SparkHadoopUtil list methods, especially listLeafStatuses used by OrcFileOperator
- org.apache.spark.util.Utils#fetchHcfsFile

org.apache.hadoop.fs.FileUtil.maybeIgnoreMissingDirectory() can assist here, or the logic can be replicated. Using the Hadoop implementation would be better from a maintenance perspective.
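
The tolerant treewalk described above could look roughly like the following. This is a self-contained model only, not Hadoop or Spark code: Store, inMemory, listLeafFiles and the local maybeIgnoreMissingDirectory are hypothetical stand-ins, and listingMayBeInconsistent() stands in for the fs.hasPathCapability(path, "fs.capability.directory.listing.inconsistent") probe. It assumes directory names end in "/" and that a missing root should always fail.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TolerantTreeWalk {

    /** Hypothetical stand-in for a filesystem; not the Hadoop FileSystem API. */
    interface Store {
        /** Children of a directory; names ending in "/" are directories. */
        List<String> list(String dir) throws FileNotFoundException;

        /** Stand-in for the "fs.capability.directory.listing.inconsistent" probe. */
        boolean listingMayBeInconsistent();
    }

    /** In-memory store: any directory absent from the map raises FileNotFoundException. */
    static Store inMemory(Map<String, List<String>> dirs, boolean inconsistent) {
        return new Store() {
            @Override
            public List<String> list(String dir) throws FileNotFoundException {
                List<String> children = dirs.get(dir);
                if (children == null) {
                    throw new FileNotFoundException(dir);
                }
                return children;
            }

            @Override
            public boolean listingMayBeInconsistent() {
                return inconsistent;
            }
        };
    }

    /** Local model of the helper's role: rethrow unless the store declares inconsistent listings. */
    static void maybeIgnoreMissingDirectory(Store store, FileNotFoundException e)
            throws FileNotFoundException {
        if (!store.listingMayBeInconsistent()) {
            throw e;
        }
    }

    /** Collects all files under root; a missing root is always an error. */
    static List<String> listLeafFiles(Store store, String root) throws FileNotFoundException {
        List<String> leaves = new ArrayList<>();
        walk(store, root, store.list(root), leaves);
        return leaves;
    }

    private static void walk(Store store, String dir, List<String> children, List<String> leaves)
            throws FileNotFoundException {
        for (String child : children) {
            String path = dir + child;
            if (!child.endsWith("/")) {
                leaves.add(path);
                continue;
            }
            List<String> grandchildren;
            try {
                grandchildren = store.list(path);
            } catch (FileNotFoundException e) {
                // Listed as a directory but nothing is there: skip it only if the
                // store says its listings may be inconsistent, otherwise rethrow.
                maybeIgnoreMissingDirectory(store, e);
                continue;
            }
            walk(store, path, grandchildren, leaves);
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        Map<String, List<String>> dirs = new HashMap<>();
        // "pending/" appears in the root listing but has no entry of its own,
        // modelling a prefix that exists only because of a pending upload.
        dirs.put("/", List.of("data/", "pending/"));
        dirs.put("/data/", List.of("part-0"));
        System.out.println(listLeafFiles(inMemory(dirs, true), "/")); // prints [/data/part-0]
    }
}
```

With the capability flag set to false, the same walk fails with a FileNotFoundException on "/pending/", which matches the behaviour expected of a consistent store.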

Issue Links

depends upon

HADOOP-18996 S3A to provide full support for S3 Express One Zone

Resolved

HADOOP-18948 S3A. Add option fs.s3a.directory.operations.purge.uploads to purge on rename/delete

Resolved

links to

GitHub Pull Request #46678

GitHub Pull Request #48497

People

Assignee: Unassigned

Reporter: Steve Loughran

Votes: 2

Watchers: 4

Dates

Created: 08/Feb/24 14:45

Updated: 17/Oct/24 01:34