[HADOOP-19199] Include FileStatus when opening a file from FileSystem - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 3.4.0
Fix Version/s: None
Component/s: fs
Labels:
- pull-request-available

Description

The FileSystem abstract class offers no way to open a file using a FileStatus you already hold. Callers can only pass a path, so implementations of the open method have to request the FileStatus of the same file again, which results in unnecessary requests.

A clear example is the parquet-hadoop implementation as of version 1.14.0:

https://github.com/apache/parquet-java/blob/apache-parquet-1.14.0/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopInputFile.java

Although HadoopInputFile has to query the file to obtain its FileStatus when it is created, only the path is passed when the file is opened, because that is all the FileSystem API allows. As a result, the FileSystem implementation will very likely check in its open method that the file exists, or fetch its metadata, repeating the same operation to collect the FileStatus.

This could be resolved by taking the latest release of FileSystem:

https://github.com/apache/hadoop/blob/release-3.4.0-RC3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java

and including the following:

public FSDataInputStream open(FileStatus f) throws IOException {
  return this.open(f.getPath(), this.getConf().getInt("io.file.buffer.size", 4096));
}

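Because this default implementation delegates to the existing open(Path, int), it is backward compatible with all current FileSystem implementations. Implementations that can make use of the FileStatus could override it, so the information is reused when it is already known.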
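The effect of such an overload can be illustrated with a minimal, self-contained sketch. It does not use Hadoop's classes: SketchStatus, SketchFileSystem, CountingStore and StatusAwareStore are hypothetical stand-ins, and the lookup counter is only an assumed model of a metadata request (for example a HEAD request against an object store). The sketch shows that an implementation which does not override the new overload keeps working but still pays for a second lookup, while one that overrides it can skip that lookup.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for FileStatus: only a path and a length. */
final class SketchStatus {
    final String path;
    final long length;

    SketchStatus(String path, long length) {
        this.path = path;
        this.length = length;
    }
}

/** Hypothetical stand-in for the FileSystem base class. */
abstract class SketchFileSystem {
    abstract SketchStatus getStatus(String path) throws IOException;

    abstract InputStream open(String path) throws IOException;

    /** Default overload: delegates to open(path), so existing subclasses keep working. */
    InputStream open(SketchStatus status) throws IOException {
        return open(status.path);
    }
}

/** Toy store that counts metadata lookups; it does not override open(SketchStatus). */
class CountingStore extends SketchFileSystem {
    private final Map<String, byte[]> objects = new HashMap<>();
    int statusLookups = 0;

    void put(String path, byte[] data) {
        objects.put(path, data);
    }

    @Override
    SketchStatus getStatus(String path) throws IOException {
        statusLookups++;
        byte[] data = objects.get(path);
        if (data == null) {
            throw new FileNotFoundException(path);
        }
        return new SketchStatus(path, data.length);
    }

    @Override
    InputStream open(String path) throws IOException {
        getStatus(path); // a path-only open has to look the file up first
        return read(path);
    }

    /** Returns the stored bytes without performing a metadata lookup. */
    InputStream read(String path) throws IOException {
        byte[] data = objects.get(path);
        if (data == null) {
            throw new FileNotFoundException(path);
        }
        return new ByteArrayInputStream(data);
    }
}

/** Variant that overrides the overload and trusts the status it is given. */
class StatusAwareStore extends CountingStore {
    @Override
    InputStream open(SketchStatus status) throws IOException {
        return read(status.path); // no second lookup
    }
}

class OpenWithStatusDemo {
    public static void main(String[] args) throws IOException {
        CountingStore legacy = new CountingStore();
        legacy.put("/data/a.parquet", new byte[] {1, 2, 3});
        SketchStatus s1 = legacy.getStatus("/data/a.parquet");
        legacy.open(s1).close();
        System.out.println("legacy lookups: " + legacy.statusLookups); // prints 2

        StatusAwareStore aware = new StatusAwareStore();
        aware.put("/data/a.parquet", new byte[] {1, 2, 3});
        SketchStatus s2 = aware.getStatus("/data/a.parquet");
        aware.open(s2).close();
        System.out.println("aware lookups: " + aware.statusLookups); // prints 1
    }
}
```

In this model the caller's code is identical in both cases; only the store that overrides the overload saves the extra lookup, which mirrors the backward-compatibility argument made above.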

Issue Links

depends upon
- PARQUET-2493 HadoopInputFile to pass down FileStatus when opening file. (Open)
- HADOOP-19200 Reduce the number of headObject when opening a file with the s3 file system (Resolved)
- SPARK-48571 Reduce the number of accesses to S3 object storage (Open)

duplicates
- HADOOP-15229 Add FileSystem builder-based openFile() API to match createFile(); S3A to implement S3 Select through this API. (Resolved)

relates to
- HADOOP-19131 WrappedIO to export modern filesystem/statistics APIs in a reflection friendly form (Resolved)
- HADOOP-19200 Reduce the number of headObject when opening a file with the s3 file system (Resolved)

links to
- GitHub Pull Request #6877

People

Assignee: Unassigned

Reporter: Oliver Caballero Alvarez

Votes: 0

Watchers: 3

Dates

Created: 08/Jun/24 11:07

Updated: 10/Dec/24 14:37

Resolved: 10/Jun/24 14:07