Details
- Improvement
- Status: Resolved
- Major
- Resolution: Fixed
- 3.0.0-alpha4
- None
- Reviewed
Description
FileSystem contains several methods that act as convenience wrappers over calling getFileStatus and retrieving a single property of the returned FileStatus. These methods have a habit of fostering inefficient call patterns in applications, resulting in multiple redundant getFileStatus calls. For HDFS, this translates into wasteful NameNode RPC traffic. For file systems backed by cloud object stores, this translates into wasteful HTTP traffic. This issue proposes to deprecate these methods and instead encourage applications to call getFileStatus and then reuse the same FileStatus instance as needed.
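The cost difference described above can be illustrated with a small, self-contained sketch. It does not use the Hadoop classes: `CountingFs` and `Status` are hypothetical stand-ins for `FileSystem` and `FileStatus`, and the counter stands in for a NameNode RPC or an object-store HTTP request. The wrapper names `exists`, `isDirectory`, and `isFile` are taken from the linked issue HADOOP-13427; returning `null` for a missing path is a simplification made for this sketch only.

```java
import java.util.HashMap;
import java.util.Map;

public class StatusReuseDemo {

    /** Hypothetical stand-in for FileStatus: a snapshot of one path's metadata. */
    static final class Status {
        private final boolean directory;
        private final long length;

        Status(boolean directory, long length) {
            this.directory = directory;
            this.length = length;
        }

        boolean isDirectory() { return directory; }
        boolean isFile() { return !directory; }
        long getLen() { return length; }
    }

    /** Hypothetical stand-in for a FileSystem that counts simulated remote lookups. */
    static final class CountingFs {
        private final Map<String, Status> entries = new HashMap<>();
        int remoteCalls = 0;

        void put(String path, Status status) { entries.put(path, status); }

        /** The one "expensive" operation. Returns null for a missing path (sketch-only simplification). */
        Status getFileStatus(String path) {
            remoteCalls++;
            return entries.get(path);
        }

        // Convenience wrappers: each one pays for its own lookup.
        boolean exists(String path) { return getFileStatus(path) != null; }

        boolean isDirectory(String path) {
            Status s = getFileStatus(path);
            return s != null && s.isDirectory();
        }

        boolean isFile(String path) {
            Status s = getFileStatus(path);
            return s != null && s.isFile();
        }
    }

    /** Wrapper-based pattern: up to three lookups to classify one path. */
    static String describeWithWrappers(CountingFs fs, String path) {
        if (!fs.exists(path)) return "missing";
        if (fs.isDirectory(path)) return "directory";
        if (fs.isFile(path)) return "file";
        return "other";
    }

    /** Reuse pattern: one lookup, then every property is read from the same snapshot. */
    static String describeWithReuse(CountingFs fs, String path) {
        Status s = fs.getFileStatus(path);
        if (s == null) return "missing";
        return s.isDirectory() ? "directory" : "file";
    }

    public static void main(String[] args) {
        CountingFs fs = new CountingFs();
        fs.put("/data/part-0000", new Status(false, 1024L));

        describeWithWrappers(fs, "/data/part-0000");
        int wrapperCalls = fs.remoteCalls;

        fs.remoteCalls = 0;
        describeWithReuse(fs, "/data/part-0000");
        int reuseCalls = fs.remoteCalls;

        System.out.println("wrappers: " + wrapperCalls + " lookups, reuse: " + reuseCalls + " lookup");
    }
}
```

In this model, classifying a regular file through the wrappers costs three lookups, while fetching the status once and reusing it costs one. Against a real cluster or object store the same ratio would apply to remote requests, which is the traffic the issue aims to reduce.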
Attachments
Issue Links
- depends upon
  - HADOOP-13427 Eliminate needless uses of FileSystem#{exists(), isFile(), isDirectory()} (Resolved)
- is depended upon by
  - HADOOP-13525 Optimize uses of FS operations in the ASF analysis frameworks and libraries (Resolved)
- is related to
  - SPARK-16736 remove redundant FileSystem status checks calls from Spark codebase (Resolved)
- relates to
  - HADOOP-12876 [Azure Data Lake] Support for process level FileStatus cache to optimize GetFileStatus frequent operations (Resolved)
  - HADOOP-13204 Über-jira: S3a phase III: scale and tuning (Resolved)
  - HIVE-10223 Consolidate several redundant FileSystem API calls. (Closed)
  - HIVE-14323 Reduce number of FS permissions and redundant FS operations (Closed)
  - PIG-4442 Eliminate redundant RPC call to get file information in HPath. (Closed)