[HIVE-27884] LLAP: Reuse FileSystem objects from cache across different tasks in the same LLAP daemon - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 4.1.0
Component/s: None
Labels:
- pull-request-available

Description

Originally, when the task runner was added in HIVE-10028, the FileSystem.closeAllForUGI call was commented out for some reason; later, in the scope of HIVE-9898, it was simply added back.

A FileSystem.close call basically does the following:
1. deletes all paths that were marked as delete-on-exit
2. removes the instance from the cache
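
As a rough illustration of these two steps, here is a toy, pure-Java model of a cached filesystem handle. It is not Hadoop's implementation: the `ToyFs` class, its plain string cache key, and the in-memory `files` set are all assumptions made up for this sketch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: names and structure are hypothetical, not Hadoop's code.
class ToyFs {
    // Process-wide cache of instances, keyed by a plain string in this sketch.
    static final Map<String, ToyFs> CACHE = new HashMap<>();

    final String key;
    final Set<String> files = new HashSet<>();         // paths that "exist"
    final Set<String> deleteOnExit = new HashSet<>();  // paths marked for deletion

    private ToyFs(String key) { this.key = key; }

    // Return the cached instance for this key, creating it on a cache miss.
    static ToyFs get(String key) {
        return CACHE.computeIfAbsent(key, ToyFs::new);
    }

    void create(String path) { files.add(path); }

    // Mark an existing path for deletion when this instance is closed.
    boolean deleteOnExit(String path) {
        if (!files.contains(path)) {
            return false;
        }
        deleteOnExit.add(path);
        return true;
    }

    void close() {
        // Step 1: delete every path that was marked as delete-on-exit.
        files.removeAll(deleteOnExit);
        deleteOnExit.clear();
        // Step 2: remove this instance from the cache.
        CACHE.remove(key, this);
    }

    public static void main(String[] args) {
        ToyFs fs = ToyFs.get("abfs://container@account");
        fs.create("/tmp/task_0/out");
        fs.deleteOnExit("/tmp/task_0/out");
        fs.close();
        System.out.println(fs.files.isEmpty());                          // prints true
        System.out.println(ToyFs.get("abfs://container@account") != fs); // prints true
    }
}
```

After `close()`, the next `get` for the same key has to build a fresh instance, which is the cost the rest of this issue is concerned with.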

I saw that we call FileSystem.closeAllForUGI at the end of every task attempt, so we almost completely disable Hadoop's FileSystem cache during the lifecycle of a long-running LLAP daemon.
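
To make the effect concrete, the following toy simulation counts how often a cached handle has to be rebuilt when everything belonging to a user is closed after each task attempt. It is only a sketch under stated assumptions: `FsCacheSim`, `Handle`, `closeAllForUser`, and the `user|uri` cache key are hypothetical names and do not reflect how Hadoop or Hive implement this.

```java
import java.util.HashMap;
import java.util.Map;

// Toy simulation: hypothetical names, not Hadoop or Hive code.
class FsCacheSim {
    static final class Handle {
        final String user;
        final String uri;
        Handle(String user, String uri) { this.user = user; this.uri = uri; }
    }

    private final Map<String, Handle> cache = new HashMap<>();
    int creations = 0; // each creation stands in for an expensive client/token setup

    // Return the cached handle for (user, uri), creating it on a cache miss.
    Handle get(String user, String uri) {
        return cache.computeIfAbsent(user + "|" + uri, k -> {
            creations++;
            return new Handle(user, uri);
        });
    }

    // Stand-in for closing every cached instance that belongs to one user.
    void closeAllForUser(String user) {
        cache.values().removeIf(h -> h.user.equals(user));
    }

    // Run `attempts` task attempts for one user; optionally close after each one.
    static int creationsFor(int attempts, boolean closeAfterEachAttempt) {
        FsCacheSim sim = new FsCacheSim();
        for (int i = 0; i < attempts; i++) {
            sim.get("hive", "abfs://data");
            if (closeAfterEachAttempt) {
                sim.closeAllForUser("hive");
            }
        }
        return sim.creations;
    }

    public static void main(String[] args) {
        System.out.println(creationsFor(1000, true));   // prints 1000
        System.out.println(creationsFor(1000, false));  // prints 1
    }
}
```

With the per-attempt close, every attempt pays the creation cost; without it, only the first one does.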

Some investigation on Azure showed that creating a FileSystem can be quite expensive, as it involves recreating a whole object hierarchy like:

AzureBlobFileSystem -> AzureBlobFileSystemStore -> AbfsClient -> TokenProvider (MsiTokenProvider)

which ends up pinging Azure's token auth endpoint and can lead to, e.g., an HTTP 429 response.

The other area that's really affected by this patch is the AWS SDK v2, where we discovered a performance degradation (GitHub issues also point to this problem); see the attached screenshot.

The screenshot is just for reference: it doesn't prove the performance degradation (which was only visible with wall-clock profiling), but the problematic code path (which was introduced in AWS SDK v2) is visible in it.

Additionally, regarding deleteOnExit, please refer to HIVE-28335.

We need to check whether we can remove this closeAllForUGI call in LLAP, and additionally check and remove all deleteOnExit calls that belong to Hadoop FileSystem objects (this doesn't necessarily apply to java.io.File.deleteOnExit calls):

grep -iRH "deleteOnExit" --include="*.java" | grep -v "test"
...
ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:        // in recent hadoop versions, use deleteOnExit to clean tmp files.
ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:        autoDelete = fs.deleteOnExit(fsp.outPaths[filesIdx]);
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/util/PathInfo.java:        fileSystem.deleteOnExit(dir);
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java:      parentDir.deleteOnExit();
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java:      tmpFile.deleteOnExit();
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/KeyValueContainer.java:        parentDir.deleteOnExit();
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/KeyValueContainer.java:        tmpFile.deleteOnExit();
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/ObjectContainer.java:        tmpFile.deleteOnExit();
ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java:        autoDelete = fs.deleteOnExit(outPath);

I believe deleteOnExit is fine if we don't want to bother with removing temp files. However, these deletions might need to move to a more Hive-specific scope if we really want to reuse cached FileSystem objects safely.
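
One possible shape for such a Hive-specific scope is sketched below: temp paths are registered with a per-task object and deleted when that scope ends, while the shared filesystem object is never closed. This is an assumption-laden illustration, not what the patch does: `TaskTempPaths` and `runDemo` are hypothetical names, and `java.nio.file` stands in for a cached Hadoop FileSystem.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: java.nio.file stands in for a shared, cached filesystem.
class TaskTempPaths implements AutoCloseable {
    private final List<Path> registered = new ArrayList<>();

    // Remember a temp path so it is removed when this task scope ends.
    Path register(Path p) {
        registered.add(p);
        return p;
    }

    // Delete only the registered paths; the filesystem itself is left untouched.
    @Override
    public void close() {
        // Delete in reverse order so files go before their parent directories.
        for (int i = registered.size() - 1; i >= 0; i--) {
            try {
                Files.deleteIfExists(registered.get(i));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        registered.clear();
    }

    // Creates a temp dir and file inside a scope; returns true if both are gone afterwards.
    static boolean runDemo() {
        try {
            Path dir = Files.createTempDirectory("task_attempt_");
            Path out = dir.resolve("part-00000");
            try (TaskTempPaths scope = new TaskTempPaths()) {
                scope.register(dir);
                Files.createFile(scope.register(out));
            }
            return !Files.exists(out) && !Files.exists(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints true
    }
}
```

The design choice here is that cleanup is tied to the task's lifetime rather than to the lifetime of the filesystem instance, so a cached instance can outlive any number of tasks.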

Attachments

Screenshot 2024-07-30 at 10.18.13.png
30/Jul/24 08:18
1.66 MB
László Bodor

Issue Links

is related to

HIVE-9898 LLAP: Sort out issues with UGI and cached FileSystems

Closed

HIVE-10028 LLAP: Create a fixed size execution queue for daemons

Closed

HADOOP-17377 ABFS: MsiTokenProvider doesn't retry HTTP 429 from the Instance Metadata Service

Open

HIVE-131 insert overwrite directory leaves behind uncommitted/tmp files from failed tasks

Closed

HIVE-13391 add an option to LLAP to use keytab to authenticate to read data

Closed

relates to

HIVE-28335 Review deleteOnExitUsage

Open

links to

GitHub Pull Request #4882

People

Assignee: László Bodor

Reporter: László Bodor

Votes: 0

Watchers: 3

Dates

Created: 17/Nov/23 18:47

Updated: 16/Sep/24 13:14

Resolved: 16/Sep/24 08:35