Details
- Type: Improvement
- Priority: Major
- Status: Open
- Resolution: Unresolved
Description
For a given HDFS operation (e.g. deleting a file), it is very helpful to track which upper-level job issued it. The upper-level callers may be specific Oozie tasks, MR jobs, or Hive queries. One scenario: when the NameNode (NN) is being abused/spammed, the operator may want to know immediately which MR job is to blame so that she can kill it. To this end, the caller context contains at least an application-dependent "tracking id".
The above is the main effect of the CallerContext: the HDFS client sets the CallerContext, and the NameNode records it in the audit log, where it can be used for auditing and attribution.
Spark and Hive already set a CallerContext to meet the HDFS job-audit requirement.
In my company, Flink jobs often cause problems for HDFS, so we implemented this internally to prevent such cases.
If the feature is general enough, should we support it in Flink? If so, I can submit a PR for this.
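As a rough sketch of what setting the caller context could look like on the client side (assuming Hadoop 2.8+, which added `org.apache.hadoop.ipc.CallerContext`; the class and method names `FlinkCallerContextSetter` and `buildContext` below are hypothetical illustrations, not existing Flink APIs):

```java
// Hypothetical helper showing how a Flink client could tag its HDFS
// operations with a caller context for the NameNode audit log.
public final class FlinkCallerContextSetter {

    /** Assembles the context string that the NameNode writes to its audit log. */
    static String buildContext(String framework, String jobId) {
        // Keep it short: HDFS truncates the context to
        // hadoop.caller.context.max.size bytes (128 by default).
        return framework + "_" + jobId;
    }

    public static void main(String[] args) {
        String ctx = buildContext("flink", "a1b2c3d4e5f6");
        System.out.println(ctx);

        // With Hadoop 2.8+ on the classpath, the thread-local context
        // would then be set via:
        //
        //   org.apache.hadoop.ipc.CallerContext callerContext =
        //       new org.apache.hadoop.ipc.CallerContext.Builder(ctx).build();
        //   org.apache.hadoop.ipc.CallerContext.setCurrent(callerContext);
        //
        // Every subsequent HDFS RPC issued from this thread then carries the
        // context, which shows up in the NameNode audit log as
        // "callerContext=flink_a1b2c3d4e5f6" (when
        // hadoop.caller.context.enabled is set to true).
    }
}
```

This is only a sketch of the mechanism; an actual Flink integration would need to decide where in the client lifecycle to set the context and what identifiers (job ID, application ID) to embed.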
Attachments
Issue Links
- is related to
  - FLINK-16809 Support setting CallerContext on YARN deployments (Open)
- relates to
  - FLINK-25224 Upgrade the minimal supported hadoop version to 2.8.5 (Closed)
  - FLINK-25339 Moving to the hadoop-free flink runtime. (Open)