Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
It would be very useful to normalize the audit format across the various Hadoop components.
A common audit format would let tools parse audit records consistently across sub-projects and would make audit details easier for humans to interpret.
If a new common audit format is devised, it should consider the following W's of auditing:
1. What action, and with what results - e.g. what was done (action initiated, API invoked, job submitted, etc.) and what the results were (success, failure, etc.)
2. Who - e.g. user, proxy user (if available), IP address (if available)
3. When - timestamp
4. Where - subsystem, component, node name
5. Why - this is difficult to answer directly. However, with audit event correlation we can provide better context. E.g. a Pig script submitted by a user that results in some MR jobs and HDFS reads/writes can be correlated.
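To make the list above concrete, here is a minimal sketch of what a normalized record could look like. It is only an illustration: the class name `AuditRecord`, every field name, the tab-separated `key=value` layout, and the use of `-` for unavailable values are assumptions, not an agreed format or an existing Hadoop API.

```java
import java.time.Instant;

/** Hypothetical normalized audit record; all names here are illustrative assumptions. */
final class AuditRecord {
    private final String action;        // What: operation or API invoked
    private final String result;        // What result: e.g. SUCCESS or FAILURE
    private final String user;          // Who: effective user
    private final String proxyUser;     // Who: proxy user, null if not available
    private final String ipAddress;     // Who: client IP, null if not available
    private final Instant timestamp;    // When
    private final String component;     // Where: subsystem or component
    private final String node;          // Where: node name
    private final String correlationId; // Why: ties related events together, null if none

    AuditRecord(String action, String result, String user, String proxyUser,
                String ipAddress, Instant timestamp, String component,
                String node, String correlationId) {
        this.action = action;
        this.result = result;
        this.user = user;
        this.proxyUser = proxyUser;
        this.ipAddress = ipAddress;
        this.timestamp = timestamp;
        this.component = component;
        this.node = node;
        this.correlationId = correlationId;
    }

    // Optional fields are rendered as "-" so every line has the same columns.
    private static String orDash(String value) {
        return value == null ? "-" : value;
    }

    /** Renders the record as one tab-separated line of key=value pairs. */
    String format() {
        return String.join("\t",
                "ts=" + timestamp,
                "component=" + component,
                "node=" + node,
                "user=" + user,
                "proxyUser=" + orDash(proxyUser),
                "ip=" + orDash(ipAddress),
                "action=" + action,
                "result=" + result,
                "correlationId=" + orDash(correlationId));
    }
}
```

A fixed column order with explicit keys is one way to serve both audiences mentioned in the description: tools can split on tabs and `=`, and a human can still read a line without a schema at hand.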
There are perhaps two ways to achieve the goal of normalized audit records:
1. A common audit facility - as components adopt this facility, their audit records move to the normalized format.
2. Change each component to produce audit records in a common format.
Approach 1 appears to be more feasible.
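As a rough sketch of approach 1, a shared facility could own the record layout so that each component only supplies its own values. Everything below is hypothetical: `CommonAudit`, its constructor parameters, and the `log` signature are invented for illustration and do not correspond to any existing Hadoop class.

```java
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical shared audit facility; the names and signature are assumptions. */
final class CommonAudit {
    private final String component;      // Where: set once per component
    private final String node;           // Where: node name
    private final Consumer<String> sink; // destination for formatted lines
    private final LongSupplier clock;    // When: injectable so it can be tested

    CommonAudit(String component, String node,
                Consumer<String> sink, LongSupplier clock) {
        this.component = component;
        this.node = node;
        this.sink = sink;
        this.clock = clock;
    }

    /** Formats one event in the shared layout and hands it to the sink. */
    void log(String user, String action, boolean success, String correlationId) {
        String line = String.join("\t",
                "ts=" + clock.getAsLong(),
                "component=" + component,
                "node=" + node,
                "user=" + user,
                "action=" + action,
                "result=" + (success ? "SUCCESS" : "FAILURE"),
                "correlationId=" + correlationId);
        sink.accept(line);
    }
}
```

Under these assumptions, an HDFS component and a MapReduce component would each construct their own `CommonAudit` (for example with `System::currentTimeMillis` as the clock) and call `log` with the same correlation id for work that stems from one Pig script. Because the layout lives in one place, a component picks up the normalized format simply by switching to the facility.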