Currently, for auditing to HDFS, Ranger writes audit records to a local file and then transfers the entire file to HDFS at regular intervals. This adds an additional write operation to the local disk.
The proposal is to write a more intelligent audit writer: audits are sent to the destination in real time (or in batches), and if the destination is down, they are written to a local file. When the destination becomes available again, the audit logs from the local file are sent first; once the writer has caught up, real-time streaming resumes.
This design also needs to address the case where the destination is slower than the audit producer. In that case, if the internal queue reaches a certain threshold, audits are written to the local file until the in-memory queue is drained.
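The queueing and spooling behaviour described above could be sketched roughly as follows. All names here (`SpoolingAuditWriter`, `AuditDestination`, the threshold parameter) are illustrative, not Ranger's actual API, and the local spool file is stood in for by an in-memory list to keep the sketch self-contained:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative destination contract: send() returns false when the
// destination is down or cannot accept the event right now.
interface AuditDestination {
    boolean send(String event);
}

// Test double: a destination that can be toggled up/down.
class FlakyDestination implements AuditDestination {
    boolean up = false;
    final List<String> received = new ArrayList<>();
    public boolean send(String event) {
        if (!up) return false;
        received.add(event);
        return true;
    }
}

class SpoolingAuditWriter {
    private final AuditDestination destination;
    private final Deque<String> memoryQueue = new ArrayDeque<>();
    // Stands in for the local spool file of a real implementation.
    private final List<String> spoolFile = new ArrayList<>();
    private final int queueThreshold;

    SpoolingAuditWriter(AuditDestination destination, int queueThreshold) {
        this.destination = destination;
        this.queueThreshold = queueThreshold;
    }

    // Called by the audit producer; a real writer would flush from a
    // background thread rather than synchronously on every event.
    void log(String event) {
        if (memoryQueue.size() >= queueThreshold) {
            spoolFile.add(event); // destination slower than producer: spill to file
        } else {
            memoryQueue.add(event);
        }
        flush();
    }

    void flush() {
        // Catch up first: spooled events go out before new ones,
        // preserving order after an outage.
        while (!spoolFile.isEmpty()) {
            if (!destination.send(spoolFile.get(0))) { drainQueueToSpool(); return; }
            spoolFile.remove(0);
        }
        while (!memoryQueue.isEmpty()) {
            if (!destination.send(memoryQueue.peek())) { drainQueueToSpool(); return; }
            memoryQueue.poll();
        }
    }

    // Destination is down: persist everything queued in memory.
    private void drainQueueToSpool() {
        while (!memoryQueue.isEmpty()) spoolFile.add(memoryQueue.poll());
    }

    int spooledCount() { return spoolFile.size(); }
}
```

While the destination is down, every event ends up in the spool; when it comes back, the spool is replayed before new events so ordering is preserved.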
The design should be generic enough to support any type of destination. By default, implementations for the following destinations should be provided:
3. Local File
4. Log4J (with any supported appender)
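To keep the writer generic, each destination could implement a small common interface. A rough sketch of the local-file and logging destinations follows; the names are hypothetical, and the JDK's `java.util.logging` is used here in place of Log4J only so the sketch has no external dependencies:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.logging.Logger;

// Hypothetical plug-in contract for audit destinations.
interface AuditDestination {
    boolean send(String event);
}

// Local-file destination: appends each audit event as one line.
class FileAuditDestination implements AuditDestination {
    private final Path path;
    FileAuditDestination(Path path) { this.path = path; }
    public boolean send(String event) {
        try {
            Files.writeString(path, event + System.lineSeparator(),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            return true;
        } catch (IOException e) {
            return false; // caller spools the event and retries later
        }
    }
}

// Logging destination: delegates to the JDK logger here; a real Log4J
// destination would hand the event to a Log4J logger, which in turn
// forwards it to any configured appender.
class LogAuditDestination implements AuditDestination {
    private static final Logger LOG = Logger.getLogger("ranger.audit");
    public boolean send(String event) {
        LOG.info(event);
        return true;
    }
}
```

Any such destination could then be plugged into the audit writer without changing the queueing and spooling logic.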
Additional good-to-have destinations are: