Type: New Feature
Affects Version/s: None
Fix Version/s: 2.4.0
Logging is a critical part of any system's infrastructure. It is the most direct way of observing what is happening with a system. In the case of issues, it helps us diagnose the problem quickly which in turn helps lower the MTTR.
Kafka supports application logging via the log4j library and outputs messages in various log levels (TRACE, DEBUG, INFO, WARN, ERROR). Log4j is a rich library that supports fine-grained logging configurations (e.g use INFO-level logging in kafka.server.ReplicaManager and use DEBUG-level in kafka.server.KafkaApis).
This is statically configurable through the log4j.properties file which gets read once at broker start-up.
A problem with this static configuration is that we cannot alter the log levels when a problem arises. It is severely impractical to edit a properties file and restart all brokers in order to gain visibility of a problem taking place in production.
It would be very useful if we support dynamically altering the log levels at runtime without needing to restart the Kafka process.
Log4j itself supports dynamically altering the log levels in a programmatic way and Kafka exposes a JMX API that lets you alter them. This allows users to change the log levels via a GUI (jconsole) or a CLI (jmxterm) that uses JMX.
There is one problem with changing log levels through JMX that we hope to address and that is Ease of Use:
- Establishing a connection - Connecting to a remote process via JMX requires configuring and exposing multiple JMX ports to the outside world. This is a burden on users, as most production deployments may stand behind layers of firewalls and have policies against opening ports. This makes opening the ports and connections in the middle of an incident even more burdensome
- Security - JMX and tools around it support authentication and authorization but it is an additional hassle to set up credentials for another system.
- Manual process - Changing the whole cluster's log level requires manually connecting to each broker. In big deployments, this is severely impractical and forces users to build tooling around it.
Ideally, Kafka would support dynamically changing log levels and address all of the aforementioned concerns out of the box.
We propose extending the IncrementalAlterConfig/DescribeConfig Admin API with functionality for dynamically altering the broker's log level.
This approach would also pave the way for even finer-grained logging logic (e.g log DEBUG level only for a certain topic) and would allow us to leverage the existing AlterConfigPolicy for custom user-defined validation of log-level changes.
These log-level changes will be temporary and reverted on broker restart - we will not persist them anywhere.