We have a low-garbage trading system. When a production incident occurs we want to temporarily increase visibility into what the system is doing by increasing the log level. However, we want to do this selectively: we want the ability to turn on detailed logging only for certain orders, for certain instruments, for a certain user, or for some other attributes.
The system processes many messages, say in the order of hundreds of thousands of messages per second. When a message is processed it passes through many components of the system, and when each of these components do logging, the message context information like order ID, instrument etc. should be available so the log event can be filtered in the normal case and enabled in case of a production incident.
We envision this "verbose logging" filtering to be switched on and off manually by an operations team. In addition to manual operation, it would be nice if the original verbosity can be restored automatically when some condition no longer holds. For example, log at some verbose level
- for some period of time (5 minutes)
- for some fixed number of events (10,000 log messages)
- any other user-specified condition
There are two problems to solve:
- efficient and garbage-free log event "tagging"
- dynamic and garbage-free filtering
Problem 1: Log Event "Tagging"
- Because of the volume of events we want the mechanism for tagging log events to be garbage-free.
- Our order IDs and instrument IDs etc are modeled as primitives (long and int values).
Current options in Log4j for "tagging" a log event:
- ThreadContext map. (This almost does what we need.) Data in this map is copied into each Log Event, and downstream components can query the data with the LogEvent.getContextMap() : Map<String, String> method. The drawback of the ThreadContext API is that values must be Strings. We would like to be able to store long primitives in the ThreadContext map. We can also take this opportunity to provide a garbage-free version of the ThreadContext (
LOG4J2-1349) and the LogEvent properties map.
- Direct user access to the LogEvent properties map.
LOG4J2-1010suggests changing the Logger API to accomplish this, but there may be alternatives.
- Update: the idea raised in
LOG4J2-1010to make initialization of the LogEvent's contextMap data pluggable is relevant for this ticket also: It means that context data can come from anywhere, not necessarily only from Log4j API classes like ThreadContext. Applications can have their own custom ThreadContext-like class with data that is copied into every log event, without requiring changes to the Log4j API.
- Markers: (Not a good option, mentioned here for completeness.) String-based, name attribute only (no key-value), hierarchical (may have parents). Current implementation caches all Markers forever, with an option to clear the whole cache. Unclear how to use this mechanism to tag a log event with key-value pairs like order ID, user ID and/or product ID.
- Update: one way to work around this is to create a KeyValueMarker implementation of the Marker interface. A single instance could contain multiple tags (key-value pairs of arbitrary type). Instances cannot be cached by name in the MarkerManager, but instead I imagine we can use a pool of KeyValueMarkers to reuse (so we would need to call release() or something when the log event is fully processed). This feels hacky though...
Problem 2: Dynamic and garbage-free filtering
It looks like installing a global filter on the Configuration is a good way to accomplish this. It is straightforward to do this with existing Log4j APIs:
Installing and removing a filter is rare and it is okay to create temporary objects when doing this. Once the filter is installed, it should not create temporary objects during steady state processing.
For the filter implementation, something like ThreadContextMapFilter is very close to what we need.
- ThreadContextMapFilter is not garbage-free: it creates a new iterator every time the filter() method is called.
- Auto-removal of the Filter after some time has passed or some number of events have been processed. It may be a useful addition to Log4j to provide a CompositeFilter that automatically removes itself from a Configuration when some condition is satisfied.