Details
- Type: Improvement
- Status: Open
- Priority: Critical
- Resolution: Unresolved
- 2.0
- None
Description
Currently, we have two extremes of logging: either INFO, which logs almost nothing, or DEBUG, which pollutes the logs with overly verbose messages.
We should create a 'troubleshooting' logger, which should be easily enabled (via a system property, for example) and log all stability-critical node and cluster events:
- Connection events (both communication and discovery), handshake status
- ALL ignored messages and skipped actions (even those we assume are safe to ignore)
- Partition exchange stages and timings
- Verbose discovery state changes (this should make it easy to understand the reason for 'Node has not been connected to the topology')
- Transaction failover stages and actions
- All unlogged exceptions
- Responses that took more than N milliseconds when they would normally return right away
- Long discovery SPI message processing times
- Managed service deployment stages
- Marshaller mappings registration and notification
- Binary metadata registration and notification
- Continuous query registration / notification
(add more)
The amount of logging should be chosen carefully so that it is safe to enable this logger in production clusters.
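As a starting point for discussion, here is a minimal sketch of how such an opt-in logger could be gated by a system property and how it could report responses slower than N milliseconds. Everything in it is an assumption made for illustration: the class name `TroubleshootingLog`, the property name `TROUBLESHOOTING_LOG_ENABLED`, the category strings, and the use of `java.util.logging` in place of Ignite's own logging facilities. It is not an existing Ignite API.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Sketch of an opt-in troubleshooting logger. All names are hypothetical. */
final class TroubleshootingLog {
    /** Hypothetical system property name, not an existing Ignite property. */
    static final String ENABLED_PROP = "TROUBLESHOOTING_LOG_ENABLED";

    /** Dedicated logger name so the output can be routed to its own file. */
    private static final Logger LOG = Logger.getLogger("troubleshooting");

    private final boolean enabled;
    private final long slowThresholdMs;

    TroubleshootingLog(boolean enabled, long slowThresholdMs) {
        this.enabled = enabled;
        this.slowThresholdMs = slowThresholdMs;
    }

    /** Reads the flag from the system property; disabled unless it is set to "true". */
    static TroubleshootingLog fromSystemProperty(long slowThresholdMs) {
        return new TroubleshootingLog(Boolean.getBoolean(ENABLED_PROP), slowThresholdMs);
    }

    boolean isEnabled() {
        return enabled;
    }

    /**
     * Logs one event if the logger is enabled. The message is passed as a supplier
     * so that it is not built at all when the logger is disabled.
     *
     * @return true if the event was logged.
     */
    boolean event(String category, Supplier<String> msg) {
        if (!enabled)
            return false;

        LOG.log(Level.INFO, "[" + category + "] " + msg.get());

        return true;
    }

    /** Runs the operation and reports it if it took longer than the threshold. */
    <T> T timed(String name, Supplier<T> op) {
        long start = System.nanoTime();

        try {
            return op.get();
        }
        finally {
            long ms = (System.nanoTime() - start) / 1_000_000;

            if (ms > slowThresholdMs)
                event("slow-response", () -> name + " took " + ms + " ms");
        }
    }
}
```

With something like this, a node started with `-DTROUBLESHOOTING_LOG_ENABLED=true` would emit the events, and a call such as `log.event("discovery", () -> "...")` would cost only a boolean check when the property is not set, which is one way to keep the logger cheap enough for production use.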
Issue Links
- contains: IGNITE-7195 GridToStringBuilder should limit large collections output to first N elements (Resolved)
- is related to: IGNITE-5630 Exception for node disconnect (Resolved)
- is related to: IGNITE-5664 Implement self-check facilities for Ignite (Open)
- supercedes: IGNITE-5332 Add toString() to GridNearAtomicAbstractSingleUpdateRequest and it's inheritors (Resolved)
- links to