Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 3.2.0
Fix Version/s: None
Component/s: None
Description
We see a flood of errors logged by the ZipkinSpanReceiver when our Zipkin Collector service is not running: roughly one error every second or two from each of our processes that is instrumented with HTrace and configured to send traces to Zipkin.

Making matters worse for us, commons-logging appears to prefix every line of the exception stack trace with something like "2015-06-29 09:03:25 zipkinSpanReceiver-0 STDIO [ERROR]", so Splunk parses each line as a separate error message. An example log file is here [1].

It would be nice if this error logging could be rate-limited, for example to no more than one message per minute, or possibly to logging only the first failure until a successful send resets the state.
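The two policies suggested above can be combined in a small helper that the receiver consults before logging a send failure. The sketch below is hypothetical: `SendErrorLogLimiter`, its methods, and the injectable clock are illustrative names, not part of HTrace or commons-logging. It assumes the receiver can call `shouldLogFailure()` in its catch block and `onSuccess()` after each successful send. The first failure after a success is always logged; while sends keep failing, at most one failure per interval is logged, and the number of suppressed failures is counted so it can be reported.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not an HTrace class): decides whether a failed span
 * send should be logged, so a down collector produces a bounded number of
 * log messages instead of one per send attempt.
 */
public final class SendErrorLogLimiter {
    private final long intervalNanos;
    private final LongSupplier nanoClock; // injectable for testing; defaults to System.nanoTime
    private boolean failing = false;      // true from the first failure until the next success
    private long lastLoggedNanos;         // clock reading of the last logged failure
    private long suppressed = 0;          // failures skipped since the last logged one

    public SendErrorLogLimiter(long interval, TimeUnit unit, LongSupplier nanoClock) {
        this.intervalNanos = unit.toNanos(interval);
        this.nanoClock = nanoClock;
    }

    public SendErrorLogLimiter(long interval, TimeUnit unit) {
        this(interval, unit, System::nanoTime);
    }

    /** Returns true if this failure should be logged; otherwise counts it as suppressed. */
    public synchronized boolean shouldLogFailure() {
        long now = nanoClock.getAsLong();
        if (!failing || now - lastLoggedNanos >= intervalNanos) {
            failing = true;
            lastLoggedNanos = now;
            return true;
        }
        suppressed++;
        return false;
    }

    /** Returns the number of failures suppressed since it was last read, and resets it. */
    public synchronized long takeSuppressedCount() {
        long n = suppressed;
        suppressed = 0;
        return n;
    }

    /** Call after a successful send; returns true if this ends a run of failures. */
    public synchronized boolean onSuccess() {
        boolean wasFailing = failing;
        failing = false;
        suppressed = 0;
        return wasFailing;
    }

    public static void main(String[] args) {
        // Simulated clock: one failed send every 1.5 s for two minutes.
        long[] fakeNow = {0L};
        SendErrorLogLimiter limiter =
                new SendErrorLogLimiter(1, TimeUnit.MINUTES, () -> fakeNow[0]);
        int logged = 0;
        for (int i = 0; i < 80; i++) {
            fakeNow[0] = i * 1_500_000_000L;
            if (limiter.shouldLogFailure()) {
                logged++;
            }
        }
        System.out.println("logged " + logged + " of 80 failures"); // logged 2 of 80 failures
    }
}
```

With a one-minute interval, the simulated two-minute outage produces two log messages instead of eighty. Passing a very large interval (for example `Long.MAX_VALUE` nanoseconds) approximates the second policy, where only the first failure is logged until a successful send resets the state. A caller could also log the stack trace only on the first failure of a run and a one-line summary (including `takeSuppressedCount()`) afterwards, which would further reduce the per-line noise seen in Splunk.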