[HTRACE-200] Reduce rate of logged errors if Zipkin Collector service is down - ASF JIRA


Details

Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 3.2.0
Fix Version/s: None
Component/s: zipkin
Labels: None

Description

We see a flood of errors logged by the ZipkinSpanReceiver whenever our Zipkin Collector service is not running: roughly one error every second or two from each of our processes that is instrumented with HTrace and configured to send traces to Zipkin. Our commons-logging setup makes the problem worse. Every line of the exception stack trace appears to get a prefix like "2015-06-29 09:03:25 zipkinSpanReceiver-0 STDIO [ERROR]", so Splunk parses each line as a separate error message. An example log file is linked at [1]. Ideally this error logging would be rate-limited, for example to no more than one message per minute, or to logging only the first failure until a successful send resets the state.

[1] http://pastebin.com/AieewfhF
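The rate limiting proposed above could look roughly like the following sketch. It is hypothetical: `RateLimitedErrorLogger`, `shouldLogFailure`, `takeSuppressedCount`, and `onSuccess` are illustrative names, not part of HTrace or the ZipkinSpanReceiver API. It assumes the receiver's send loop can call `onSuccess()` after every successful send and consult `shouldLogFailure()` before logging an exception. The first failure after a success is always logged, later failures are suppressed until the interval (for example 60 seconds) has elapsed, and a successful send resets the state.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper, not part of HTrace: decides whether a failed send
 * should be logged, so that a down Zipkin collector produces a few log
 * messages instead of one per failed send.
 */
final class RateLimitedErrorLogger {
    private final long minIntervalMillis;
    private final LongSupplier clock;   // injectable so tests can control time

    private boolean failing = false;    // true between the first failure and the next success
    private long lastLoggedAt = 0L;     // clock value when a failure was last logged
    private long suppressedCount = 0L;  // failures skipped since the count was last taken

    RateLimitedErrorLogger(long minIntervalMillis) {
        this(minIntervalMillis, System::currentTimeMillis);
    }

    RateLimitedErrorLogger(long minIntervalMillis, LongSupplier clock) {
        if (minIntervalMillis < 0) {
            throw new IllegalArgumentException("minIntervalMillis must be >= 0");
        }
        this.minIntervalMillis = minIntervalMillis;
        this.clock = clock;
    }

    /** Call on each failed send; returns true if this failure should be logged. */
    synchronized boolean shouldLogFailure() {
        long now = clock.getAsLong();
        if (!failing || now - lastLoggedAt >= minIntervalMillis) {
            failing = true;
            lastLoggedAt = now;
            return true;
        }
        suppressedCount++;
        return false;
    }

    /** Returns how many failures were suppressed since the last call, then resets the count. */
    synchronized long takeSuppressedCount() {
        long n = suppressedCount;
        suppressedCount = 0L;
        return n;
    }

    /** Call after a successful send; the next failure will be logged immediately. */
    synchronized void onSuccess() {
        failing = false;
        suppressedCount = 0L;
    }
}
```

Passing `Long.MAX_VALUE` as the interval approximates the second option from the description: only the first failure is logged until a send succeeds. When a failure is logged, the caller could include `takeSuppressedCount()` in the message so the log still records how many errors were skipped. The methods are `synchronized` on the assumption that more than one receiver thread may report failures.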


People

Assignee: Unassigned

Reporter: Andrew Olson

Votes: 0

Watchers: 2

Dates

Created: 29/Jun/15 15:36

Updated: 14/Oct/15 01:03