  1. HTrace
  2. HTRACE-200

Reduce rate of logged errors if Zipkin Collector service is down


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: zipkin
    • Labels: None

    Description

      We see a flood of errors logged by the ZipkinSpanReceiver whenever our Zipkin Collector service is not running: about one error every second or two from each of our processes that is instrumented with HTrace and configured to send traces to Zipkin. To make matters worse for us, with commons-logging every line of the exception stack trace appears to get a prefix like "2015-06-29 09:03:25 zipkinSpanReceiver-0 STDIO [ERROR]", so Splunk parses each line as a separate error message. Here [1] is an example log file. It would be nice if this error logging could be rate-limited, for example to no more than one message per minute, or if only the initial occurrence were logged until a successful send resets the state.

      [1] http://pastebin.com/AieewfhF
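
      The two policies suggested above (at most one error per interval, or only the first error until a send succeeds) could be sketched roughly as follows. This is only an illustration: ErrorLogThrottle and its methods are hypothetical names, not part of HTrace or the ZipkinSpanReceiver, and the caller is assumed to pass in the current time in milliseconds.

```java
/**
 * Hypothetical helper, not part of HTrace: decides whether a send failure
 * should be written to the log or silently suppressed.
 */
final class ErrorLogThrottle {
    private final long minIntervalMillis; // minimum gap between logged failures
    private long lastLoggedMillis;        // time of the most recently logged failure
    private boolean failing;              // true once a failure was logged and no success followed

    ErrorLogThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /**
     * Call on every failed send. Returns true if this failure should be logged:
     * either it is the first failure since the last success, or at least
     * minIntervalMillis have passed since the last logged failure.
     */
    synchronized boolean onFailure(long nowMillis) {
        if (!failing || nowMillis - lastLoggedMillis >= minIntervalMillis) {
            failing = true;
            lastLoggedMillis = nowMillis;
            return true;
        }
        return false;
    }

    /** Call on every successful send; the next failure is then logged immediately. */
    synchronized void onSuccess() {
        failing = false;
    }
}
```

      With an interval of 60000 the receiver would log at most one error per minute while the collector is down. Passing a very large interval such as Long.MAX_VALUE approximates the second option, where only the initial occurrence is logged until a successful send resets the state.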

          People

            Assignee: Unassigned
            Reporter: Andrew Olson (noslowerdna)
            Votes: 0
            Watchers: 2
