Log4cxx / LOGCXX-500

Logging in Timing-Critical Applications


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.11.0
    • Component/s: Core
    • Labels: None
    • Flags: Patch

    Description

      The following arrived on the mailing list; it is provided here as well,
      mainly to collect the given patches etc.:

      Hello All,

      I'd like to share some experience, as well as some patches, with regard
      to using log4cxx in a timing-critical application. First, a few words
      about our requirements: it's a service which must generate network
      packets with a precision of up to a hundred microseconds. Thus, it's
      very important to have predictable code timing. One can argue that
      log4cxx is not very well suited for such applications, but surprisingly
      it works pretty well after some light tuning.

      So, what were the issues?
      Basically, from the library user's point of view they all looked the
      same: all of a sudden, logging done with the LOG4CXX_DEBUG() macro could
      take an unexpectedly long time to complete. For example, the same trace
      which takes several μs 99% of the time would sometimes take hundreds of
      microseconds or even a few milliseconds. After further investigation
      this was traced down to a few root causes:

      1. The async logger (which we have been using, of course) has an
      internal queue to pass log entries to the background disk-writer thread.
      This queue is mutex-protected, which might seem fine unless you think a
      little bit more about it. First of all, someone calling LOG4CXX_DEBUG()
      to simply put something into the log might not expect to be blocked
      waiting for a mutex at all. The second point is that, although measures
      were taken to minimize the time the disk-writer thread holds that lock,
      OS schedulers often work in a way that a thread blocked on a mutex gets
      de-scheduled. With a normal OS-scheduler quantum that means the logging
      thread can be preempted for milliseconds.
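
      To make the problem concrete, a conceptual model of such a hand-off
      queue could look like the sketch below (illustrative only, not
      log4cxx's actual AsyncAppender code): every logging thread has to take
      the same mutex as the disk-writer thread before it can enqueue anything.

      #include <condition_variable>
      #include <deque>
      #include <mutex>
      #include <string>

      // Conceptual model only: a mutex-protected hand-off queue in which a
      // producer (logging thread) can be de-scheduled while waiting for the
      // lock currently held by the consumer (disk-writer thread).
      struct BlockingQueue {
          std::mutex m;
          std::condition_variable cv;
          std::deque<std::string> items;

          void push(std::string s) {                // called by logging threads
              std::lock_guard<std::mutex> lock(m);  // may block behind the writer
              items.push_back(std::move(s));
              cv.notify_one();
          }

          std::string pop() {                       // called by the writer thread
              std::unique_lock<std::mutex> lock(m);
              cv.wait(lock, [this] { return !items.empty(); });
              std::string s = std::move(items.front());
              items.pop_front();
              return s;
          }
      };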

      2. There are mutexes protecting the internal state of both loggers and
      appenders. This means that two separate threads calling LOG4CXX_DEBUG()
      can block each other. Even if they are using different loggers, they
      would still block on the appender! This has the same consequences for
      execution timing and performance as described above.

      3. The std::stringstream class constructor has some internal locks of
      its own. Unfortunately, each MessageBuffer has its own instance of this
      class, and also unfortunately a MessageBuffer is created inside the
      LOG4CXX_DEBUG() macro. There is an optimization to avoid creating the
      stringstream when logging simple strings, but as soon as your log
      statement has a single '<<' operator it is created.
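
      For reference, the 0.10.0 macro expands to roughly the following
      (paraphrased from log4cxx/logger.h, with comments added):

      #define LOG4CXX_DEBUG(logger, message) { \
              if (LOG4CXX_UNLIKELY(logger->isDebugEnabled())) { \
                 /* a fresh MessageBuffer is constructed per enabled call; */ \
                 /* it lazily creates an internal stream once operator<<   */ \
                 /* needs one, paying the constructor cost noted above     */ \
                 ::log4cxx::helpers::MessageBuffer oss_; \
                 logger->forcedLog(::log4cxx::Level::getDebug(), \
                                   oss_.str(oss_ << message), LOG4CXX_LOCATION); }}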

      4. Dynamic memory allocations. Unfortunately there are still quite a few
      of them, even though a memory pool is used in some other places. Thus,
      hidden calls to new and malloc induce unpredictable delays.

      So, what did we do to mitigate these problems?

      1. The natural solution for this issue was to use an atomic queue. There
      are a few of them available, but we made use of boost::lockfree::queue
      as it can serve as a drop-in replacement, allowing us to keep all
      present functionality.
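
      A minimal sketch of that shape, using a made-up Event type instead of
      the real logging-event class:

      #include <boost/lockfree/queue.hpp>
      #include <atomic>
      #include <string>
      #include <thread>

      // Illustration only: swapping the appender's mutex-protected queue for
      // boost::lockfree::queue. Producers never block on a lock; push() only
      // fails (returns false) when the fixed-capacity queue is full.
      struct Event { std::string text; };

      boost::lockfree::queue<Event*> pending(4096);  // lock-free MPMC queue
      std::atomic<bool> shuttingDown{false};

      // Called from logging threads instead of locking a mutex.
      bool enqueue(Event* ev) { return pending.push(ev); }

      // Background disk-writer thread: drains the queue without holding any
      // lock a logging thread could block on.
      void writerLoop() {
          Event* ev = nullptr;
          while (!shuttingDown.load(std::memory_order_acquire)) {
              while (pending.pop(ev)) { /* format and write ev->text */ delete ev; }
              std::this_thread::yield();
          }
          while (pending.pop(ev)) { delete ev; }     // drain on shutdown
      }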

      2. After looking more into the code, it appeared that two concurrent
      calls to LOG4CXX_DEBUG() from within different threads are not harmful,
      because the internal structures of the logger and appender are not
      changed there. What really requires protection is concurrency between
      logging and configuring. Thus, we came to a solution: read-write locks,
      where logging calls act as readers and configuration/exiting calls act
      as writers. With such an approach, multiple threads calling
      LOG4CXX_DEBUG() became free of any contention.
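
      The idea, sketched here with C++17's std::shared_mutex rather than the
      APR-based SHARED_MUTEX/LOCK_R/LOCK_W macros the patch actually
      introduces:

      #include <mutex>
      #include <shared_mutex>

      // Illustration of the reader/writer split, not the patch itself.
      class Appender {
          std::shared_mutex stateMutex;  // protects the appender's configuration
      public:
          void append(/* const LoggingEvent& event */) {
              // Logging threads are readers: many may append concurrently,
              // since they do not modify the appender's internal state.
              std::shared_lock<std::shared_mutex> readLock(stateMutex);
              // ... format and write the event ...
          }

          void setLayout(/* const LayoutPtr& layout */) {
              // Configuration calls are writers: they take exclusive ownership
              // and wait for in-flight logging calls to finish.
              std::unique_lock<std::shared_mutex> writeLock(stateMutex);
              // ... replace the layout ...
          }
      };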

      3. This problem also has a simple solution: make one static
      std::stringstream object per thread using the thread_local keyword.
      Unfortunately, we found one drawback: thread_local memory is not
      released if the thread is neither detached nor joined. As there is some
      code which does neither of these, we made the static stringstream an
      XML-file configuration option. Also, there could be an issue with using
      multiple MessageBuffer instances from within a single thread, but
      LOG4CXX_DEBUG() does not do that.
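
      A minimal sketch of the per-thread-buffer idea (not the patch itself;
      the helper function is made up for illustration):

      #include <sstream>
      #include <string>

      // Each thread constructs its std::ostringstream exactly once, so the
      // stream-constructor cost is paid once per thread instead of on every
      // logging call.
      inline std::string formatMessage(const char* tag, int value) {
          thread_local std::ostringstream buf;  // one instance per thread
          buf.str("");                          // reset contents between messages
          buf.clear();                          // reset error flags
          buf << tag << '=' << value;
          return buf.str();
      }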

      4. At this time we didn't do anything to address the dynamic memory
      allocation issue.

      So, if you want to give our patch a try, it is attached. It's based on
      log4cxx 0.10.0-12 as found in Debian. There are new SHARED_MUTEX and
      LOCK_R/LOCK_W macros defined in mutex.h and synchronized.h for easy
      switching between RW and conventional mutexes for benchmarking. Also,
      there is a test application which spawns two threads doing logging with
      a small sleep between iterations. It prints to stdout if a trace
      statement took more than 500 microseconds. It might look familiar to
      you because it's based on one of the examples from StackOverflow.
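
      The attached main.cpp and config.xml are the real test; the rough shape
      of such a latency check is something like the following (names and the
      C++11-style threading are illustrative):

      #include <chrono>
      #include <iostream>
      #include <thread>
      #include <log4cxx/logger.h>
      #include <log4cxx/xml/domconfigurator.h>

      // Two threads log with a short sleep between iterations and report any
      // single LOG4CXX_DEBUG() call that takes longer than 500 microseconds.
      void worker(log4cxx::LoggerPtr logger, int id) {
          using namespace std::chrono;
          for (int i = 0; i < 100000; ++i) {
              auto start = steady_clock::now();
              LOG4CXX_DEBUG(logger, "thread " << id << " iteration " << i);
              auto us = duration_cast<microseconds>(steady_clock::now() - start);
              if (us.count() > 500)
                  std::cout << "slow log call: " << us.count() << " us\n";
              std::this_thread::sleep_for(microseconds(100));
          }
      }

      int main() {
          log4cxx::xml::DOMConfigurator::configure("config.xml");  // e.g. the attached config
          auto logger = log4cxx::Logger::getLogger("test");
          std::thread t1(worker, logger, 1), t2(worker, logger, 2);
          t1.join();
          t2.join();
      }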

      In our testing the modified log4cxx has about 20% better performance
      overall, but, more importantly for us, it has far fewer cases where a
      log statement takes excessive time to complete. The second part is only
      true for CPUs with more than two cores, where all threads can physically
      run in parallel. Unfortunately we still see rare cases of prolonged
      logging calls; we have traced those down to the dynamic memory
      allocation issue.

      Any thoughts?

      Denys Smolianiuk

      Senior SW Engineer
      Harmonic Inc.

      http://mail-archives.apache.org/mod_mbox/logging-dev/201808.mbox/<1534433474.4460.1%40smtp-mail.outlook.com>

      Attachments

        1. config.xml
          2 kB
          Thorsten Schöning
        2. main.cpp
          2 kB
          Thorsten Schöning
        3. non_blocking.diff
          47 kB
          Thorsten Schöning
        4. non_blocking_wo_debian_control.diff
          46 kB
          Thorsten Schöning


          People

            Assignee: Unassigned
            Reporter: Thorsten Schöning
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:
