Log4cxx
  1. Log4cxx
  2. LOGCXX-129

Asyncappender is full of race conditions (improper use of condition variables)

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.10.0
    • Component/s: Appender
    • Labels:
      None
    • Environment:
      all

      Description

      as title says.

      As a result, when used under any reasonable load, the dispatch() thread and all the append() threads will become blocked on their condition vars due to missed wakeup. Under lower load, sort of works, because the chances are the dispatcher will get woken up by a different append(). But as soon as the queue gets full, a missed wake up will occur, and all threads attempting to log will block.

      This is MAJOR problem, considering the website suggests current CVS head instead of 0.9.7. I was going to fix it myself and apply a patch, but needed to get my live systems working asap, so I've rolled back to 0.9.7. This works fine.

      I might come back at some point and submit a patch, but looking at the code, it seems that Curt Arnold was the one who ported over to APR threads, and he was the one who wrote the asyncappender, so I'm afraid I don't particularly trust the rest of the project now.

        Issue Links

          Activity

          Robert Wyrobek created issue -
          Hide
          Curt Arnold added a comment -

          It is definitely possible I bungled the port of the existing AsyncAppender to APR threads. There were never the types of unit tests that you would like to have to do such a bit of surgery. As I remember it, the pre-APR AsyncAppender had two distinct implementations, one for pthread and a Windows threads. The approach on the APR thread rework was to remove the Windows specific implementation and map the pthread calls to the equivalent APR thread calls as the semantics of APR thread calls and pthread calls appear to be nearly identical. Obviously the "nearly" could bite back.

          I was not involved in the development of AsyncAppender prior to the APR port or the log4j AsyncAppender. There have been several issues recently raised against the log4j AsyncAppender and another possibility is that those bugs also found their way into log4cxx. Since you seem to be satisfied with the log4cxx 0.9.7 implementation, I'd guess that you aren't getting bit by bugs shared between log4j and log4cxx.

          You did not identify on what platform(s) you have observed the problems. The potential differences between the 0.9.7 and SVN trunk code would be most pronounced on Windows where native Windows thread calls were replaced by calls to APR's emulation of pthread semantics.

          The approach that I would take to the problem would be to:

          1) write log4j unit tests that would recreate blocking problem on log4cxx.
          2) fix log4j if they don't pass
          3) port unit tests to log4cxx
          4) fix log4cxx

          Any additional analysis of the problems that you have would be helpful.

          Show
          Curt Arnold added a comment - It is definitely possible I bungled the port of the existing AsyncAppender to APR threads. There were never the types of unit tests that you would like to have to do such a bit of surgery. As I remember it, the pre-APR AsyncAppender had two distinct implementations, one for pthread and a Windows threads. The approach on the APR thread rework was to remove the Windows specific implementation and map the pthread calls to the equivalent APR thread calls as the semantics of APR thread calls and pthread calls appear to be nearly identical. Obviously the "nearly" could bite back. I was not involved in the development of AsyncAppender prior to the APR port or the log4j AsyncAppender. There have been several issues recently raised against the log4j AsyncAppender and another possibility is that those bugs also found their way into log4cxx. Since you seem to be satisfied with the log4cxx 0.9.7 implementation, I'd guess that you aren't getting bit by bugs shared between log4j and log4cxx. You did not identify on what platform(s) you have observed the problems. The potential differences between the 0.9.7 and SVN trunk code would be most pronounced on Windows where native Windows thread calls were replaced by calls to APR's emulation of pthread semantics. The approach that I would take to the problem would be to: 1) write log4j unit tests that would recreate blocking problem on log4cxx. 2) fix log4j if they don't pass 3) port unit tests to log4cxx 4) fix log4cxx Any additional analysis of the problems that you have would be helpful.
          Hide
          Nathan Zorn added a comment -

          I can recreate this same issue when using the AsyncAppender. This is with log4cxx 0.9.8 on Fedora Core 3, gcc 3.4.2, amd64.

          This "deadlock" occurs quite frequently, so if you need backtraces or any other information I can provide this.

          It isn't ideal for me to switch back to 0.9.7.

          thanks

          Show
          Nathan Zorn added a comment - I can recreate this same issue when using the AsyncAppender. This is with log4cxx 0.9.8 on Fedora Core 3, gcc 3.4.2, amd64. This "deadlock" occurs quite frequently, so if you need backtraces or any other information I can provide this. It isn't ideal for me to switch back to 0.9.7. thanks
          Hide
          Curt Arnold added a comment -

          The log4j AsyncAppender from which the log4cxx was derived had many issues and needed to be rewritten. http://issues.apache.org/bugzilla/show_bug.cgi?id=38137 tracked the rewrite in log4j. My proposed resolution would be to port the new tests and implementation from log4j to log4cxx and replace the current implementation.

          Show
          Curt Arnold added a comment - The log4j AsyncAppender from which the log4cxx was derived had many issues and needed to be rewritten. http://issues.apache.org/bugzilla/show_bug.cgi?id=38137 tracked the rewrite in log4j. My proposed resolution would be to port the new tests and implementation from log4j to log4cxx and replace the current implementation.
          Curt Arnold made changes -
          Field Original Value New Value
          Link This issue blocks LOGCXX-62 [ LOGCXX-62 ]
          Hide
          Curt Arnold added a comment -

          Marking issue as closed as code is ported and no known issues at the time.

          Show
          Curt Arnold added a comment - Marking issue as closed as code is ported and no known issues at the time.
          Curt Arnold made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 0.10.0 [ 10782 ]
          Resolution Fixed [ 1 ]
          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Open Open Resolved Resolved
          579d 8h 37m 1 Curt Arnold 02/Oct/07 05:53

            People

            • Assignee:
              Curt Arnold
              Reporter:
              Robert Wyrobek
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development