Log4cxx
  1. Log4cxx
  2. LOGCXX-132

various segmentation faults in multithreaded application

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:

      Description

      Have been experiencing various segmentation faults in log4cxx in a multithreaded application.

      NOTE: I will be attaching my test application, gdb, and valgrind output as soon as I determine the best way to do this through JIRA.

      1. threadtest.tar.gz
        27 kB
        Eric Wyles
      2. 04_30_2006_stack1.txt
        13 kB
        Eric Wyles
      3. 04_30_2006_stack2.txt
        8 kB
        Eric Wyles
      4. 04_30_2006_stack3.txt
        10 kB
        Eric Wyles

        Issue Links

          Activity

          Eric Wyles created issue -
          Hide
          Eric Wyles added a comment -

          The attached archive contains my testapp and the details of the issues I have found. Where possible, I have recreated the issue running under both gdb and valgrind, but in some cases this was not possible.

          Test Application Files:
          loggerthread.cpp
          threadtestcase.cpp
          Makefile

          Issue #1:
          1_gdb.txt
          1_valgrind.txt

          Issue #2:
          2_gdb.txt
          2_valgrind1.txt
          2_valgrind2.txt

          Issue #3:
          3_gdb.txt
          3_valgrind1.txt
          3_valgrind2.txt

          Issue #4:
          4_gdb.txt

          Issue #5:
          5_gdb.txt

          Issue #6:
          6_gdb.txt

          Issue #7:
          7_gdb.txt

          Issue #8:
          8_valgrind.txt

          Show
          Eric Wyles added a comment - The attached archive contains my testapp and the details of the issues I have found. Where possible, I have recreated the issue running under both gdb and valgrind, but in some cases this was not possible. Test Application Files: loggerthread.cpp threadtestcase.cpp Makefile Issue #1: 1_gdb.txt 1_valgrind.txt Issue #2: 2_gdb.txt 2_valgrind1.txt 2_valgrind2.txt Issue #3: 3_gdb.txt 3_valgrind1.txt 3_valgrind2.txt Issue #4: 4_gdb.txt Issue #5: 5_gdb.txt Issue #6: 6_gdb.txt Issue #7: 7_gdb.txt Issue #8: 8_valgrind.txt
          Eric Wyles made changes -
          Field Original Value New Value
          Attachment threadtest.tar.gz [ 12325019 ]
          Curt Arnold made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          Hide
          Curt Arnold added a comment -

          I've been able to reproduce a SegFault and attempting to find the root cause. The behavior is likely only observed when multiple threads call Logger::getLogger() or LogManager::shutdown() is called while other threads are logging. The problems need to be fixed, but you may review your application and see if you'd workaround the problem by eliminating the getLogger() calls in threaded code.

          Show
          Curt Arnold added a comment - I've been able to reproduce a SegFault and attempting to find the root cause. The behavior is likely only observed when multiple threads call Logger::getLogger() or LogManager::shutdown() is called while other threads are logging. The problems need to be fixed, but you may review your application and see if you'd workaround the problem by eliminating the getLogger() calls in threaded code.
          Hide
          Eric Wyles added a comment -

          Sure, I can certainly try working around the issue. I may need some guidance though...

          My actual application is much like the testapp but with a lot more classes. There are at least 16-20 threads running constantly doing different things. Do you have a recommended way for me to eliminate these getLogger() calls from my threads but at the same time allow each of my classes the have differently named loggers?

          I could remove all the getLogger() calls from the classes, but that would not be ideal for me since we use the logger names to control our logging level.

          I can definitely try this out in my test application just to see if it avoids the segmentation faults, but if this is the recommended workaround for my actual application I'll need some guidance about how to make this work. Would it be sufficient to wrap the getLogger() calls in a function of my own that would have synchronization?

          Show
          Eric Wyles added a comment - Sure, I can certainly try working around the issue. I may need some guidance though... My actual application is much like the testapp but with a lot more classes. There are at least 16-20 threads running constantly doing different things. Do you have a recommended way for me to eliminate these getLogger() calls from my threads but at the same time allow each of my classes the have differently named loggers? I could remove all the getLogger() calls from the classes, but that would not be ideal for me since we use the logger names to control our logging level. I can definitely try this out in my test application just to see if it avoids the segmentation faults, but if this is the recommended workaround for my actual application I'll need some guidance about how to make this work. Would it be sufficient to wrap the getLogger() calls in a function of my own that would have synchronization?
          Hide
          Curt Arnold added a comment -

          I'm still digging into it, seems that Logger::getLogger() itself is thread-safe, but can negatively affect previously obtained loggers.

          Changing:

          class FooBar {
          void method()

          { static LoggerPtr logger(Logger::getLogger("FooBar")); ... }

          }

          to:

          class FooBar {
          private static Logger logger(Logger::getLogger("FooBar"));
          void method()

          { ... }

          }

          would push the getLogger calls to app start up time.

          Show
          Curt Arnold added a comment - I'm still digging into it, seems that Logger::getLogger() itself is thread-safe, but can negatively affect previously obtained loggers. Changing: class FooBar { void method() { static LoggerPtr logger(Logger::getLogger("FooBar")); ... } } to: class FooBar { private static Logger logger(Logger::getLogger("FooBar")); void method() { ... } } would push the getLogger calls to app start up time.
          Hide
          Eric Wyles added a comment -

          I'll give this one a shot and let you know.

          Out of all the segmentation faults I have seen, most of them have been related to Logger::getLogger()

          However, if you look through those files that I attached, you will see a couple of instances where the segmentation faults seemed to be in some character set encoding/decoding type methods that I don't believe are related to Logger::getLogger() at all.

          The Logger::getLogger() problems were far more common, so once those are taken care of we may have some of these character set issues left over.

          Show
          Eric Wyles added a comment - I'll give this one a shot and let you know. Out of all the segmentation faults I have seen, most of them have been related to Logger::getLogger() However, if you look through those files that I attached, you will see a couple of instances where the segmentation faults seemed to be in some character set encoding/decoding type methods that I don't believe are related to Logger::getLogger() at all. The Logger::getLogger() problems were far more common, so once those are taken care of we may have some of these character set issues left over.
          Hide
          Eric Wyles added a comment -

          I wasn't able to get the syntax quite right to declare it exactly as you stated (ie, as a Logger instead of LoggerPtr). But, if I remove the LoggerPtrs from my methods and just use 1 LoggerPtr that is a private member variable of the class, the segmentation faults seem to be gone.

          At the very least, they are not as frequent. I've been running in a loop for 5 minutes or so with no occurrances. Previously I would get a segmentation fault almost immediately.

          Let me know if there is anything else I can try.

          Show
          Eric Wyles added a comment - I wasn't able to get the syntax quite right to declare it exactly as you stated (ie, as a Logger instead of LoggerPtr). But, if I remove the LoggerPtrs from my methods and just use 1 LoggerPtr that is a private member variable of the class, the segmentation faults seem to be gone. At the very least, they are not as frequent. I've been running in a loop for 5 minutes or so with no occurrances. Previously I would get a segmentation fault almost immediately. Let me know if there is anything else I can try.
          Hide
          Curt Arnold added a comment -

          I derived a unit test from the attached sample test (tests/src/threadtest.cpp) and have added it to the project and have addressed two issues that may have been responsible for some of the observed segmentation faults.

          The implementation of ObjectPtrT::operator=() in include/log4cxx/helpers/objptr.h was not atomic. In particular, releaseRef() was called on the existing value while the existing value was still visible which could result in jumping to a destructed object. I've modified the template to use APR's apr_atomic_casptr (doesn't seem to be a plain exchange ptr method) to exchange the old pointer and new pointer values and delayed release until after the pointers are exchanged.

          Hierarchy::shutdown did not have a synchronization block. Without it ThreadTest::testMultiThreadAndShutdown would fail as loggers would be requested while the hierarchy was in a partially shutdown state.

          The Pool parameter was removed from Thread::run() as the pool was expected to last as long as the object. If you did (as I did in an earlier version of ThreadTest):

          Thread thread;

          { Pool p; thread.run(p, runnable, 0); }

          thread.join();

          You would get a crash on the join as APR threads created in the run method would have been deleted when the pool went out of scope.

          The ThreadTest still fails on Windows due to Hierarchy::getLogger() apparently returning null loggers.

          Show
          Curt Arnold added a comment - I derived a unit test from the attached sample test (tests/src/threadtest.cpp) and have added it to the project and have addressed two issues that may have been responsible for some of the observed segmentation faults. The implementation of ObjectPtrT::operator=() in include/log4cxx/helpers/objptr.h was not atomic. In particular, releaseRef() was called on the existing value while the existing value was still visible which could result in jumping to a destructed object. I've modified the template to use APR's apr_atomic_casptr (doesn't seem to be a plain exchange ptr method) to exchange the old pointer and new pointer values and delayed release until after the pointers are exchanged. Hierarchy::shutdown did not have a synchronization block. Without it ThreadTest::testMultiThreadAndShutdown would fail as loggers would be requested while the hierarchy was in a partially shutdown state. The Pool parameter was removed from Thread::run() as the pool was expected to last as long as the object. If you did (as I did in an earlier version of ThreadTest): Thread thread; { Pool p; thread.run(p, runnable, 0); } thread.join(); You would get a crash on the join as APR threads created in the run method would have been deleted when the pool went out of scope. The ThreadTest still fails on Windows due to Hierarchy::getLogger() apparently returning null loggers.
          Hide
          Eric Wyles added a comment -

          Can I expect to find the changes in tomorrow's tar ball at http://littletux.homelinux.org/log4cxx/ ?

          If so, I'll plan to try breaking things again first thing tomorrow morning.

          Thanks for the good work!

          Show
          Eric Wyles added a comment - Can I expect to find the changes in tomorrow's tar ball at http://littletux.homelinux.org/log4cxx/ ? If so, I'll plan to try breaking things again first thing tomorrow morning. Thanks for the good work!
          Hide
          Curt Arnold added a comment -

          Gump is also failing the no_wchar_t build. I've seen two different types of stack traces. One traces back to line 190 in hierarchy.cpp where the logger smart pointer at that point contains a null value instead of the logger created at line 178. I don't see a mechanism for the change yet.

          The other stack trace went to APRCharsetDecoder::decode where the out.append() around line 100 would raise a size exception. I experimented with a Mutux and synchronized blocks in case apr_xlate_conv_buffer was not thread-safe which appeared to eliminate that particular seg fault. Committed changes on rev 398236.

          Show
          Curt Arnold added a comment - Gump is also failing the no_wchar_t build. I've seen two different types of stack traces. One traces back to line 190 in hierarchy.cpp where the logger smart pointer at that point contains a null value instead of the logger created at line 178. I don't see a mechanism for the change yet. The other stack trace went to APRCharsetDecoder::decode where the out.append() around line 100 would raise a size exception. I experimented with a Mutux and synchronized blocks in case apr_xlate_conv_buffer was not thread-safe which appeared to eliminate that particular seg fault. Committed changes on rev 398236.
          Eric Wyles made changes -
          Attachment 04_30_2006_stack1.txt [ 12326084 ]
          Eric Wyles made changes -
          Attachment 04_30_2006_stack2.txt [ 12326085 ]
          Eric Wyles made changes -
          Attachment 04_30_2006_stack3.txt [ 12326086 ]
          Hide
          Eric Wyles added a comment -

          Looks like we have made some good progress, but there are still some segmentation faults hanging around. They are less frequent now, and my count of unique stack traces is down to 3 now (was previously 8). One of the stack traces is in the new log4cxx::helpers::ObjectPtrBase::exchange() method, so I wonder if we might have consolidated most of the previous problems into 1 problem now

          I have attached 3 files containing gdb output from opening my core file and running 'bt' and then 'thread apply all bt' – there is one file for each stack trace.

          I still have the core files if I need to run anything else against them.

          Show
          Eric Wyles added a comment - Looks like we have made some good progress, but there are still some segmentation faults hanging around. They are less frequent now, and my count of unique stack traces is down to 3 now (was previously 8). One of the stack traces is in the new log4cxx::helpers::ObjectPtrBase::exchange() method, so I wonder if we might have consolidated most of the previous problems into 1 problem now I have attached 3 files containing gdb output from opening my core file and running 'bt' and then 'thread apply all bt' – there is one file for each stack trace. I still have the core files if I need to run anything else against them.
          Hide
          Curt Arnold added a comment -

          All the recent stack traces appear to be symptoms of the same underlying problem, that somehow between the logger creation at line 178 in hierarchy and line 190, a recently constructed logger gets deallocated before it should have ever been publically visible. In exchange() seg fault occurs when trying to set the parent of a null logger, the others occur when some action is done on a null logger.

          Show
          Curt Arnold added a comment - All the recent stack traces appear to be symptoms of the same underlying problem, that somehow between the logger creation at line 178 in hierarchy and line 190, a recently constructed logger gets deallocated before it should have ever been publically visible. In exchange() seg fault occurs when trying to set the parent of a null logger, the others occur when some action is done on a null logger.
          Curt Arnold made changes -
          Link This issue is duplicated by LOGCXX-137 [ LOGCXX-137 ]
          Hide
          Curt Arnold added a comment -

          I've made a small tweak on Windows (use InterlockedExchangePointer instead of apr_atomic_casptr) and removed tests/src/threadtest.cpp in revision 406131.

          I removed the test since the idioms of:

          void someMethod()

          { static LoggerPtr l = Logger::getLogger("org.example.foo"); l.debug("Hello, world"); }

          and

          void someMethod()

          { static LoggerPtr l(Logger::getLogger("org.example.foo")); l.debug("Hello, world"); }

          are not thread-safe without some explicit synchronization. I think the same problem would occur for any type, for example, an std::string. This is most pronounced with the Microsoft C++ compiler where threadtest will fail immediately with one thread in the initialization of l while other threads that detected that initialization had started immediately proceeded to the l.debug which would fail with a null pointer exception.

          Better options would be to place the logger as a static class member (which would get initialized during module load), remove the static so each invocation accesses its own copy of l or adding an explicit synchronization block.

          Show
          Curt Arnold added a comment - I've made a small tweak on Windows (use InterlockedExchangePointer instead of apr_atomic_casptr) and removed tests/src/threadtest.cpp in revision 406131. I removed the test since the idioms of: void someMethod() { static LoggerPtr l = Logger::getLogger("org.example.foo"); l.debug("Hello, world"); } and void someMethod() { static LoggerPtr l(Logger::getLogger("org.example.foo")); l.debug("Hello, world"); } are not thread-safe without some explicit synchronization. I think the same problem would occur for any type, for example, an std::string. This is most pronounced with the Microsoft C++ compiler where threadtest will fail immediately with one thread in the initialization of l while other threads that detected that initialization had started immediately proceeded to the l.debug which would fail with a null pointer exception. Better options would be to place the logger as a static class member (which would get initialized during module load), remove the static so each invocation accesses its own copy of l or adding an explicit synchronization block.
          Curt Arnold made changes -
          Status In Progress [ 3 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hide
          Eric Wyles added a comment -

          Thanks, Curt. Removing 'static' did in fact solve my problems.

          Show
          Eric Wyles added a comment - Thanks, Curt. Removing 'static' did in fact solve my problems.

            People

            • Assignee:
              Curt Arnold
              Reporter:
              Eric Wyles
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development