Log4cxx / LOGCXX-186

Garbage characters in log files when log requests from multiple threads with hyperthreading enabled


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.10.0
    • Component/s: Appender
    • Labels: None
    • Environment: Ubuntu, Intel, multi-processor hyperthreading

    Description

      On Jun 29, 2007, at 8:09 AM, pete@mu.org wrote:

      Hi, I've been using the subversion log4cxx-10.0.0, which is currently at revision 551872.

      I'm running on an ubuntu linux box running kernel 2.6.20-16, and g++ 4.1.2

      I've been seeing severe race conditions between threads when logging to a FileAppender.

      Basically, when multiple threads write to the same log file, some of the data is corrupt.

      Here is a little program that will show the issue.

      I've been poring over the docs and the mailing list archives, but I can't find a clear explanation as to why this shouldn't work.

      Any ideas what I'm doing wrong? Thanks, Pete

      ----------------------------------------------------------------------------
      #include <pthread.h>

      #include <iostream>
      #include <sstream>
      #include <vector>

      #include <log4cxx/logger.h>

      const int NUMTHREADS = 100 ;
      const int NUMWRITES = 100 ;

      log4cxx::LoggerPtr thread_logger = log4cxx::Logger::getLogger("ThreadManager");
      using std::cerr;
      using std::endl;

      void *Run( void *ptr ) {

          // Recover the heap-allocated thread number handed over by main().
          int *thread_number = reinterpret_cast<int *>(ptr);

          for (int i = 0; i < NUMWRITES; i++) {
              std::ostringstream msg;
              msg << "Thread[" << *thread_number << "], Write [" << ( i + 1 ) << "]";
              LOG4CXX_DEBUG(thread_logger, msg.str());
          }

          pthread_exit(NULL);
      }

      int
      main(int argc, char *argv[]) {

          std::vector<pthread_t> threads(NUMTHREADS);

          log4cxx::LoggerPtr rootlogger = log4cxx::Logger::getRootLogger();

          int thread_number = 0;
          int *thread_cnt = 0;
          while (thread_number++ < NUMTHREADS) {
              // Give each thread its own heap copy of its number
              // (never freed; the leak does not matter for this test).
              thread_cnt = new int(thread_number);
              pthread_create(&threads[thread_number - 1], NULL, Run, (void *) thread_cnt);
          }

          thread_number = NUMTHREADS;
          while (--thread_number >= 0) {
              pthread_join(threads[thread_number], NULL);
          }

          return 0;
      }
      ----------------------------------------------------------------------------

      Here is my log4cxx.properties:

      log4j.debug=true

      # Loggers
      log4j.rootLogger=DEBUG, rootlog
      log4j.logger.ThreadManager=DEBUG, ThreadManager

      # Additivity
      log4j.additivity.ThreadManager=false

      # rootLogger
      log4j.appender.rootlog=org.apache.log4j.FileAppender
      log4j.appender.rootlog.File=root.log
      log4j.appender.rootlog.Append=false
      log4j.appender.rootlog.layout=org.apache.log4j.PatternLayout
      log4j.appender.rootlog.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

      # ThreadManager
      log4j.appender.ThreadManager=org.apache.log4j.FileAppender
      log4j.appender.ThreadManager.File=ThreadManager.log
      log4j.appender.ThreadManager.Append=false
      log4j.appender.ThreadManager.layout=org.apache.log4j.PatternLayout
      log4j.appender.ThreadManager.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
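
      For completeness: the test program above does not configure log4cxx explicitly, so it presumably relies on log4cxx finding log4cxx.properties automatically in the working directory. If that auto-configuration is ever in doubt, the file can be loaded by hand at the top of main(). A minimal sketch, not part of the original report, assuming the file sits in the current directory:

      #include <log4cxx/propertyconfigurator.h>

      int main(int argc, char *argv[]) {
          // Load the configuration shown above explicitly instead of relying
          // on log4cxx's automatic lookup of log4cxx.properties.
          log4cxx::PropertyConfigurator::configure("log4cxx.properties");

          // ... create and join the logging threads as in the test program ...
          return 0;
      }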

      On Jun 29, 2007, at 10:53 PM, pete@mu.org wrote:

      Never mind. This is a known bug.

      http://issues.apache.org/jira/browse/LOGCXX-129

      I should have just stayed with 97, which was working fine.

      Thanks,

      Pete

      On Jun 29, 2007, at 11:26 PM, Curt Arnold wrote:

      LOGCXX-129 only reported problems with AsyncAppender, which was badly flawed in log4j 1.2.13 and earlier, where deadlock reports had been getting more frequent. The log4cxx AsyncAppender was pretty much a straight port of the early log4j equivalent and would be expected to have the same problems under stress. The new log4j AsyncAppender should be ported over to log4cxx, and that is one of the remaining issues on the push to a release.

      However, your test case does not appear to involve AsyncAppender and the anticipated resolution for LOGCXX-129 would have no impact. If you have any stack traces or other information to help localize the problem, they would be appreciated. Otherwise, it will be my top issue to investigate when I shift to log4cxx development tomorrow.

      On Jun 29, 2007, at 11:37 PM, pete@mu.org wrote:

      If you just build the small program that I sent, link it against the subversion log4cxx, and then check out the log files it creates, you'll see they're full of junk along with valid lines too.

      It does it every time. I'm working on a threaded project that uses hundreds
      of threads, and it was always working great with log4cxx; then I
      upgraded to the subversion source, and it started leaving garbage in the
      logs.

      Tonight, I finally decided to go back and reinstall the standard
      log4cxx-9 libs, and link with them. It completely fixed my problem.

      Have you tried building the program I sent you, linking it against the
      newest subversion log4cxx, and running it?

      Please try it and let me know what you find in the log files.

      That's the best example you could have of the problem.

      Thanks,

      Pete

      On Jun 30, 2007, at 12:32 AM, Curt Arnold wrote:

      I appreciate the test code, but have not had a chance to attempt to build and run it. I can't rapidly switch projects without endlessly churning, so when I'm active on log4j development I usually only answer questions or give hints until I finish whatever I'm currently working on. I expect to do so tomorrow morning (US Central Time, UTC-5). If you have any other insights in the interim, please post them. I do appreciate the specifics on platform and compiler; that helps a huge amount.

      What indicates to you that the problem is in FileAppender and not in PatternLayout? I would think that PatternLayout would be a more likely source of problems. Have you tried other appenders (telnet might be reasonable) and not had the problem? Have you tried a custom layout (all the stock layouts are preconfigured PatternLayouts)? Does the same issue still appear if you simplify the pattern (maybe down to just %m or %m%n)?
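
      As a concrete reference for that last experiment, a cut-down variant of the ThreadManager appender from the log4cxx.properties above might look like the following. This is only a sketch of the suggestion, reusing the property names from the original configuration with just the ConversionPattern reduced:

      # Same FileAppender as before, but with the layout stripped down to %m%n
      log4j.appender.ThreadManager=org.apache.log4j.FileAppender
      log4j.appender.ThreadManager.File=ThreadManager.log
      log4j.appender.ThreadManager.Append=false
      log4j.appender.ThreadManager.layout=org.apache.log4j.PatternLayout
      log4j.appender.ThreadManager.layout.ConversionPattern=%m%n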

      On Jun 30, 2007, at 8:32 AM, pete@mu.org wrote:

      I'm still very new to log4cxx internals and all the options. I just
      learned enough to get it to do what I needed, back with log4cxx 9.

      So, I'm afraid I don't have any insight at all into what might be
      causing the race condition. And I haven't used any other appenders or
      layouts.

      I tried just putting %m%n in the pattern, per your request, and I've attached the results of the root log so you can see what I'm talking about. As you'll see,
      there are lots of valid lines, then just junk, then more good lines.
      See attached.

      I've also added my log4cxx.properties. If you want to make any changes
      and define some different appenders or whatever, and mail it back, I can run it for you and send you the results.

      Please let me know if there is anything else I can do.

      Thanks,

      Pete

      On Jul 1, 2007, at 1:34 AM, Curt Arnold wrote:

      I have been unable to reproduce the problem on Ubuntu Linux 6.06-1 and gcc 4.0.3 running under VMWare Fusion on Mac OS X, in either single-processor or dual-processor mode. That is the glorious thing about race conditions: the slightest little change might hide them. Thanks for the output from when you were seeing corruption; it at least suggests that the problem isn't in the layout, since all the bogus characters appear between perfectly formatted messages. That points instead to the FileAppender, or probably more likely the CharsetEncoder or CharsetDecoder.

      I won't be able to work on it until Monday; however, it would be helpful to know whether setting LOG4CXX_LOCALE_ENCODING_UTF8=1 (which will replace the APR encoders/decoders with straight byte copies) or the equivalent manual hack to src/charsetencoder.cpp and src/charsetdecoder.cpp changes the behavior.

      I've also run the helgrind tool from valgrind 2.2.0 and generated a report of possible race conditions in the executed code, but I haven't had time to review them yet.

      On Jul 1, 2007, at 5:49 AM, pete@mu.org wrote:

      Hmm, that's really odd. I can reproduce it on my two Pentium 4 boxes
      with hyperthreading enabled, but I can't on my weenie single-CPU laptop.

      I'll try this and get back to you.

      LOG4CXX_LOCALE_ENCODING_UTF8=1

      Thanks,

      Pete

      Attachments

        1. helgrind.pid21835
          91 kB
          Curt Arnold
        2. root.log
          469 kB
          Curt Arnold

            People

              Assignee: carnold@apache.org Curt Arnold
              Reporter: carnold@apache.org Curt Arnold
              Votes: 0
              Watchers: 2
