Log4cxx
  1. Log4cxx
  2. LOGCXX-194

Garbage in log files when appenders are defined in multiple levels of the logger hierarchy

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.10.0
    • Component/s: Appender
    • Labels:
      None

      Description

      When appenders are defined in multiple levels of the logger hierarchy, corruption of the log output occurs.

      To illustrate, create a log4cxx.properties file wih the following loggers defined:

      log4j.rootLogger=debug, apndr1
      log4j.logger.subLogger=debug, apndr2

      'apndr1' and 'apndr2' can be any type of appender, although for the purposes of investigating this bug, FileAppenders and RollingFileAppenders were used with PatternLayout and XMLLayout layouts.

      Once configured, using multiple threads to create a heavy load on the logging system, send log events to the 'subLogger' logger.

      Some of the data in the resulting logging output will include garbage.

        Issue Links

          Activity

          Hide
          Myles Bunbury added a comment -

          This issue may or may not be the same as LOGCXX-186.

          The property file posted in LOGCXX-186 fulfills the known requirements to exhibit this bug; however, discussion on the cause of that problem has been leaning towards a initialization issue.

          Show
          Myles Bunbury added a comment - This issue may or may not be the same as LOGCXX-186 . The property file posted in LOGCXX-186 fulfills the known requirements to exhibit this bug; however, discussion on the cause of that problem has been leaning towards a initialization issue.
          Hide
          Myles Bunbury added a comment -

          The following is background information on this bug as posted to the log4cxx user mailing list:

          > ----Original Message----
          > From: Curt Arnold carnold@apache.org
          > Sent: Friday, September 28, 2007 6:11 PM
          > To: Log4CXX User
          > Subject: Re: Log4cxx wilting under heavy load
          >
          >
          > On Sep 28, 2007, at 4:27 PM, BUNBURY Myles wrote:
          >
          > > Log4cxx seems to wilt when under heavy load.
          > >
          > > With the syslog appender enabled, the application cores when under
          > > heavy load.
          > >
          > > When using a RollingFileAppender with an XMLLayout layout, my
          > > output file gets garbled:
          > >
          > > <<<snip>>>
          > >
          > > <log4j:event logger="Gateway.AuthenticationServices/
          > > ANS_AuthenticationService.cpp" timestamp="1191005220721"
          > > level="INFO" thread="0x61eb3b90">
          > > <log4j:message><![CDATA[No username/password token or SAML
          > > assertion to process.]]></log4j:message>
          > > </log4j:event>
          > > <log4j:event logger="Gateway.MediationPipeline/
          > > MED_PipelineMessage.cpp" timestamp="1191005220721" level="INFO"
          > > thread="0x61eb3b90">
          > > <log4j:message><![CDATA[No session policy set for user/group .]]></
          > > log4j:message>
          > > </log4j:event>
          > > <log4j:event logger="Gateway.MediationPipeline/
          > > MED_PipelineMessage.cpp" timestamp="1191005220721" level="DEBUG"
          > > thread="0x61eb3b90">
          > > <log4j:message><![CDATA[Using session #2 with session policy
          > > DefaultPolicy.]]></log4j:message>
          > > </log4j:event>
          > > <log4j:event logger="Gateway.SessionInfra/SI_SessionManager.cpp"
          > > timestamp="1191005220729" level="WARN" thread="0x5ecaeb90">
          > > <log4j:message><![CDATA[Concurrent session limit reached.
          > > Transaction is rejected.]]></log4j:message>
          > > </log4j:event>
          > > <log4j:event logger="Gateway.SessionInfra/SI_SessionManager.cpp"
          > > timestamp="1191005220729" level="ERROR" thread="0x5ecaeb90">
          > > <log4j:message><![CDATA[User session rejected - have exceeded the
          > > maximum number of concurrent sessions based on the current license
          > > (0 concurrent sessions allowed)]]></log4j:message>
          > > </log4j:event>
          > > ¦« " ´T#
          > > ýÍÊ^˜ÌÊ^Ä °Ñ"
          > > ôÌÊ^µ ‡- <T#
          > > ÜÑ"
          > > XÍÊ^ <T#
          > > ÜÑ"
          > > À4‡- dÐÊ^ÍÊ^§c ´T#
          > > ð‡- °Ñ"
          > > @ÍÊ^1Ö†- T#
          > > ¸Ñ"
          > > XÍÊ^ ôO ÔÑ"
          > > |ó³ hÍÊ^
          > > Þ$z XÍÊ^p÷³ xÍÊ^/&L4#
          > > z ¸ÍÊ^Z;'°Ñ"
          > > ¸ÍÊ^ü(H4#
          > > ðÍÊ^¸ÍÊ^13 Ì€ðÍÊ^ðÍÊ^
          > >
          > > Þ$z ÏÊ^ÄydÐÊ^ðÍÊ^ ðÍÊ^
          > > ÏÊ^PÏÊ^XÏÊ^h - ‹-
          > > 1191'À‹- 20729&ôï˜- €¼ÏÊ^(ÎÊ^®Ý‹- €È
          > > €—ÎÊ^H€¼ÏÊ^H €@ € ôï˜- 5 À
          > > € €À€ ¼ÏÊ^xÎÊ^ €
          > > Ðõ‹- - ‹- ôO À€ ˆÎÊ^@ €‹‚& 
          > > - ‹- X~—- ¨ÎÊ^5
          > > ´ dÐÊ^Þ$¸ÎÊ^ H& çÎÊ^ÏÊ^¥!øÎÊ°#
          > > ÏÊ^§c %
          > > ôï˜- èÎÊ^ðÆ0ÐÊ^'ÏÊ^øÎÊ^ÙÃ'ÏÊ^'ÏÊ^
          > > (ÏÊ^žÇ'ÏÊ^'ÏÊ^SÏÊ^ͦ!\sA
          > > dÐÊ^(ÏÊ^%LRÏÊ^´ÏÊ^XÏÊ^ÖÇRÏÊ^SÏÊ^RÏÊ^¶0 \sA
          > > kÏÊ^XÏÊ^¶&$HsA
          > > dÐÊ^ÈÏÊ^¦«0ÐÊ^Þ
          > > $xÏÊ^íÃHÐÊ^ÿÿÿÿˆÏÊ^ÄHÐÊ^l4¸ÏÊ^6ÉHÐÊ^
          > > ó ôO l4z ÈÏÊ^e4 HÐÊ^dÐÊ^ÈÏÊ^uÉHÐÊ^Þ
          > > $ˆÐÊ^@š!HÐÊ^l4 HÒÊ^ÑÊ^ÿÿÿÿ'â³ Z;'°Ñ"
          > > j 8ÐÊ^ü(0x5ecaeb90Ê^13 4€pÐÊ^pÐÊ^ Þ$z
          > > HÐÊ^Í&ä6%
          > > 4ÐÊ^4ÐÊ^ ˆÐÊ^
          > > *à6%
          > > LÐÊ^LÐÊ^ Üí#
          > > Ì€˜ÐÊ^<log4j:event logger="Gateway.SessionInfra/
          > > SI_SessionManager.cpp" timestamp="1191005220729" level="WARN"
          > > thread="0x600b0b90">
          > > <log4j:message><![CDATA[Concurrent session limit reached.
          > > Transaction is rejected.]]></log4j:message>
          > > </log4j:event>
          > > <log4j:event logger="Gateway.MediationPipeline/
          > > MED_PipelineMessage.cpp" timestamp="1191005220729" level="INFO"
          > > thread="0x5ecaeb90">
          > > <log4j:message><![CDATA[Auth'ed user hitting dend. Transaction is
          > > rejected.]]></log4j:message>
          > > </log4j:event>
          > > <log4j:event logger="Gateway.MediationPipeline/
          > > MED_PipelineMessage.cpp" timestamp="1191005220729" level="INFO"
          > > thread="0x5ecaeb90">
          > > <log4j:message><![CDATA[Auth'ed user hitting deny-all ws]]></
          > > log4j:message>
          > > </log4j:event>
          > > <log4j:event logger="Gateway.SessionInfra/SI_SessionManager.cpp"
          > > timestamp="1191005220729" level="ERROR" thread="0x600b0b90">
          > > <log4j:message><![CDATA[User session rejected - have exceeded the
          > > maximum number of concurrent sessions based on the current license
          > > (0 concurrent sessions allowed)]]></log4j:message>
          > > </log4j:event>
          > >
          > > <<<snip>>>
          > >
          > > This output was generated with ~150 threads throwing out error
          > > messages in rapid succession.
          > >
          > > I have not noted these problems when treading lightly on the system.
          > >
          > > At this point I'm suspicious that log4cxx is overflowing some sort
          > > of output buffer that's fills up due to the volume. I don't
          > believe
          > > it is a multithreading issue as I serialized the logging calls in
          > > my code with a mutex and did not note any change.
          > >
          > > Has anyone else noted something similiar?
          > >
          > > How is log4cxx supposed to behave when the logging volume exceeds
          > > its capacity to output those events?
          > >
          >
          > There was a similar symptoms reported in https://issues.apache.org/
          > jira/browse/LOGCXX-186. That ended up being due to attempting to
          > configure log4cxx simultaneously on multiple threads. I assume that
          > is not the case with you, but could you confirm that the problem
          > persists if you configure and attempt a logging request prior to
          > launching threads (assuming that your app is the one launching the
          > threads).
          >
          > If that is not the case, then the previous bug report might be
          > instructive to try to identify and fix the problem. The problems
          > might be related, but perhaps not. I would focus on trying to
          > identify the one that is most significant to you and then see if any
          > resolution fixed the other.
          >
          > I would suggest:
          >
          > 1. Make sure that you are using the most current SVN HEAD.
          >
          > 2. Making sure that configuration and some initial log
          > requests occur
          > before threads are launched. (previously mentioned).
          >
          > 3. Wrap the failing appender in the soon to be committed rewrite of
          > AsyncAppender (if not in the next hour or so, at least by mid-day
          > tomorrow). AsyncAppender has been badly flawed. I'm just about the
          > commit a port of the log4j 1.2.14 AsyncAppender (which replaced an
          > earlier log4j implementation that was badly flawed too).
          > That should
          > have the affect of bringing all appends on the underlying appender
          > down to a common thread. Plus it would be a great stress
          > test of all
          > my new code (thanks for volunteering for testing).
          >
          > 4. Simplify, simplify. There are a couple of places that corruption
          > may be occurring. It could be an Appender problem, a Layout
          > problem,
          > a CharsetEncoder problem, a CharsetDecoder problem or some
          > combination. Trying to isolate the problem makes resolution a lot
          > easier.
          > a) Setting LOG4CXX_LOGCHAR_IS_UTF8 would replace an conversion
          > between the local code page and UTF-8 with a straight copy.
          > Eliminating a lot of complexity and potential source of problems.
          > b) Replace the layout (if there is one) with something
          > simpler.
          > First try simplifying the pattern as much as possible like %m%n.
          > c) See if the problem occurs with the simpler appenders
          > (FileAppender instead of RollingFileAppender) or other unrelated
          > appenders.
          >
          > 5. Attempt to recreate the problem in a sample app that you can
          > attach to a bug report. Please specify all specifics of the
          > configurations (operating system, compiler, type and number of
          > processors, etc) that have been tested and whether they fail
          > consistently, have not been observed to fail or somewhere in between.
          >
          > 6. If you are running on a x86 Unix, valgrind could be useful to
          > identify potential problems. I used the helgrind tool in earlier
          > versions of valgrind (its been broken for a while, so had to
          > build it
          > from source) to identify potential race conditions. Other tools
          > might be useful to identify the problem.

          Firstly, the relevant portions of the log4cxx.properties file might be a useful reference:


          log4j.appender.file=org.apache.log4j.RollingFileAppender
          log4j.appender.file.File=$

          {WSG_HOME}/grizzlyGW.log
          log4j.appender.file.MaxFileSize=10000KB
          log4j.appender.file.MaxBackupIndex=10
          log4j.appender.file.layout=org.apache.log4j.PatternLayout
          log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %5p [%c] (%F:%L) %t: %m%n

          log4j.appender.XMLFile=org.apache.log4j.RollingFileAppender
          log4j.appender.XMLFile.File=${WSG_HOME}

          /grizzlyGW.xml
          log4j.appender.XMLFile.MaxFileSize=10000KB
          log4j.appender.XMLFile.MaxBackupIndex=10
          log4j.appender.XMLFile.layout=org.apache.log4j.xml.XMLLayout

          log4j.appender.file2=org.apache.log4j.FileAppender
          log4j.appender.file2.File=$

          {WSG_HOME}/grizzlyGW2.log
          log4j.appender.file2.layout=org.apache.log4j.PatternLayout
          log4j.appender.file2.layout.ConversionPattern=%m%n

          log4j.appender.XMLFileGateway=org.apache.log4j.RollingFileAppender
          log4j.appender.XMLFileGateway.File=${WSG_HOME}

          /grizzlyGW2.xml
          log4j.appender.XMLFileGateway.MaxFileSize=10000KB
          log4j.appender.XMLFileGateway.MaxBackupIndex=10
          log4j.appender.XMLFileGateway.layout=org.apache.log4j.xml.XMLLayout

          log4j.rootLogger=off, XMLFile
          log4j.logger.Gateway=warn, file2

          The following were all done without syncronizing log messages as I mentioned trying before. The log4cxx code was based on revision #572744, with the modifications I mentioned in LOGCXX-191 (pre-"official" fix). As the modifications relate to the syslog appender, and are relatively minor, I doubt that they are influencing the results outlined below since syslog is not involved in my tests.

          I tried forcing a log message in the same thread that configured the logging system, immediately after the configuration was done and before any threads started pounding away. I did not note any change in behaviour, so I went back to what I had before.

          I simplified things down to a FileAppender appender with a PatternLayout layout set to '%m%n':
          log4j.rootLogger=off
          log4j.logger.Gateway=debug, file2

          The corrupt output was not visible in the log.

          I readded the XMLFile appender to the root logger:
          log4j.rootLogger=off, XMLFile
          log4j.logger.Gateway=debug, file2

          The corrupt output was visible in both the XML and plaintext logs.

          I removed the simplified plaintext appender:
          log4j.rootLogger=off, XMLFile
          log4j.logger.Gateway=debug

          The corrupt output was not visible in the log.

          I removed the simplified plaintext appender:
          log4j.rootLogger=off, XMLFile
          log4j.logger.Gateway=debug, XMLFileGateway

          The corrupt output was visible in both XML logs.

          At this point, I'm starting to think it's got something to do with either having multiple appenders on the go, or simply something bad in the XML layout that garbles things for subsequent appenders.

          I tried two plaintext appenders. I reduced the 'file' appender to a simple FileAppender with a pattern layout of '%m%n':
          log4j.rootLogger=off, file
          log4j.logger.Gateway=debug, file2

          The corrupt output was visible in both plaintext logs.

          At this point, it seems to me that the verdict is in: there's a bug that corrupts data when multiple appenders (regardless of type) are present.

          For amusement, I tried combining appenders on a single logger:
          log4j.rootLogger=off, file, file2
          log4j.logger.Gateway=debug

          The corrupt output was not visible in the logs.

          For more amusement, I tried combining appenders on a single logger other than the root logger:
          log4j.rootLogger=off
          log4j.logger.Gateway=debug, file, file2

          The corrupt output was not visible in the logs.

          Since I'm guessing the 'OFF' level isn't used all that commonly, I tried enabling the root logger with the appenders on separate loggers:
          log4j.rootLogger=info, file
          log4j.logger.Gateway=debug, file2

          The corrupt output was visible in both plaintext logs.

          These extra tests confim my ealier verdict as to the cause.

          I will open a bug based on this information. I will post the bug number to the user group when I have it. Further information on this problem will be logged there.

          Next steps:
          -Verify that the problem still exists in a clean copy of SVN-HEAD. (I expect it does.)
          -Determine whether the core dumps I experienced when using the syslog appender are related to this same issue.

          Show
          Myles Bunbury added a comment - The following is background information on this bug as posted to the log4cxx user mailing list: — > ---- Original Message ---- > From: Curt Arnold carnold@apache.org > Sent: Friday, September 28, 2007 6:11 PM > To: Log4CXX User > Subject: Re: Log4cxx wilting under heavy load > > > On Sep 28, 2007, at 4:27 PM, BUNBURY Myles wrote: > > > Log4cxx seems to wilt when under heavy load. > > > > With the syslog appender enabled, the application cores when under > > heavy load. > > > > When using a RollingFileAppender with an XMLLayout layout, my > > output file gets garbled: > > > > <<<snip>>> > > > > <log4j:event logger="Gateway.AuthenticationServices/ > > ANS_AuthenticationService.cpp" timestamp="1191005220721" > > level="INFO" thread="0x61eb3b90"> > > <log4j:message><![CDATA[No username/password token or SAML > > assertion to process.]]></log4j:message> > > </log4j:event> > > <log4j:event logger="Gateway.MediationPipeline/ > > MED_PipelineMessage.cpp" timestamp="1191005220721" level="INFO" > > thread="0x61eb3b90"> > > <log4j:message><![CDATA [No session policy set for user/group .] ]></ > > log4j:message> > > </log4j:event> > > <log4j:event logger="Gateway.MediationPipeline/ > > MED_PipelineMessage.cpp" timestamp="1191005220721" level="DEBUG" > > thread="0x61eb3b90"> > > <log4j:message><![CDATA[Using session #2 with session policy > > DefaultPolicy.]]></log4j:message> > > </log4j:event> > > <log4j:event logger="Gateway.SessionInfra/SI_SessionManager.cpp" > > timestamp="1191005220729" level="WARN" thread="0x5ecaeb90"> > > <log4j:message><![CDATA[Concurrent session limit reached. > > Transaction is rejected.]]></log4j:message> > > </log4j:event> > > <log4j:event logger="Gateway.SessionInfra/SI_SessionManager.cpp" > > timestamp="1191005220729" level="ERROR" thread="0x5ecaeb90"> > > <log4j:message><![CDATA[User session rejected - have exceeded the > > maximum number of concurrent sessions based on the current license > > (0 concurrent sessions allowed)]]></log4j:message> > > </log4j:event> > > ¦« " ´T# > > ýÍÊ^˜ÌÊ^Ä °Ñ" > > ôÌÊ^µ ‡- <T# > > ÜÑ" > > XÍÊ^ <T# > > ÜÑ" > > À4‡- dÐÊ^ÍÊ^§c ´T# > > ð‡- °Ñ" > > @ÍÊ^1Ö†- T# > > ¸Ñ" > > XÍÊ^ ôO ÔÑ" > > |ó³ hÍÊ^ > > Þ$z XÍÊ^p÷³ xÍÊ^/&L4# > > z ¸ÍÊ^Z;'°Ñ" > > ¸ÍÊ^ü(H4# > > ðÍÊ^¸ÍÊ^13 Ì€ðÍÊ^ðÍÊ^ > > > > Þ$z ÏÊ^ÄydÐÊ^ðÍÊ^ ðÍÊ^ > > ÏÊ^PÏÊ^XÏÊ^h - ‹- > > 1191'À‹- 20729&ôï˜- €¼ÏÊ^(ÎÊ^®Ý‹- €È > > €—ÎÊ^H€¼ÏÊ^H €@ € ôï˜- 5 À > > € €À€ ¼ÏÊ^xÎÊ^ € > > Ðõ‹- - ‹- ôO À€ ˆÎÊ^@ €‹‚&  > > - ‹- X~—- ¨ÎÊ^5 > > ´ dÐÊ^Þ$¸ÎÊ^ H& çÎÊ^ÏÊ^ ¥!øÎÊ °# > > ÏÊ^§c % > > ôï˜- èÎÊ^ðÆ0ÐÊ^'ÏÊ^øÎÊ^ÙÃ'ÏÊ^'ÏÊ^ > > (ÏÊ^žÇ'ÏÊ^'ÏÊ^SÏÊ^ͦ!\sA > > dÐÊ^(ÏÊ^%LRÏÊ^´ÏÊ^XÏÊ^ÖÇRÏÊ^SÏÊ^RÏÊ^¶0 \sA > > kÏÊ^XÏÊ^¶&$HsA > > dÐÊ^ÈÏÊ^¦«0ÐÊ^Þ > > $xÏÊ^íÃHÐÊ^ÿÿÿÿˆÏÊ^ÄHÐÊ^l4¸ÏÊ^6ÉHÐÊ^ > > ó ôO l4z ÈÏÊ^e4 HÐÊ^dÐÊ^ÈÏÊ^uÉHÐÊ^Þ > > $ˆÐÊ^@š!HÐÊ^l4 HÒÊ^ÑÊ^ÿÿÿÿ'â³ Z;'°Ñ" > > j 8ÐÊ^ü(0x5ecaeb90Ê^13 4€pÐÊ^pÐÊ^ Þ$z > > HÐÊ^Í&ä6% > > 4ÐÊ^4ÐÊ^ ˆÐÊ^ > > *à6% > > LÐÊ^LÐÊ^ Üí# > > Ì€˜ÐÊ^<log4j:event logger="Gateway.SessionInfra/ > > SI_SessionManager.cpp" timestamp="1191005220729" level="WARN" > > thread="0x600b0b90"> > > <log4j:message><![CDATA[Concurrent session limit reached. > > Transaction is rejected.]]></log4j:message> > > </log4j:event> > > <log4j:event logger="Gateway.MediationPipeline/ > > MED_PipelineMessage.cpp" timestamp="1191005220729" level="INFO" > > thread="0x5ecaeb90"> > > <log4j:message><![CDATA[Auth'ed user hitting dend. Transaction is > > rejected.]]></log4j:message> > > </log4j:event> > > <log4j:event logger="Gateway.MediationPipeline/ > > MED_PipelineMessage.cpp" timestamp="1191005220729" level="INFO" > > thread="0x5ecaeb90"> > > <log4j:message><![CDATA [Auth'ed user hitting deny-all ws] ]></ > > log4j:message> > > </log4j:event> > > <log4j:event logger="Gateway.SessionInfra/SI_SessionManager.cpp" > > timestamp="1191005220729" level="ERROR" thread="0x600b0b90"> > > <log4j:message><![CDATA[User session rejected - have exceeded the > > maximum number of concurrent sessions based on the current license > > (0 concurrent sessions allowed)]]></log4j:message> > > </log4j:event> > > > > <<<snip>>> > > > > This output was generated with ~150 threads throwing out error > > messages in rapid succession. > > > > I have not noted these problems when treading lightly on the system. > > > > At this point I'm suspicious that log4cxx is overflowing some sort > > of output buffer that's fills up due to the volume. I don't > believe > > it is a multithreading issue as I serialized the logging calls in > > my code with a mutex and did not note any change. > > > > Has anyone else noted something similiar? > > > > How is log4cxx supposed to behave when the logging volume exceeds > > its capacity to output those events? > > > > There was a similar symptoms reported in https://issues.apache.org/ > jira/browse/ LOGCXX-186 . That ended up being due to attempting to > configure log4cxx simultaneously on multiple threads. I assume that > is not the case with you, but could you confirm that the problem > persists if you configure and attempt a logging request prior to > launching threads (assuming that your app is the one launching the > threads). > > If that is not the case, then the previous bug report might be > instructive to try to identify and fix the problem. The problems > might be related, but perhaps not. I would focus on trying to > identify the one that is most significant to you and then see if any > resolution fixed the other. > > I would suggest: > > 1. Make sure that you are using the most current SVN HEAD. > > 2. Making sure that configuration and some initial log > requests occur > before threads are launched. (previously mentioned). > > 3. Wrap the failing appender in the soon to be committed rewrite of > AsyncAppender (if not in the next hour or so, at least by mid-day > tomorrow). AsyncAppender has been badly flawed. I'm just about the > commit a port of the log4j 1.2.14 AsyncAppender (which replaced an > earlier log4j implementation that was badly flawed too). > That should > have the affect of bringing all appends on the underlying appender > down to a common thread. Plus it would be a great stress > test of all > my new code (thanks for volunteering for testing). > > 4. Simplify, simplify. There are a couple of places that corruption > may be occurring. It could be an Appender problem, a Layout > problem, > a CharsetEncoder problem, a CharsetDecoder problem or some > combination. Trying to isolate the problem makes resolution a lot > easier. > a) Setting LOG4CXX_LOGCHAR_IS_UTF8 would replace an conversion > between the local code page and UTF-8 with a straight copy. > Eliminating a lot of complexity and potential source of problems. > b) Replace the layout (if there is one) with something > simpler. > First try simplifying the pattern as much as possible like %m%n. > c) See if the problem occurs with the simpler appenders > (FileAppender instead of RollingFileAppender) or other unrelated > appenders. > > 5. Attempt to recreate the problem in a sample app that you can > attach to a bug report. Please specify all specifics of the > configurations (operating system, compiler, type and number of > processors, etc) that have been tested and whether they fail > consistently, have not been observed to fail or somewhere in between. > > 6. If you are running on a x86 Unix, valgrind could be useful to > identify potential problems. I used the helgrind tool in earlier > versions of valgrind (its been broken for a while, so had to > build it > from source) to identify potential race conditions. Other tools > might be useful to identify the problem. Firstly, the relevant portions of the log4cxx.properties file might be a useful reference: — log4j.appender.file=org.apache.log4j.RollingFileAppender log4j.appender.file.File=$ {WSG_HOME}/grizzlyGW.log log4j.appender.file.MaxFileSize=10000KB log4j.appender.file.MaxBackupIndex=10 log4j.appender.file.layout=org.apache.log4j.PatternLayout log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %5p [%c] (%F:%L) %t: %m%n log4j.appender.XMLFile=org.apache.log4j.RollingFileAppender log4j.appender.XMLFile.File=${WSG_HOME} /grizzlyGW.xml log4j.appender.XMLFile.MaxFileSize=10000KB log4j.appender.XMLFile.MaxBackupIndex=10 log4j.appender.XMLFile.layout=org.apache.log4j.xml.XMLLayout log4j.appender.file2=org.apache.log4j.FileAppender log4j.appender.file2.File=$ {WSG_HOME}/grizzlyGW2.log log4j.appender.file2.layout=org.apache.log4j.PatternLayout log4j.appender.file2.layout.ConversionPattern=%m%n log4j.appender.XMLFileGateway=org.apache.log4j.RollingFileAppender log4j.appender.XMLFileGateway.File=${WSG_HOME} /grizzlyGW2.xml log4j.appender.XMLFileGateway.MaxFileSize=10000KB log4j.appender.XMLFileGateway.MaxBackupIndex=10 log4j.appender.XMLFileGateway.layout=org.apache.log4j.xml.XMLLayout log4j.rootLogger=off, XMLFile log4j.logger.Gateway=warn, file2 — The following were all done without syncronizing log messages as I mentioned trying before. The log4cxx code was based on revision #572744, with the modifications I mentioned in LOGCXX-191 (pre-"official" fix). As the modifications relate to the syslog appender, and are relatively minor, I doubt that they are influencing the results outlined below since syslog is not involved in my tests. I tried forcing a log message in the same thread that configured the logging system, immediately after the configuration was done and before any threads started pounding away. I did not note any change in behaviour, so I went back to what I had before. I simplified things down to a FileAppender appender with a PatternLayout layout set to '%m%n': log4j.rootLogger=off log4j.logger.Gateway=debug, file2 The corrupt output was not visible in the log. I readded the XMLFile appender to the root logger: log4j.rootLogger=off, XMLFile log4j.logger.Gateway=debug, file2 The corrupt output was visible in both the XML and plaintext logs. I removed the simplified plaintext appender: log4j.rootLogger=off, XMLFile log4j.logger.Gateway=debug The corrupt output was not visible in the log. I removed the simplified plaintext appender: log4j.rootLogger=off, XMLFile log4j.logger.Gateway=debug, XMLFileGateway The corrupt output was visible in both XML logs. At this point, I'm starting to think it's got something to do with either having multiple appenders on the go, or simply something bad in the XML layout that garbles things for subsequent appenders. I tried two plaintext appenders. I reduced the 'file' appender to a simple FileAppender with a pattern layout of '%m%n': log4j.rootLogger=off, file log4j.logger.Gateway=debug, file2 The corrupt output was visible in both plaintext logs. At this point, it seems to me that the verdict is in: there's a bug that corrupts data when multiple appenders (regardless of type) are present. For amusement, I tried combining appenders on a single logger: log4j.rootLogger=off, file, file2 log4j.logger.Gateway=debug The corrupt output was not visible in the logs. For more amusement, I tried combining appenders on a single logger other than the root logger: log4j.rootLogger=off log4j.logger.Gateway=debug, file, file2 The corrupt output was not visible in the logs. Since I'm guessing the 'OFF' level isn't used all that commonly, I tried enabling the root logger with the appenders on separate loggers: log4j.rootLogger=info, file log4j.logger.Gateway=debug, file2 The corrupt output was visible in both plaintext logs. These extra tests confim my ealier verdict as to the cause. I will open a bug based on this information. I will post the bug number to the user group when I have it. Further information on this problem will be logged there. Next steps: -Verify that the problem still exists in a clean copy of SVN-HEAD. (I expect it does.) -Determine whether the core dumps I experienced when using the syslog appender are related to this same issue.
          Hide
          Curt Arnold added a comment -

          What platforms and compilers have you tried? If it was a platform-specific issue, I'd hate to stress a Unix implementation if it is Windows specific, or vice-versa. Likewise, if you are using a older libc, then testing on a later libc might be a waste of time. Or 32-bit vs 64-bit.

          Have you observed the problem on both single and multiple processor systems, or just one or the other?

          Am I reading the problem correctly in that the problem is not observed unless the load is "heavy"?

          Show
          Curt Arnold added a comment - What platforms and compilers have you tried? If it was a platform-specific issue, I'd hate to stress a Unix implementation if it is Windows specific, or vice-versa. Likewise, if you are using a older libc, then testing on a later libc might be a waste of time. Or 32-bit vs 64-bit. Have you observed the problem on both single and multiple processor systems, or just one or the other? Am I reading the problem correctly in that the problem is not observed unless the load is "heavy"?
          Hide
          Myles Bunbury added a comment -

          Confirmed that when the syslog appender is involved, rather than just outputting garbage like the file appenders do, the application crashes.

          Priority has therefore been changed from 'major' to 'critical' to reflect this increased impact.

          Show
          Myles Bunbury added a comment - Confirmed that when the syslog appender is involved, rather than just outputting garbage like the file appenders do, the application crashes. Priority has therefore been changed from 'major' to 'critical' to reflect this increased impact.
          Hide
          Myles Bunbury added a comment -

          The environment I've been running these tests in includes:
          -Redhat Linux 5
          -gcc v4.1.1 20070105 (Red Hat 4.1.1-52)
          -32-bit
          -libc v2.5
          -Linux kernel v2.6.18-8.1.8
          -2 dual core, non-hyperthreaded Intel processors

          I don't currently have alternate environments to test this issue in.

          Thusfar I have only noted the problem when the load is "heavy". That said, when I initially started investigating this issue, my first guess was a multithreading issue. To examine that theory, I wrapped the various logging macros in logger.h in a mutex, thereby serializing all the logging calls. I did not note any change in behaviour (ie. it still produced garbage & crashes). Although I haven't repeated this earlier experiment yet in light of the new tidbits of information I have uncovered since, that fact that the problem still occured when logging was serialized indicates to me at least that "heavy" traffic merely exposes the problem faster and/or with higher probability, rather than being a prerequisite.

          Show
          Myles Bunbury added a comment - The environment I've been running these tests in includes: -Redhat Linux 5 -gcc v4.1.1 20070105 (Red Hat 4.1.1-52) -32-bit -libc v2.5 -Linux kernel v2.6.18-8.1.8 -2 dual core, non-hyperthreaded Intel processors I don't currently have alternate environments to test this issue in. Thusfar I have only noted the problem when the load is "heavy". That said, when I initially started investigating this issue, my first guess was a multithreading issue. To examine that theory, I wrapped the various logging macros in logger.h in a mutex, thereby serializing all the logging calls. I did not note any change in behaviour (ie. it still produced garbage & crashes). Although I haven't repeated this earlier experiment yet in light of the new tidbits of information I have uncovered since, that fact that the problem still occured when logging was serialized indicates to me at least that "heavy" traffic merely exposes the problem faster and/or with higher probability, rather than being a prerequisite.
          Hide
          Curt Arnold added a comment -

          Did you do the test bypassing character encoding and decoding by setting LOG4CXX_LOCALE_ENCODING_UTF8 = 1 in log4cxx.h as suggested? That would bypass use of iconv to convert to and from the locale encoding to the internal UTF-8 encoding and replace it with a straight memcpy. If it disappears with LOG4CXX_LOCALE_ENCODING_UTF8 set then something in the log4cxx/apr-util/iconv stack is potentially the problem. If it doesn't, then we can rule out charset encoding/decoding as the problem.

          Any chance you could attach a sample app that exhibits the problem on your system?

          Show
          Curt Arnold added a comment - Did you do the test bypassing character encoding and decoding by setting LOG4CXX_LOCALE_ENCODING_UTF8 = 1 in log4cxx.h as suggested? That would bypass use of iconv to convert to and from the locale encoding to the internal UTF-8 encoding and replace it with a straight memcpy. If it disappears with LOG4CXX_LOCALE_ENCODING_UTF8 set then something in the log4cxx/apr-util/iconv stack is potentially the problem. If it doesn't, then we can rule out charset encoding/decoding as the problem. Any chance you could attach a sample app that exhibits the problem on your system?
          Hide
          Myles Bunbury added a comment -

          I have not yet tried LOG4CXX_LOCALE_ENCODING_UTF8 = 1.

          I do not currently have a sample app to upload.

          Show
          Myles Bunbury added a comment - I have not yet tried LOG4CXX_LOCALE_ENCODING_UTF8 = 1. I do not currently have a sample app to upload.
          Hide
          Myles Bunbury added a comment -

          Confirmed problem still exists in a clean SVN-HEAD checkout of revision #581230.

          Show
          Myles Bunbury added a comment - Confirmed problem still exists in a clean SVN-HEAD checkout of revision #581230.
          Hide
          Myles Bunbury added a comment -

          It might also be worth mentioning that as I'm stuck behind a proxy server and haven't yet figured out how to supply ANT with the configuration necessary to get out to the internet, I've manually placed the following files in the ./target/lib directory for building log4cxx:
          -apr-1.2.9.tar.gz
          -apr-util-1.2.8.tar.gz
          -cppunit-1.10.2.tar.gz

          Show
          Myles Bunbury added a comment - It might also be worth mentioning that as I'm stuck behind a proxy server and haven't yet figured out how to supply ANT with the configuration necessary to get out to the internet, I've manually placed the following files in the ./target/lib directory for building log4cxx: -apr-1.2.9.tar.gz -apr-util-1.2.8.tar.gz -cppunit-1.10.2.tar.gz
          Hide
          Myles Bunbury added a comment -

          There's a note in log4cxx.h that it's an autogenerated file, and to edit log4cxx.h.in instead. I added:
          #define LOG4CXX_LOCALE_ENCODING_UTF8 1
          to the .h.in file instead and recompiled. Curiously, log4cxx.h was not regenerated.

          So instead, I did the following.

          In ./src/main/include/log4cxx/log4cxx.hw and ./src/main/include/log4cxx/private/log4cxx_private.h.in I changed:
          #define LOG4CXX_LOCALE_ENCODING_UTF8 0
          to
          #define LOG4CXX_LOCALE_ENCODING_UTF8 1

          In ./src/main/include/log4cxx/log4cxx.h.in, ./src/main/include/log4cxx/log4cxx.h, ./src/main/include/log4cxx/private/log4cxx_private.h, and ./src/main/include/log4cxx/private/log4cxx_private.h.in I added:
          #define LOG4CXX_LOCALE_ENCODING_UTF8 1

          I note after the fact that if I manually deleted log4cxx.h, it did get regenerated with the content of log4cxx.h.in.

          Anyway, with LOG4CXX_LOCALE_ENCODING_UTF8 set to 1, I could NOT reproduce the problem.

          What exactly is the impact of setting LOG4CXX_LOCALE_ENCODING_UTF8 to 1? What are the reasons why one wouldn't want to just leave it set at 1?

          Show
          Myles Bunbury added a comment - There's a note in log4cxx.h that it's an autogenerated file, and to edit log4cxx.h.in instead. I added: #define LOG4CXX_LOCALE_ENCODING_UTF8 1 to the .h.in file instead and recompiled. Curiously, log4cxx.h was not regenerated. So instead, I did the following. In ./src/main/include/log4cxx/log4cxx.hw and ./src/main/include/log4cxx/private/log4cxx_private.h.in I changed: #define LOG4CXX_LOCALE_ENCODING_UTF8 0 to #define LOG4CXX_LOCALE_ENCODING_UTF8 1 In ./src/main/include/log4cxx/log4cxx.h.in, ./src/main/include/log4cxx/log4cxx.h, ./src/main/include/log4cxx/private/log4cxx_private.h, and ./src/main/include/log4cxx/private/log4cxx_private.h.in I added: #define LOG4CXX_LOCALE_ENCODING_UTF8 1 I note after the fact that if I manually deleted log4cxx.h, it did get regenerated with the content of log4cxx.h.in. Anyway, with LOG4CXX_LOCALE_ENCODING_UTF8 set to 1, I could NOT reproduce the problem. What exactly is the impact of setting LOG4CXX_LOCALE_ENCODING_UTF8 to 1? What are the reasons why one wouldn't want to just leave it set at 1?
          Hide
          Myles Bunbury added a comment - - edited

          CORRECTION

          I looked over the test code that I had put in to serialize logging in one of my early tests. Turns out I missed serializing one of the logging macros, thus rendering the test results suspect.

          After correcting this and double checking the serialization of the rest of the logging macros, I reran the test using a clean copy of SVN-HEAD (revision #581230). This time, I could NOT reproduce the problem, so it's likely a multithreading problem after all.

          Show
          Myles Bunbury added a comment - - edited CORRECTION I looked over the test code that I had put in to serialize logging in one of my early tests. Turns out I missed serializing one of the logging macros, thus rendering the test results suspect. After correcting this and double checking the serialization of the rest of the logging macros, I reran the test using a clean copy of SVN-HEAD (revision #581230). This time, I could NOT reproduce the problem, so it's likely a multithreading problem after all.
          Hide
          Curt Arnold added a comment -

          LOG4CXX_LOCALE_ENCODING_UTF8 declares at compile time that the character type will be UTF-8 (an 8-bit encoding of the Unicode character set). log4cxx uses UTF-8 internally when logchar is char (generally every platform but Windows), so if the external charset is known to match the internal encoding, then calls to iconv to convert charsets can be replaced with plain moves.

          If you type "locale" at a command prompt and LC_CTYPE ends with ".UTF-8" then your default locale uses UTF-8 encoding of character data. If it is "C", then you are using US-ASCII and that can be safely copied to UTF-8. If you know that your deployment platform will be using a UTF-8 encoding, then setting the define should only be beneficial.

          The downsides of setting the defineis if your locale is not UTF-8 or US-ASCII, then non US-ASCII characters will be misinterpreted, typically as invalid UTF-8 sequences. So if you were logging user input and the user typed some accented character, it would not be rendered correctly.

          This the corruption likely the same problem as LOGCXX-175 which I'm researching now.

          Show
          Curt Arnold added a comment - LOG4CXX_LOCALE_ENCODING_UTF8 declares at compile time that the character type will be UTF-8 (an 8-bit encoding of the Unicode character set). log4cxx uses UTF-8 internally when logchar is char (generally every platform but Windows), so if the external charset is known to match the internal encoding, then calls to iconv to convert charsets can be replaced with plain moves. If you type "locale" at a command prompt and LC_CTYPE ends with ".UTF-8" then your default locale uses UTF-8 encoding of character data. If it is "C", then you are using US-ASCII and that can be safely copied to UTF-8. If you know that your deployment platform will be using a UTF-8 encoding, then setting the define should only be beneficial. The downsides of setting the defineis if your locale is not UTF-8 or US-ASCII, then non US-ASCII characters will be misinterpreted, typically as invalid UTF-8 sequences. So if you were logging user input and the user typed some accented character, it would not be rendered correctly. This the corruption likely the same problem as LOGCXX-175 which I'm researching now.
          Hide
          Curt Arnold added a comment -

          Believe this is a dup of LOGCXX-175. Would appreciate your testing if the modified charsetencodertestcase.cpp fails with the old charsetencoder.cpp on your platform. If you have been running unit tests during your build, the simplest way would be to just:

          cd src/test
          svn update
          cd ../..
          ant - or - make check

          After testing if the test detects problem, then update the entire directory tree and see if your initial reported problem goes away.

          Show
          Curt Arnold added a comment - Believe this is a dup of LOGCXX-175 . Would appreciate your testing if the modified charsetencodertestcase.cpp fails with the old charsetencoder.cpp on your platform. If you have been running unit tests during your build, the simplest way would be to just: cd src/test svn update cd ../.. ant - or - make check After testing if the test detects problem, then update the entire directory tree and see if your initial reported problem goes away.
          Hide
          Myles Bunbury added a comment -

          See my comments in LOGCXX-175 regarding the charsetencodertestcase.cpp test.

          Show
          Myles Bunbury added a comment - See my comments in LOGCXX-175 regarding the charsetencodertestcase.cpp test.
          Hide
          Myles Bunbury added a comment -

          I can no longer reproduce this problem using revision #581595.

          Show
          Myles Bunbury added a comment - I can no longer reproduce this problem using revision #581595.

            People

            • Assignee:
              Curt Arnold
              Reporter:
              Myles Bunbury
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development