Log4j 2
  1. Log4j 2
  2. LOG4J2-763

Async loggers convert message parameters toString at log record writing not at log statement execution

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0.2
    • Component/s: None
    • Labels:
      None

      Description

      http://javaadventure.blogspot.com/2014/07/log4j-20-async-loggers-and-immutability.html

      When using parameterized messages, the toString() method of the log messages is not called when the log message is enqueued, rather after the log message has been dequeued for writing. If any of the message parameters are mutable, they can thus have changed state before the log message is written, thus resulting in the logged message content being incorrect.

      From the blog post, code that demonstrates the problem:

      import org.apache.logging.log4j.LogManager;
      import org.apache.logging.log4j.Logger;
      import java.util.concurrent.atomic.AtomicLong;
      
      public class App {
          private static final AtomicLong value = new AtomicLong();
          public String toString() {
              return Long.toString(value.get());
          }
          public long next() {
              return value.incrementAndGet();
          }
      
          public static void main(String[] args) {
              for (int i = 0; i < 32; i++) {
                  new Thread() {
                      final Logger logger = LogManager.getLogger(App.class);
                       final App instance = new App();
                      @Override
                       public void run() {
                           for (int i = 0; i < 100000; i++) {
                               logger.warn("{} == {}", instance.next(), instance);
                           }
                       }
                  }.start();
              }
          }
      }
      

      Here is the first few lines of logging output

      2014-07-28 15:59:45,729 WARN t.App [Thread-13] 13 == 13 
      2014-07-28 15:59:45,730 WARN t.App [Thread-29] 29 == 29 
      2014-07-28 15:59:45,729 WARN t.App [Thread-15] 15 == 15 
      2014-07-28 15:59:45,729 WARN t.App [Thread-6] 6 == 6 
      2014-07-28 15:59:45,730 WARN t.App [Thread-30] 30 == 30 
      2014-07-28 15:59:45,729 WARN t.App [Thread-20] 20 == 20 
      2014-07-28 15:59:45,729 WARN t.App [Thread-8] 8 == 8 
      2014-07-28 15:59:45,730 WARN t.App [Thread-28] 28 == 28 
      2014-07-28 15:59:45,729 WARN t.App [Thread-19] 19 == 19 
      2014-07-28 15:59:45,729 WARN t.App [Thread-18] 18 == 18 
      2014-07-28 15:59:45,729 WARN t.App [Thread-5] 5 == 6 
      2014-07-28 15:59:45,731 WARN t.App [Thread-13] 33 == 37 
      2014-07-28 15:59:45,731 WARN t.App [Thread-8] 39 == 39 
      2014-07-28 15:59:45,731 WARN t.App [Thread-28] 40 == 41 
      2014-07-28 15:59:45,731 WARN t.App [Thread-18] 42 == 43 
      2014-07-28 15:59:45,731 WARN t.App [Thread-5] 43 == 43
      

      To make my previous code work with Asynchronous loggers (other than by fixing the mutable state) I would need to log like this:

      if (logger.isWarnEnabled()) {
          logger.warn("{} == {}", instance.next(), instance.toString());
      }
      

        Issue Links

          Activity

          Hide
          Ralph Goers added a comment -

          Thanks Stephen.

          We actually read your blog and commented on it on the private list (along with a few other blogs mentioning Log4j 2). Remko suggested that this may be solved by having SimpleMessage and ObjectMessage call toString() on the constructor argument. Since SimpleMessage takes a String already nothing probably needs to be done with it.

          However, I am not sure it is that simple. ParamterizedMessage takes an array of objects for substitution. These objects should not be passed to another thread or the same problem will result. So really, the Message's getFormattedMessage method really needs to be called and the result should be passed as part of the log event.

          Show
          Ralph Goers added a comment - Thanks Stephen. We actually read your blog and commented on it on the private list (along with a few other blogs mentioning Log4j 2). Remko suggested that this may be solved by having SimpleMessage and ObjectMessage call toString() on the constructor argument. Since SimpleMessage takes a String already nothing probably needs to be done with it. However, I am not sure it is that simple. ParamterizedMessage takes an array of objects for substitution. These objects should not be passed to another thread or the same problem will result. So really, the Message's getFormattedMessage method really needs to be called and the result should be passed as part of the log event.
          Hide
          Remko Popma added a comment -

          Ralph, I think ParameterizedMessage is already okay; its constructor accepts an Object[] array, and creates a String[] array by calling the deepToString method on each of the elements of the object parameter array. This happens in the constructor, so in the application thread, not in the async background thread. The ParameterizedMessage.getFormattedMessage method called by the background thread uses the resulting String[] array.

          So ParameterizedMessage already takes a String snapshot of each of the parameters when the message is constructed.

          Similar logic probably needs to be added to:

          • ObjectMessage
          • FormattedMessage
          • LocalizedMessage
          • MapMessage (hmm... this Message is mutable by design it seems. That may be a problem...)
          • StringFormattedMessage
          • StructuredDataMessage (subclass of MapMessage, also mutable)
          Show
          Remko Popma added a comment - Ralph, I think ParameterizedMessage is already okay; its constructor accepts an Object[] array, and creates a String[] array by calling the deepToString method on each of the elements of the object parameter array. This happens in the constructor, so in the application thread, not in the async background thread. The ParameterizedMessage.getFormattedMessage method called by the background thread uses the resulting String[] array. So ParameterizedMessage already takes a String snapshot of each of the parameters when the message is constructed. Similar logic probably needs to be added to: ObjectMessage FormattedMessage LocalizedMessage MapMessage (hmm... this Message is mutable by design it seems. That may be a problem...) StringFormattedMessage StructuredDataMessage (subclass of MapMessage, also mutable)
          Hide
          Ralph Goers added a comment -

          Well, I still have misgivings about this approach as it means that Messages have to be very careful how they are constructed, but it looks to me like it would work.

          In the case of MapMessage, if it is not a SortedMap it is copied into one. We could just always copy it. Since it is a Map<String, String> it can still be manipulated but I think it is OK to allow updates to the Map associated with the Message as that should only be done in special cases within the logging system. FWIW, I am doing that in one of my projects in a LoggerWrapper. We just need to document what is thread safe and what isn't.

          Show
          Ralph Goers added a comment - Well, I still have misgivings about this approach as it means that Messages have to be very careful how they are constructed, but it looks to me like it would work. In the case of MapMessage, if it is not a SortedMap it is copied into one. We could just always copy it. Since it is a Map<String, String> it can still be manipulated but I think it is OK to allow updates to the Map associated with the Message as that should only be done in special cases within the logging system. FWIW, I am doing that in one of my projects in a LoggerWrapper. We just need to document what is thread safe and what isn't.
          Hide
          Remko Popma added a comment -

          True, Messages need to be careful to either cache the formatted message, or the parameters needed to create a formatted message when requested.

          I don't mind calling getFormattedMessage on the log event before passing it to the async background thread. Currently neither AsyncLoggers nor AsyncAppenders do this, but this can easily be changed.

          We still need to modify a few Message implementations to make sure that they cache the formatted message rather than recalculating it from scratch every time getFormattedMessage is called.

          On a related note: the formattedMessage field is marked as transient in a number of Message implementations. This may cause issues unless they also cache the parameters in a non-transient String[] array and getFormattedMessage uses the String[] parameters and not the original Object[] parameters.

          Show
          Remko Popma added a comment - True, Messages need to be careful to either cache the formatted message, or the parameters needed to create a formatted message when requested. I don't mind calling getFormattedMessage on the log event before passing it to the async background thread. Currently neither AsyncLoggers nor AsyncAppenders do this, but this can easily be changed. We still need to modify a few Message implementations to make sure that they cache the formatted message rather than recalculating it from scratch every time getFormattedMessage is called. On a related note: the formattedMessage field is marked as transient in a number of Message implementations. This may cause issues unless they also cache the parameters in a non-transient String[] array and getFormattedMessage uses the String[] parameters and not the original Object[] parameters.
          Hide
          Remko Popma added a comment - - edited

          Strange: I thought ParameterizedMessage was okay because it already takes a snapshot of the parameters when the LogEvent is created, but looking again at Stephen's blog post, he is using... parameterized messages.

          Looking at the resulting log output, I can't actually see what is wrong with it.
          Stephen, what did you expect to see in the log output?

          The suggested solution:

          if (logger.isWarnEnabled()) {
              logger.warn("{} == {}", instance.next(), instance.toString());
          }
          

          I don't think this will produce different results, does it? The only difference I can see is that the window of opportunity for another thread to increment the counter is narrower now than in the original code. But it will still not guarantee that you see the same number...

          The problem is that the value field is static, so shared by all App instances. Giving each Thread its own App instance does not help since they all still share the same AtomicLong value field.

          Show
          Remko Popma added a comment - - edited Strange: I thought ParameterizedMessage was okay because it already takes a snapshot of the parameters when the LogEvent is created, but looking again at Stephen's blog post, he is using... parameterized messages. Looking at the resulting log output, I can't actually see what is wrong with it. Stephen, what did you expect to see in the log output? The suggested solution: if (logger.isWarnEnabled()) { logger.warn( "{} == {}" , instance.next(), instance.toString()); } I don't think this will produce different results, does it? The only difference I can see is that the window of opportunity for another thread to increment the counter is narrower now than in the original code. But it will still not guarantee that you see the same number... The problem is that the value field is static, so shared by all App instances. Giving each Thread its own App instance does not help since they all still share the same AtomicLong value field.
          Hide
          Ralph Goers added a comment -

          Actually, looking at the test in the blog and copied here, unless I am mistaken the test is at fault, not Log4j.

          Yes, every thread is creating a new App instance. However, I don't see how that accomplishes anything as the AtomicLong is static so every thread shares it. So thread A will call instance.next() and then before it goes to populate the next parameter another thread can call instance.next() on its App object and cause the value being passed to Log4j to change in Thread A.

          Show
          Ralph Goers added a comment - Actually, looking at the test in the blog and copied here, unless I am mistaken the test is at fault, not Log4j. Yes, every thread is creating a new App instance. However, I don't see how that accomplishes anything as the AtomicLong is static so every thread shares it. So thread A will call instance.next() and then before it goes to populate the next parameter another thread can call instance.next() on its App object and cause the value being passed to Log4j to change in Thread A.
          Hide
          Remko Popma added a comment - - edited

          My conclusions:

          In this particular case the problem seems to be with the test case. Logging messages with parameters (like logger.info("some {} message", someParameter); ) is thread-safe even if the someParameter object changes after the logger call.

          Other messages can be improved. Especially logging an object directly (like logger.info(someParameter); ) can give unexpected results if the parameter object changes before the async background thread has written the log message. Similarly for some other message implementations mentioned above. (We should also review which fields should be marked transient on the message classes.)

          Apart from that, are there any documentation improvements we can make?
          Any tips for using parameterized messages with async logging? I think it should "just work", but are there any scenarios that fall through the cracks that we should mention in documentation?

          Show
          Remko Popma added a comment - - edited My conclusions: In this particular case the problem seems to be with the test case. Logging messages with parameters (like logger.info("some {} message", someParameter); ) is thread-safe even if the someParameter object changes after the logger call. Other messages can be improved. Especially logging an object directly (like logger.info(someParameter); ) can give unexpected results if the parameter object changes before the async background thread has written the log message. Similarly for some other message implementations mentioned above. (We should also review which fields should be marked transient on the message classes.) Apart from that, are there any documentation improvements we can make? Any tips for using parameterized messages with async logging? I think it should "just work", but are there any scenarios that fall through the cracks that we should mention in documentation?
          Hide
          Remko Popma added a comment -

          I've taken the simple approach of calling getFormattedMessage in the constructor to cache the formatted message string. For most message implementation classes this is sufficient.

          For ObjectMessage I added a String field to capture the value of obj.toString() at message construction time.

          MapMessage and StructuredDataMessage are mutable by design. The simplest thing to do here is to document that they should not be modified after they have been logged.

          Please see attached patch.
          If there are no objections to this approach I will commit this (still need to do unit tests and site changes to mention users should take care with MapMessage and StructuredDataMessage with async logging.)

          Show
          Remko Popma added a comment - I've taken the simple approach of calling getFormattedMessage in the constructor to cache the formatted message string. For most message implementation classes this is sufficient. For ObjectMessage I added a String field to capture the value of obj.toString() at message construction time. MapMessage and StructuredDataMessage are mutable by design. The simplest thing to do here is to document that they should not be modified after they have been logged. Please see attached patch. If there are no objections to this approach I will commit this (still need to do unit tests and site changes to mention users should take care with MapMessage and StructuredDataMessage with async logging.)
          Hide
          Remko Popma added a comment -

          Improved FormattedMessage, StringFormattedMessage, LocalizedMessage, MessageFormatMessage and ObjectMessage for asynchronous logging to ensure the formatted message does not change even if parameters are modified by the application. Added JUnit tests.

          ParameterizedMessage and ThreadDumpMessage were already safe.

          Improved docs for MapMessage and StructuredDataMessage: these are mutable by design and users should take care when logging these messages asynchronously. Improved site docs.

          Fixed in revision 1617051. Please verify and close.

          Show
          Remko Popma added a comment - Improved FormattedMessage, StringFormattedMessage, LocalizedMessage, MessageFormatMessage and ObjectMessage for asynchronous logging to ensure the formatted message does not change even if parameters are modified by the application. Added JUnit tests. ParameterizedMessage and ThreadDumpMessage were already safe. Improved docs for MapMessage and StructuredDataMessage: these are mutable by design and users should take care when logging these messages asynchronously. Improved site docs. Fixed in revision 1617051. Please verify and close.
          Hide
          Remko Popma added a comment - - edited

          Re-opening: the current implementation solves the problem but may have some performance impact. It is better to avoid creating the formatted string in the constructor because messages may be filtered out (e.g. because their log level is too low) and this formatted message string may end up not being used.

          A better solution is to revert to lazy initialization and have async components call the Message.getFormattedMessage() method before passing the log event to another thread.

          In addition to reverting,

          • ObjectMessage should cache its formatted message string and initialize this cached string in getFormattedMessage.
          • ParameterizedMessage should not take string snapshots in its constructor but should delay that work until the getFormattedMessage method is called. - I created LOG4J2-772 for this as I may not have time for this refactoring at the moment and it is not as urgent.
          Show
          Remko Popma added a comment - - edited Re-opening: the current implementation solves the problem but may have some performance impact. It is better to avoid creating the formatted string in the constructor because messages may be filtered out (e.g. because their log level is too low) and this formatted message string may end up not being used. A better solution is to revert to lazy initialization and have async components call the Message.getFormattedMessage() method before passing the log event to another thread. In addition to reverting, ObjectMessage should cache its formatted message string and initialize this cached string in getFormattedMessage . ParameterizedMessage should not take string snapshots in its constructor but should delay that work until the getFormattedMessage method is called. - I created LOG4J2-772 for this as I may not have time for this refactoring at the moment and it is not as urgent.
          Hide
          Remko Popma added a comment -

          Use the mechanism described in the comment above to solve the issue. Committed in revision 1617291. Please verify and close.

          Show
          Remko Popma added a comment - Use the mechanism described in the comment above to solve the issue. Committed in revision 1617291. Please verify and close.
          Hide
          Remko Popma added a comment - - edited

          Added below serialization JUnit tests in revision 1617465.

          // add to ObjectMessageTest
              @Test
              public void testSerializeWithSerializableParam() {
                  BigDecimal big = BigDecimal.valueOf(123.456);
                  ObjectMessage msg = new ObjectMessage(big);
                  ObjectMessage other = SerialUtil.deserialize(SerialUtil.serialize(msg));
                  assertEquals(msg, other);
              }
              
              
              @Test
              public void testDeserializeNonSerializableParamNotEqualIfToStringDiffers() {
                  ObjectMessageTest nonSerializable = new ObjectMessageTest();
                  assertFalse(nonSerializable instanceof Serializable);
                  ObjectMessage msg = new ObjectMessage(nonSerializable);
                  ObjectMessage other = SerialUtil.deserialize(SerialUtil.serialize(msg));
                  assertNotEquals("Expected different: toString is different", msg, other);
              }
              
              @Test
              public void testDeserializeNonSerializableParamEqualIfToStringSame() {
                  class NonSerializable {
                      public boolean equals(Object other) {
                          return other instanceof NonSerializable; // a very lenient equals()
                      }
                  }
                  NonSerializable nonSerializable = new NonSerializable();
                  assertFalse(nonSerializable instanceof Serializable);
                  ObjectMessage msg = new ObjectMessage(nonSerializable);
                  ObjectMessage other = SerialUtil.deserialize(SerialUtil.serialize(msg));
          
          // TODO this fails: msg.obj.equals(other.obj) is false...
          // TODO our equals() implementation does not match the serialization mechanism
                  assertEquals(msg, other);
              }
          
          // new helper class (src/test/java/org.apache.log4j.util?)
          public class SerialUtil {
              public static byte[] serialize(Serializable obj) {
                  try {
                      ByteArrayOutputStream bas = new ByteArrayOutputStream(8192);
                      ObjectOutputStream oos = new ObjectOutputStream(bas);
                      oos.writeObject(obj);
                      oos.flush();
                      return bas.toByteArray();
                  } catch (Exception ex) {
                      throw new IllegalStateException("Could not serialize", ex);
                  }
              }
              
              @SuppressWarnings("unchecked")
              public static <T> T deserialize(byte[] data) {
                  try {
                      ByteArrayInputStream bas = new ByteArrayInputStream(data);
                      ObjectInputStream ois = new ObjectInputStream(bas);
                      return (T) ois.readObject();
                  } catch (Exception ex) {
                      throw new IllegalStateException("Could not deserialize", ex);
                  }
              }
          }
          
          Show
          Remko Popma added a comment - - edited Added below serialization JUnit tests in revision 1617465. // add to ObjectMessageTest @Test public void testSerializeWithSerializableParam() { BigDecimal big = BigDecimal.valueOf(123.456); ObjectMessage msg = new ObjectMessage(big); ObjectMessage other = SerialUtil.deserialize(SerialUtil.serialize(msg)); assertEquals(msg, other); } @Test public void testDeserializeNonSerializableParamNotEqualIfToStringDiffers() { ObjectMessageTest nonSerializable = new ObjectMessageTest(); assertFalse(nonSerializable instanceof Serializable); ObjectMessage msg = new ObjectMessage(nonSerializable); ObjectMessage other = SerialUtil.deserialize(SerialUtil.serialize(msg)); assertNotEquals( "Expected different: toString is different" , msg, other); } @Test public void testDeserializeNonSerializableParamEqualIfToStringSame() { class NonSerializable { public boolean equals( Object other) { return other instanceof NonSerializable; // a very lenient equals() } } NonSerializable nonSerializable = new NonSerializable(); assertFalse(nonSerializable instanceof Serializable); ObjectMessage msg = new ObjectMessage(nonSerializable); ObjectMessage other = SerialUtil.deserialize(SerialUtil.serialize(msg)); // TODO this fails: msg.obj.equals(other.obj) is false ... // TODO our equals() implementation does not match the serialization mechanism assertEquals(msg, other); } // new helper class (src/test/java/org.apache.log4j.util?) public class SerialUtil { public static byte [] serialize(Serializable obj) { try { ByteArrayOutputStream bas = new ByteArrayOutputStream(8192); ObjectOutputStream oos = new ObjectOutputStream(bas); oos.writeObject(obj); oos.flush(); return bas.toByteArray(); } catch (Exception ex) { throw new IllegalStateException( "Could not serialize" , ex); } } @SuppressWarnings( "unchecked" ) public static <T> T deserialize( byte [] data) { try { ByteArrayInputStream bas = new ByteArrayInputStream(data); ObjectInputStream ois = new ObjectInputStream(bas); return (T) ois.readObject(); } catch (Exception ex) { throw new IllegalStateException( "Could not deserialize" , ex); } } }
          Hide
          Remko Popma added a comment -

          Fixed ObjectMessage.equals() and JUnit test in revision 1617497.

          Show
          Remko Popma added a comment - Fixed ObjectMessage.equals() and JUnit test in revision 1617497.
          Hide
          Steven Grossman added a comment -

          I appreciate the simplicity introduced by this patch, but it comes with a significant limitation of utility. Formatting a message can be an expensive operation (e.g. String.format) and it is undesirable for this to be done in the calling thread when using an async logger.

          Would you consider bringing back the previous behavior via a config/system property?

          Show
          Steven Grossman added a comment - I appreciate the simplicity introduced by this patch, but it comes with a significant limitation of utility. Formatting a message can be an expensive operation (e.g. String.format) and it is undesirable for this to be done in the calling thread when using an async logger. Would you consider bringing back the previous behavior via a config/system property?

            People

            • Assignee:
              Remko Popma
              Reporter:
              Stephen Connolly
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development