Bug 44644 - Default ContentType for Layout is text/html which is FALSE
Summary: Default ContentType for Layout is text/html which is FALSE
Status: RESOLVED FIXED
Alias: None
Product: Log4j - Now in Jira
Classification: Unclassified
Component: Layout
Version: 1.2
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: log4j-dev
URL:
Keywords: PatchAvailable, RFC
Depends on:
Blocks:
 
Reported: 2008-03-20 02:57 UTC by Илья Казначеев
Modified: 2008-10-15 21:58 UTC
CC List: 0 users



Attachments
The source file (UTF8) (964 bytes, application/octet-stream)
2008-08-28 08:16 UTC, Илья Казначеев

Description Илья Казначеев 2008-03-20 02:57:54 UTC
Default ContentType (Layout.getContentType() method) is "text/plain", inherited by SimpleLayout.
Default ContentType for HTMLLayout is "text/html".

This is wrong. These layouts' messages are backed by Java strings, and Java strings are always Unicode. Thus, the correct ContentType for these layouts is `text/plain; charset="UTF-8"' and `text/html; charset="UTF-8"' respectively.

This becomes more and more important as various Java libraries internationalize their error messages. The Xalan XSLT processor does that, and so does the Postgres database. Trying to log these errors via SMTPAppender yields a crop of question marks, which is not really useful, since you probably wanted these messages to bring answers rather than questions.
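Why question marks in particular? When Java encodes a string into a charset that cannot represent some of its characters, `String.getBytes(Charset)` substitutes the charset's replacement byte, which is `?` for US-ASCII. A minimal sketch using only a current JDK, assuming (for illustration) that the mail body ends up encoded as US-ASCII, the charset MIME assumes for text parts that declare none (RFC 2045):

```java
import java.nio.charset.StandardCharsets;

// Sketch: shows how unmappable characters degrade to '?' when a Java
// string is encoded with a charset that cannot represent them.
// Assumption: the outgoing body is encoded as US-ASCII.
final class QuestionMarks {

    // Encode with US-ASCII and decode back. String.getBytes(Charset)
    // replaces every character US-ASCII cannot represent with the
    // charset's replacement byte, '?'.
    static String roundTripAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Each Cyrillic letter becomes one '?'; spaces survive.
        System.out.println(roundTripAscii("Я магу есці шкло")); // ? ???? ???? ????
        System.out.println(roundTripAscii("Encoding FAIL"));    // Encoding FAIL
    }
}
```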

If this could cause errors in applications that expect no charset= field there, you can fix just SMTPAppender, which really trashes its mails.
For now, I've just done:
new SimpleLayout() {
 @Override
 public String getContentType()
 {
  return "text/plain; charset=\"UTF-8\"";
 }
}
But I think this bug should be addressed at your level.
Comment 1 Thorbjørn Ravn Andersen 2008-08-02 14:41:15 UTC
If I understand you correctly the problem is that the HTMLLayout class does not have a character set part in its content type?

Characters are Unicode internally, but the output encoding might very well be a single-byte one (like ISO-Latin-1), so it is not necessarily UTF-8.
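This point can be checked with the JDK charset API: one Java string maps to different bytes under different charsets, and a single-byte charset such as ISO-8859-1 covers only a small part of Unicode. A small sketch; the sample words are illustrative and the class name is hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: compares how one Java (UTF-16) string fares under a single-byte
// charset (ISO-8859-1) versus UTF-8.
final class CharsetCoverage {

    // True if every character of text can be encoded in the given charset.
    // A fresh encoder is used each time, so canEncode is safe to call.
    static boolean fits(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    // Number of bytes the text occupies when encoded in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String icelandic = "þess"; // U+00FE 'þ' exists in ISO-8859-1
        String cyrillic = "шкло";  // ISO-8859-1 has no Cyrillic letters
        System.out.println(fits(icelandic, StandardCharsets.ISO_8859_1));          // true
        System.out.println(fits(cyrillic, StandardCharsets.ISO_8859_1));           // false
        System.out.println(encodedLength(icelandic, StandardCharsets.ISO_8859_1)); // 4
        System.out.println(encodedLength(icelandic, StandardCharsets.UTF_8));      // 5
    }
}
```

So a declared charset of ISO-8859-1 would be honest for the Icelandic word but could never carry the Cyrillic one, whereas UTF-8 carries both at the cost of multi-byte sequences.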

Can you provide a simple test case showing an incorrectly encoded message?
Comment 2 Илья Казначеев 2008-08-28 08:15:03 UTC
Sure I can!

import org.apache.log4j.Logger;
import org.apache.log4j.SimpleLayout;
import org.apache.log4j.net.SMTPAppender;

public class testlog {

    private static Logger logger = Logger.getLogger("testlog");

    public static void main(String[] args) {
        // args[0] is used as both the sender and the recipient address
        SMTPAppender appender = new SMTPAppender();
        appender.setSMTPHost("alt1.gmail-smtp-in.l.google.com");
        appender.setFrom(args[0]);
        appender.setName("CHARSETMAIL");
        appender.setSubject("Encoding FAIL");
        appender.setTo(args[0]);
        // Workaround: declare UTF-8 explicitly. Replace this anonymous
        // subclass with a plain new SimpleLayout() to reproduce the bug.
        appender.setLayout(new SimpleLayout() {
            @Override
            public String getContentType() {
                return "text/plain; charset=\"UTF-8\"";
            }
        });
        appender.activateOptions();
        logger.addAppender(appender);

        // By default SMTPAppender sends mail on ERROR or higher.
        // Greek, Icelandic and Belarusian sample text:
        logger.error("Μπορῶ νὰ φάω σπασμένα γυαλιὰ χωρὶς νὰ πάθω τίποτα. \n Ek get etið gler án þess að verða sár. \n Я магу есці шкло, яно мне не шкодзіць.\n\n");
    }

}

Try running it both with and without the anonymous subclass:
java -cp .:log4j-1.2.14.jar:mail.jar:activation.jar testlog {whoeveryouare}@gmail.com

With it, you'll get a proper message with Greek and Cyrillic; without it, you'll get question marks instead of those characters.
Comment 3 Илья Казначеев 2008-08-28 08:16:01 UTC
Created attachment 22495
The source file (UTF8)

This file can be used as a test case.
Comment 4 Curt Arnold 2008-08-28 09:23:26 UTC
The layout does have the correct values for contentType. See http://en.wikipedia.org/wiki/MIME for a summary of the various fields of the MIME type (a little easier to read than the IETF RFCs). The problem is that SMTPAppender does not properly encode the message into the form specified by the RFC. Could you try this and see if it works for you? I will have to dig a little deeper to see what to do when a contentType other than "text/plain" is specified.
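A MIME Content-Type value is a `type/subtype` pair optionally followed by `; name=value` parameters, and `charset` is one such optional parameter, so `text/html` on its own is a complete, valid value. The hypothetical helper below (not part of log4j or JavaMail) splits a value into those pieces. It is a simplified sketch that ignores quoted `;` characters and comments, which a real MIME parser must handle:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified parser for a Content-Type value such as
//   text/plain; charset="UTF-8"
// Limitation: splits naively on ';', so quoted values containing ';'
// and RFC 822 comments are not supported.
final class ContentTypes {

    // Returns the lower-cased "type/subtype" part.
    static String mediaType(String contentType) {
        int semi = contentType.indexOf(';');
        String base = semi < 0 ? contentType : contentType.substring(0, semi);
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the parameters with lower-cased names and unquoted values.
    static Map<String, String> parameters(String contentType) {
        Map<String, String> params = new LinkedHashMap<>();
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq < 0) {
                continue; // malformed parameter: skip it
            }
            String name = parts[i].substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = parts[i].substring(eq + 1).trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1); // strip quotes
            }
            params.put(name, value);
        }
        return params;
    }

    public static void main(String[] args) {
        String ct = "text/plain; charset=\"UTF-8\"";
        System.out.println(mediaType(ct));                          // text/plain
        System.out.println(parameters(ct).get("charset"));          // UTF-8
        System.out.println(parameters("text/html").get("charset")); // null
    }
}
```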


Index: src/main/java/org/apache/log4j/net/SMTPAppender.java
===================================================================
--- src/main/java/org/apache/log4j/net/SMTPAppender.java	(revision 686147)
+++ src/main/java/org/apache/log4j/net/SMTPAppender.java	(working copy)
@@ -322,10 +322,15 @@
 	}
       }
       t = layout.getFooter();
-      if(t != null)
-	sbuf.append(t);
-      part.setContent(sbuf.toString(), layout.getContentType());
-
+      if(t != null) {
+	     sbuf.append(t);
+      }
+      String contentType = layout.getContentType();
+      if (contentType == null || contentType.equals("text/plain")) {
+         part.setText(sbuf.toString(), "UTF-8");
+      } else {
+         part.setContent(sbuf.toString(), contentType);
+      }
       Multipart mp = new MimeMultipart();
       mp.addBodyPart(part);
       msg.setContent(mp);

Comment 5 Илья Казначеев 2008-08-28 10:12:04 UTC
I'll try it tomorrow, but I suppose you should support text/* (notably text/html) the same way.
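Treating every `text/*` type alike could look like the hypothetical helper below, which appends `charset=UTF-8` to any `text/*` content type that does not already declare a charset (a null type is treated as `text/plain`, as in the patch). It is a sketch, not the committed fix: it only computes the header value, and the body must then actually be encoded as UTF-8 for the declaration to be truthful.

```java
import java.util.Locale;

// Hypothetical helper generalizing the patch: the patch only adds a charset
// for null or "text/plain"; this sketch adds charset=UTF-8 to ANY text/*
// content type that does not already declare a charset.
// Simplification: an existing charset is detected by a case-insensitive
// substring search for "charset=", not by full MIME parameter parsing.
final class TextCharset {

    static String withUtf8Charset(String contentType) {
        if (contentType == null) {
            return "text/plain; charset=UTF-8";
        }
        String lower = contentType.trim().toLowerCase(Locale.ROOT);
        boolean isText = lower.startsWith("text/");
        boolean hasCharset = lower.contains("charset=");
        if (isText && !hasCharset) {
            return contentType.trim() + "; charset=UTF-8";
        }
        return contentType; // non-text type, or a charset is already declared
    }

    public static void main(String[] args) {
        System.out.println(withUtf8Charset("text/html"));                // text/html; charset=UTF-8
        System.out.println(withUtf8Charset("application/octet-stream")); // application/octet-stream
    }
}
```

Leaving an explicit charset untouched keeps layouts that already declare one (such as the anonymous SimpleLayout subclass in the test case) working as before.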
Comment 6 Curt Arnold 2008-10-15 21:58:45 UTC
Committed change in rev 705140 after much experimentation. The JavaMail documentation doesn't seem to offer much guidance on creating messages containing non-ASCII characters.

Subjects containing non-ASCII characters were also mangled.
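Classic mail headers such as Subject are limited to ASCII, and RFC 2047 defines "encoded-words" of the form `=?charset?B?base64?=` for carrying other text in them. A minimal sketch with the JDK's `Base64`, assuming UTF-8 and a short subject: it emits a single encoded-word and does not split words longer than the 75-character limit RFC 2047 imposes. In JavaMail code, `MimeUtility.encodeText` or `MimeMessage.setSubject(String, String)` would normally be used instead; this sketch is not the committed change.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of RFC 2047 "B" (Base64) encoding for a Subject header.
// Simplification: the whole subject becomes ONE encoded-word; long subjects
// would need splitting into several words of at most 75 characters, taking
// care not to cut a multi-byte UTF-8 sequence.
final class SubjectEncoding {

    // Pure-ASCII subjects are returned unchanged; anything else is wrapped
    // as =?UTF-8?B?<Base64 of the UTF-8 bytes>?=
    static String encodeSubject(String subject) {
        boolean ascii = subject.chars().allMatch(c -> c < 0x80);
        if (ascii) {
            return subject;
        }
        String b64 = Base64.getEncoder()
                .encodeToString(subject.getBytes(StandardCharsets.UTF_8));
        return "=?UTF-8?B?" + b64 + "?=";
    }

    public static void main(String[] args) {
        System.out.println(encodeSubject("Encoding FAIL")); // Encoding FAIL
        System.out.println(encodeSubject("é"));             // =?UTF-8?B?w6k=?=
    }
}
```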