Apache 2.0 currently ships with "AddDefaultCharset iso-8859-1" in httpd.conf. This should be fixed (by commenting it out, removing it, or replacing it with "AddDefaultCharset Off"), and the comment in httpd.conf should be corrected, for the following reasons:
1) Charset information is important, but no charset information is much preferable to wrong charset information (contrary to what the comment in httpd.conf says).
2) Many document formats have their own internal way to specify character encoding, and it is often sufficient to rely on these. For document authors and administrators, it is often easier to make sure these are correct than to make sure that the served headers are correct.
3) In most parts of the world, including Europe and the Americas (because of windows-1252), there is rarely a server that contains only iso-8859-1 documents, and there is rarely a server administrator who knows the encodings of all the served documents (if s/he is even aware of character encoding issues).
4) Upgraders from Apache 1.3 to Apache 2.0 often overlook this setting, resulting in large numbers of files being served wrongly with charset=iso-8859-1, and an increasing number of complaints to ISPs and Web hosters. Fixing this bug would make upgrading easier and more predictable, and would reduce complaints to hosters, who have difficulty addressing them because they are not familiar with character encoding issues.
5) To override the setting (and assuming users know how to do this), users need FileInfo permissions for their .htaccess files. httpd.conf as shipped contains an example of settings for UserDir directories and similar cases where users are allowed some amount of configuration, but this example is commented out. So there is a good chance that users cannot fix the problem, even if they know the correct encoding of their documents and the correct way to set the HTTP header.
6) The oft-cited iso-8859-1 default for HTTP exists only on paper, not at all in practice. If it were observed in practice, "AddDefaultCharset iso-8859-1" would be unnecessary. Because the default is not observed, this setting is harmful.
7) The comment in httpd.conf claims that this setting is a good start for internationalization. This ignores the fact that many hosts already contain a lot of internationalized documents.
In connection with this, the documentation for AddDefaultCharset should be updated to clearly point out the potential dangers of using it (i.e. only use it if you know the character encoding of the majority of the documents on your server, know what the exceptions are, and make sure those exceptions are labelled correctly).
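A minimal sketch of the requested change, assuming the stock Apache 2.0 httpd.conf layout. The shipped default is switched off, and a UserDir section is shown enabled so that users can set charsets from their own .htaccess files. The /home/*/public_html path and the .htaccess content are illustrative assumptions, not the shipped values.

```apache
# Requested default: add no charset parameter unless one is configured
# explicitly, instead of labelling every text/plain and text/html
# response as iso-8859-1.
AddDefaultCharset Off

# Illustrative UserDir section (path is an assumption). AddDefaultCharset
# and AddCharset fall under the FileInfo override, so this is what lets
# users set them in their own .htaccess files.
<Directory /home/*/public_html>
    AllowOverride FileInfo
</Directory>

# A user's public_html/.htaccess could then contain, for example
# (hypothetical content):
#   AddDefaultCharset windows-1252
```

Note that AddDefaultCharset only affects responses whose content type is text/plain or text/html, so other document types are untouched either way.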
see also bug 14513 http://nagoya.apache.org/bugzilla/show_bug.cgi?id=14513
The new default value causes corruption for people upgrading to the new version. Mislabeling Windows-1252 as iso-8859-1 can cause the euro symbol to be displayed incorrectly and result in erroneous financial transactions. The misleading Apache documentation and the change to apply the default charset cause subtle differences which have significant impact. They can also cause non-subtle differences. Because the web standards call for the HTTP charset to override the charset declared in the page, this change overrides even pages that declare their own charset (i.e. pages that use the meta tag). The old behavior should be restored right away.
I'm not enough of an expert in this area to make a decision about it, but the problem with simply removing this directive is that doing so creates cross-site scripting problems. See: http://httpd.apache.org/info/css-security/ and links from that page. In fact, AddDefaultCharset was originally added to deal with these problems, so simply removing it without addressing the CSS issue would not be smart. (See also bug 13986, which states that Apache shouldn't set a default content-type by default. This issue should probably be addressed alongside that one.)
OK, I think we need a clarification. We are not requesting that the AddDefaultCharset directive be eliminated. We are requesting that its use in the default configuration to set the charset to iso-8859-1 be eliminated. As for the security risk, the significant piece of the referenced document seems to be: "In addition, web pages should explicitly set a character set to an appropriate value in all dynamically generated pages." We can all agree with this. The problem is that iso-8859-1 is not an appropriate value for the majority of configurations. The article notes that this used to be the default in some of the web standards and no longer is. It is no longer the default precisely because it is not the best choice in the majority of cases, even in English-speaking markets these days. Perhaps a better compromise is to at least ask the administrator what the value should be during installation, and provide a list of the most common encodings to choose from. Or default to UTF-8 and let people know clearly that this is what you use.
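The UTF-8 alternative suggested above would amount to a one-line change in the shipped configuration. This is a sketch of that proposal, not a shipped setting; like any AddDefaultCharset value, it would still override the <meta> declaration of pages that are not actually UTF-8.

```apache
# Hypothetical alternative default (the proposal above, not a shipped
# value): label text/plain and text/html responses that carry no other
# charset information as UTF-8, and state this plainly in the comment.
AddDefaultCharset UTF-8
```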
SuSE 9.0 shipped Apache 2.0 with "AddDefaultCharset utf8". As a result, any other encoding declared in the html/xhtml/xml source on the server was ignored. That does not match the behaviour of Apache described on the cross-site-scripting page, which says that AddDefaultCharset only takes effect if no page-specific encoding is given. Is this a mistake in the logic of AddDefaultCharset?
Apache has absolutely no interest in <meta> tags inside the html. That comment is talking about AddCharset and similar methods of setting the HTTP headers.
To Joshua Slive: Then the faulty behaviour is on the browser's side, inasmuch as a request is sent without a complete or appropriate header, i.e. one including the encoding information. That was my first guess, at Mozilla. Of course, it is presumed that AddDefaultCharset is only activated if no encoding information is available. Or, to extend the view, if no valid/accepted encoding is sent in the request, given a list of encodings accepted by the server. Would that still help with the CSS problem?
Dietmar, I can't decipher what you are trying to say. But this is not the best place to discuss it. Please try the users@httpd or dev@httpd mailing list.
Is this issue still not resolved? I am Chinese and I strongly side with the reporter. The problem, I suppose, arises from a problematic standard. AFAIK, the header sent by the server overrides the one contained in a meta tag. The browsers I use all conform to this behaviour, and the sorrows of non-Western Web developers grow. For Chinese, we routinely use <meta http-equiv="Content-Type" content="text/html; charset=gb2312"> to mark a page as Chinese. This method allows us to place an ISO-8859-1 page on the same server/directory without worrying about server configuration. I do not even know how to achieve this effect if "AddDefaultCharset" is used. Security is important, but I do not think setting the default charset on the SERVER is the right way to go. Indeed, I think the suggestion to use a default charset has caused more problems than it has solved (see the stories below). It is the server-side SCRIPT that should take care of this. And I do not think the comment in the conf file is correct: it really does harm, because setting it will PREVENT Web developers, who should really be responsible for such issues, from specifying the charset in their pages.
By the way, some stories. Several times I have been called by colleagues because they could not make Apache display Chinese characters correctly on a newly installed box. I once translated the mission page for webstandards.org, and after a site migration it no longer displayed Chinese. After several emails, the non-Western pages were moved to a special server or directory and it was OK. Now the page is archived at http://archive.webstandards.org/mission_gb2312.html and it is wrong AGAIN, along with other translations such as Japanese! What is the use of security if it makes things inaccessible? (Not to mention that it is the wrong response to a security issue. Even the page http://www.cert.org/tech_tips/malicious_code_mitigation.html#3 mentions only the use of a meta tag like the gb2312 example above.)
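When a server-wide default is in effect, one way to serve a single legacy page with its own charset is to scope the directive to that file. This is a sketch; the file name is borrowed from the story above, and it assumes the author can edit the main configuration or has AllowOverride FileInfo for an .htaccess file, since AddDefaultCharset falls under that override.

```apache
# The server-wide default stays whatever the administrator chose.
# For this one GB2312-encoded page (file name is illustrative),
# replace the default label with the page's real encoding.
<Files "mission_gb2312.html">
    AddDefaultCharset GB2312
</Files>
```

Unlike the <meta> approach, this puts the charset in the HTTP header itself, so it is not overridden by a server-wide default.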
To Dietmar: Your opinions about Accept-Charset are correct only if:
1) a Chinese user can set his browser to accept only GB2312;
2) a Chinese user never needs to view ISO-8859-1 pages, or the browser supports per-page configuration of Accept-Charset; and
3) if "Accept-Charset: gb2312" is sent to the server, the server will not send the default "charset=ISO-8859-1".
I do not see that any of these holds.
I agree that shipping with an AddDefaultCharset preset is unsatisfactory, and screws up users of servers with unresponsive admins. Can we simply remove it from the default config to deal with the case of authors having more clue than their sysops? I'd be happy with that, but I'm going to ask for review in other fora where folks have relevant expertise.
Actually, the solution is already available to users. There's a bunch of AddCharset directives in the default httpd.conf that serve precisely this purpose:
    AddCharset ISO-8859-1 .iso8859-1 .latin1
    AddCharset ISO-8859-2 .iso8859-2 .latin2 .cen
    AddCharset ISO-8859-3 .iso8859-3 .latin3
    etc.
So a fix would be to correct errors and omissions in that list, and leave it to authors to control their charset using a suffix on the document name. Of course that's ugly, but at least it works.
Also worth noting: mod_proxy_html 2.x will parse META elements in HTML and XHTML documents and convert them to real HTTP headers. See http://apache.webthing.com/mod_proxy_html/
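A sketch of the suffix approach, assuming mod_mime's handling of multiple extensions (each extension's mapping is applied independently). Only the ISO-8859-1 line is copied from the list quoted above; the UTF-8 and windows-1252 lines, and the .cp1252 suffix, are hypothetical examples of filling an omission.

```apache
# Charset chosen by file-name suffix. A file named page.html.cp1252 is
# served as "text/html; charset=windows-1252": .html supplies the
# content type and .cp1252 supplies the charset.
AddCharset ISO-8859-1   .iso8859-1 .latin1
AddCharset UTF-8        .utf8
# Hypothetical addition for an encoding missing from the list:
AddCharset windows-1252 .cp1252
```

Without content negotiation, links must then use the full file name, including the charset suffix.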
I just wrote:
> So a fix would be to correct errors and omissions in that list, and leave it
> to authors to control their charset using a suffix on the document name.
> Of course that's ugly, but at least it works.
Hmmm, I neglected to add that the ugliness goes away if that's used with mod_negotiation: perhaps we should ship with MultiViews on by default? The other crucial issue is, of course, to document it!
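A sketch of how MultiViews hides the suffix, assuming the charset suffixes are mapped with AddCharset as described in the previous comment and that /var/www/htdocs is the document root (an illustrative path). When a requested file such as page.html does not exist, mod_negotiation looks for page.html.* and can pick up page.html.latin2, which the quoted .latin2 mapping labels as ISO-8859-2.

```apache
# Illustrative document root. With MultiViews, a request for
# /page.html can be answered by page.html.latin2, so links need not
# mention the charset suffix at all.
<Directory /var/www/htdocs>
    Options +MultiViews
</Directory>
```

The same Options line could also go in an .htaccess file if the directory allows the Options override.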
*** Bug 30860 has been marked as a duplicate of this bug. ***
In addition to the principal reasons given earlier, there is also a pragmatic reason not to use AddDefaultCharset in the default httpd.conf. Sending the charset declaration triggers an obscure bug in MSIE with multipart forms, as documented at http://www.interactivetools.com/forum/gforum.cgi?post=34345;sb=post_latest_reply;so=ASC;forum_view=forum_view_collapsed (at the bottom). I know that Microsoft should fix their browser, but I spent a lot of time today debugging an old script that stopped working after upgrading to Apache 2 because of this. I think the right thing is not to ship an httpd.conf that sets this by default and thereby triggers bugs in a product that is still used by so many people.
I'm surprised that this bug is still around. The only justification for it that I was able to find in the record is the pointer to the Cross-Site Scripting (CSS) issue. However, this is based on a shallow understanding of CSS. To avoid CSS, just setting some character encoding is not good enough. A solution requires that the client side gets the right character encoding. Of course, declaring iso-8859-1 as a default doesn't work for a huge number of Web pages. So this default should be removed as quickly as possible, and the documentation for CSS should be updated to make clearer that what matters is not "declare an encoding" but "declare the right encoding" (also for reasons other than security). I can easily provide more information (e.g. a page that shows how using the wrong encoding, such as declaring a page as iso-8859-1 when it isn't, can lead to attacks) if contacted directly.
This was supposed to be fixed a long time ago. It was for 1.3. I am verifying with the group and will remove it from the default config if there are no objections.
Fixed in HEAD (2.1.x), may be backported later to 2.0.x. svn rev 111582
*** Bug 33028 has been marked as a duplicate of this bug. ***