Bug 23421 - Remove AddDefaultCharset from httpd.conf as shipped
Status: RESOLVED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: Core
Version: 2.0-HEAD
Hardware: All All
Importance: P3 major with 16 votes
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL: http://cvs.apache.org/viewcvs.cgi/htt...
Duplicates: 30860 33028
Reported: 2003-09-25 20:16 UTC by Martin Dürst
Modified: 2014-02-17 13:59 UTC



Description Martin Dürst 2003-09-25 20:16:23 UTC
Apache 2.0 currently ships with "AddDefaultCharset iso-8859-1" in httpd.conf.
This should be fixed (by commenting out or removing it, or replacing it with
AddDefaultCharset Off) and the comment in httpd.conf should be corrected,
for the following reasons:

1) Charset information is important, but no charset information is much
   preferable to wrong charset information (contrary to what the comment
   in httpd.conf says).

2) Many document formats have their own internal way to specify character
   encoding. It is often sufficient to rely on these. It is often easier,
   for document authors and administrators, to make sure these are correct,
   than to make sure that the served headers are correct.

3) In most parts of the world, including Europe and the Americas (because of
   windows-1252), there is rarely any server that contains only iso-8859-1
   documents, and there is rarely any server administrator who knows the
   encodings of all the served documents (if s/he is even aware of character
   encoding issues).

4) Upgraders from Apache 1.3 to Apache 2.0 often overlook this setting,
   resulting in large numbers of files served wrongly with charset=iso-8859-1,
   and an increasing number of complaints to ISPs and Web hosters. Fixing
   this bug would make upgrading easier and more predictable, and would
   reduce complaints to hosters that they have difficulty addressing
   because they are not familiar with character encoding issues.

5) In order to override the setting (and assuming users know how to do
   this), users have to have FileInfo permissions for their .htaccess files.
   httpd.conf as shipped contains an example of settings for UserDir
   directories and similar cases where the users are allowed some
   amount of configuration, but this is commented out. So it is likely
   that users cannot fix the problem, even if they know the correct
   encoding of their document and the correct way to set the HTTP header.

6) The oft-cited default of iso-8859-1 for HTTP is something that exists
   only on paper, but not at all in practice. If it were observed in
   practice, "AddDefaultCharset iso-8859-1" would be unnecessary.
   Because the default is not observed, this setting is harmful.

7) The comment in httpd.conf claims that this setting is a good start for
   internationalization. This ignores the fact that many hosts already
   contain a lot of internationalized documents.

In connection with this, the documentation for AddDefaultCharset should be
updated to clearly point out the potential dangers of using it (i.e.
only use this if you know the character encoding of the majority of
the documents on your server, and you know what the exceptions are and
make sure they are set correctly).
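
As a rough illustration of the change requested above, the relevant part of httpd.conf could look like the sketch below. The directive and its values come from this report; the comment wording is an assumption, not the text that ships.

```apache
# As shipped with Apache 2.0 at the time of this report:
#AddDefaultCharset ISO-8859-1

# Requested: do not add a charset parameter unless the administrator,
# knowing the encoding of the served documents, opts in explicitly.
AddDefaultCharset Off
```

With the directive off, a response whose Content-Type has no charset parameter is sent without one, which leaves the browser free to use the encoding declared inside the document.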
Comment 1 Martin Dürst 2003-09-25 20:19:48 UTC
see also bug 14513
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=14513
Comment 2 Tex Texin 2003-09-25 23:34:06 UTC
The new default value causes corruption for people upgrading to the new
version. Mislabeling Windows-1252 as ISO-8859-1 can cause the euro symbol
to be displayed incorrectly and result in erroneous financial transactions.
The misleading Apache documentation and the change to apply a default charset
cause subtle differences that have significant impact. They can also cause
non-subtle differences. Because the web standards call for the HTTP charset to
override the charset declared in the page, this change overrides even pages
with a self-documenting charset (i.e. pages that use the meta tag). The old
behavior should be restored right away.
Comment 3 Joshua Slive 2003-09-26 03:17:26 UTC
I'm not enough of an expert in this area to make a decision about it,
but the problem with simply removing this directive is that it
creates problems with cross-site scripting.  See:
http://httpd.apache.org/info/css-security/
and links from that page.

In fact, AddDefaultCharset was originally added to deal with these
problems, so simply removing it without addressing the CSS issue
would not be smart.

(See also bug 13986, which states
that Apache shouldn't set a default content-type by default.
This issue should probably be addressed alongside that one.)
Comment 4 Tex Texin 2003-09-26 03:40:31 UTC
OK, I think we need a clarification. We are not requesting the command AddDefaultCharset be eliminated. 
We are requesting that its use in the default configuration to set the charset to iso 8859-1 be eliminated.

As for the security risk, the significant piece of the referenced document seems to be:
"In addition, web pages should explicitly set a character set to an appropriate value in all dynamically generated pages."

We can all agree with this. The problem is that ISO-8859-1 is not an appropriate value for the majority of configurations.
The article notes that this used to be the default in some of the web standards and no longer is.
It is no longer the default precisely because it is not the best choice in the majority of cases, even in English-speaking markets these days.
Perhaps a better compromise is to at least ask the administrator during installation what the value should be, and
provide a list of the most common encodings to choose from.
Or default to UTF-8 and let people know clearly that this is what you use.
Comment 5 Dietmar Temme 2004-05-05 13:51:41 UTC
SuSE 9.0 shipped Apache 2.0 with AddDefaultCharset utf8

As a result, any other encoding declared in the html/xhtml/xml source sent to
the server was ignored.

That does not match the behaviour of Apache described on the
cross-site-scripting page, which says that AddDefaultCharset
only takes effect if a page-specific encoding is missing.

Is this a mistake in the logic of how AddDefaultCharset behaves?
Comment 6 Joshua Slive 2004-05-05 17:48:31 UTC
Apache has absolutely no interest in <meta> tags inside the html.  That comment
is talking about AddCharset and similar methods of setting the HTTP headers.
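
A sketch of the distinction, assuming the SuSE-style setting described in comment 5. The header line in the comments is illustrative, not captured output.

```apache
# httpd.conf: adds a charset parameter to responses whose Content-Type
# does not already carry one from another source (such as AddCharset).
AddDefaultCharset utf-8

# Resulting response header for an HTML page:
#   Content-Type: text/html; charset=utf-8
#
# A <meta http-equiv="Content-Type" ...> element inside the page is not
# consulted by this directive, and browsers give the HTTP header
# precedence over it.
```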
Comment 7 Dietmar Temme 2004-05-24 13:55:51 UTC
To Joshua Slive: Then the faulty behaviour is on the browser's side, insofar as a
request is sent without a complete or appropriate header, i.e. one including the
encoding information. That was my first guess, at Mozilla.

Of course it's presumed that option AddDefaultCharset is only activated if no
encoding information is available. Or, to extend the view, if no valid/accepted
encoding is sent in the request, given a list of encodings accepted by the server.

Would that still help the CSS problem?
Comment 8 Joshua Slive 2004-05-24 17:58:10 UTC
Dietmar, I can't decipher what you are trying to say.  But this is not the best
place to discuss it.  Please try the users@httpd or dev@httpd mailing list.
Comment 9 Wu Yongwei 2004-08-01 08:21:45 UTC
Is this issue still not resolved?  I am Chinese and I am strongly on the side of
the reporter.

The problem, I suppose, arises from a problematic standard.  AFAIK, the header
sent from the server overrides that contained in a meta tag.  Browsers I use all
conform to this behaviour, and sorrows of non-Western Web developers grow.  For
Chinese, we routinely use

<meta http-equiv="Content-Type" content="text/html; charset=gb2312">

to mark a page as Chinese.  And this method allows us to place an ISO-8859-1
page on the same server/directory without worrying about server configurations.
I do not even know how to achieve this effect now if "AddDefaultCharset" is
ever used.
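
For what it is worth, one way to get the meta-tag behaviour back under a server-wide AddDefaultCharset is a per-directory override, sketched below. It only works where the administrator has granted FileInfo (point 5 of the description), and the directory path is hypothetical.

```apache
# httpd.conf (administrator): allow FileInfo overrides in user directories.
# /home/*/public_html is a hypothetical path.
<Directory /home/*/public_html>
    AllowOverride FileInfo
</Directory>

# .htaccess (author), placed in the directory that holds both the gb2312
# and the ISO-8859-1 pages: stop adding the server-wide default charset,
# so that the browser falls back to each page's own <meta> declaration.
AddDefaultCharset Off
```

Without the FileInfo grant, the .htaccess line is not permitted, which is exactly the situation the reporter describes.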

Security is important, but I do not think setting the default charset by the
SERVER is the right way to go.  Indeed, I think the suggestion to use a default
charset has caused more problems than it has solved (see stories below).  It is the
server-side SCRIPT that should take care of this.  And I do not think the
comment in the conf file is correct: it really does harm, because setting it
will PREVENT Web developers, who should really be responsible for such issues,
from specifying the charset in their pages.

By the way, some stories.  Several times I have been called by colleagues
because they cannot make Apache display Chinese characters correctly on a newly
installed box.  I once translated the mission page for webstandards.org, and
after a site migration it no longer displayed Chinese.  After several emails the
non-Western pages were moved to a special server or directory and it was OK.  Now
the page is archived at 

http://archive.webstandards.org/mission_gb2312.html

And it is wrong AGAIN, along with other translations like Japanese!

What is the use of security, if it makes things inaccessible?

(Not to mention that it is a wrong response for a security issue.  Even the page
http://www.cert.org/tech_tips/malicious_code_mitigation.html#3
mentions only the use of a meta tag like the gb2312 example above.)

To Dietmar:

Your opinions about Accept-Charset are correct only if

1) A Chinese user can set his browser to accept only GB2312;
2) A Chinese user never needs to view ISO-8859-1 pages, or the browser supports
per-page configuration of Accept-Charset; and
3) If "Accept-Charset: gb2312" is sent to the server, the server will not send
the default "charset=ISO-8859-1".

I do not see that any of them holds.
Comment 10 Nick Kew 2004-08-01 10:42:47 UTC
I agree that shipping with an AddDefaultCharset preset is unsatisfactory,
and screws up users of servers with unresponsive admins.

Can we simply remove it from the default config to deal with the case of authors
having more clue than their sysops?  I'd be happy with that, but I'm going to
ask for review in other fora where folks have relevant expertise.

Actually the solution is already available to users.  There's a bunch of
AddCharset directives in the default httpd.conf that serve precisely this
purpose:
AddCharset ISO-8859-1  .iso8859-1  .latin1
AddCharset ISO-8859-2  .iso8859-2  .latin2 .cen
AddCharset ISO-8859-3  .iso8859-3  .latin3
etc.

So a fix would be to correct errors and omissions in that list, and leave it
to authors to control their charset using a suffix on the document name.
Of course that's ugly, but at least it works.
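
A sketch of what this looks like for the gb2312 case from comment 9. The file names are hypothetical, the GB2312 mapping is written out explicitly rather than assumed to be in the shipped list, and .html is assumed to map to text/html as usual.

```apache
# Map a filename suffix to a charset (same form as the lines above).
AddCharset GB2312      .gb2312
AddCharset ISO-8859-1  .iso8859-1  .latin1

# Hypothetical files and the header each would be served with:
#   mission.html.gb2312  ->  Content-Type: text/html; charset=GB2312
#   mission.html.latin1  ->  Content-Type: text/html; charset=ISO-8859-1
```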

Also worth noting: mod_proxy_html 2.x will parse META elements in
HTML and XHTML documents and convert them to real HTTP headers.
See http://apache.webthing.com/mod_proxy_html/
Comment 11 Nick Kew 2004-08-01 10:50:30 UTC
I just wrote:

> So a fix would be to correct errors and omissions in that list, and leave it
> to authors to control their charset using a suffix on the document name.
> Of course that's ugly, but at least it works.

Hmmm, I neglected to add that the ugliness goes away if that's used with
mod_negotiation: perhaps we should ship with MultiViews on by default?
The other crucial issue is of course to document it!
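
A minimal sketch of that combination, assuming mod_negotiation is loaded. The directory path and file name are hypothetical.

```apache
# /var/www/html is a hypothetical document root.
<Directory /var/www/html>
    Options +MultiViews
</Directory>

# With only mission.html.gb2312 on disk, a request for /mission.html
# (which does not exist under that exact name) is matched against
# mission.html.*, so the charset suffix no longer has to appear in links.
```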
Comment 12 Nick Kew 2004-08-26 12:30:29 UTC
*** Bug 30860 has been marked as a duplicate of this bug. ***
Comment 13 Sebastiaan Hoogeveen 2004-12-09 15:16:39 UTC
In addition to the principal reasons given earlier there is also a pragmatic
reason not to use AddDefaultCharset in the default httpd.conf. Sending the
charset declaration triggers an obscure bug in MSIE with multipart forms, as
documented at
http://www.interactivetools.com/forum/gforum.cgi?post=34345;sb=post_latest_reply;so=ASC;forum_view=forum_view_collapsed
(at the bottom).

I know that Microsoft should fix their browser, but I spent a lot of time today
debugging an old script that didn't work after upgrading to Apache 2 because of
this. I think the right thing is not to trigger bugs in a product that is still
used by so many users by shipping an httpd.conf that contains this as a default.
Comment 14 Martin Dürst 2004-12-10 05:53:41 UTC
I'm surprised that this bug is still around. The only justification for that
which I was able to find in the record is the pointer to the Cross-Site
Scripting (CSS) issue. However, this is based on a shallow understanding
of CSS. In order to avoid CSS, just setting some character encoding
is not good enough. A solution requires that the client side gets the
right character encoding. Of course, declaring iso-8859-1 as a default
doesn't work for a huge number of Web pages. So this default should be
removed as quickly as possible, and the documentation for CSS should be
updated to make clearer that it's not "declare an encoding" but
"declare the right encoding" that is important (also for reasons
other than security).

I can easily provide more information (e.g. a page that shows how use
of the wrong encoding, such as declaring a page as iso-8859-1 when it
isn't iso-8859-1, can lead to attacks) if contacted directly.
Comment 15 Roy T. Fielding 2004-12-10 11:23:29 UTC
This was supposed to be fixed a long time ago.  It was for 1.3.
I am verifying with the group and will remove it from the default
config if there are no objections.
Comment 16 Roy T. Fielding 2004-12-11 07:20:16 UTC
Fixed in HEAD (2.1.x), may be backported later to 2.0.x.

svn rev 111582
Comment 17 Joe Orton 2005-01-10 16:42:18 UTC
*** Bug 33028 has been marked as a duplicate of this bug. ***