Some popular JavaScript libraries have started to set cookie values in the browser directly, and those values can include ISO-8859-1 (Latin-1) characters in the range 0xA0-0xFF. When Tomcat parses the Cookie header, the request fails with an IllegalArgumentException [1] thrown from the connector, so the application never gets an opportunity to validate the cookie value it received.

RFC2616 (HTTP/1.1) allows header field-values to contain ISO-8859-1 characters, which include the range 0xA0-0xFF. RFC2109 (cookies) allows "quoted-string" values, which can contain TEXT octets (and TEXT includes those characters). Cookie names are different: they are defined as the more restricted "token", which only allows US-ASCII characters. The original Netscape spec does not mention character encodings.

[1] http://svn.apache.org/viewvc/tomcat/tc7.0.x/trunk/java/org/apache/tomcat/util/http/CookieSupport.java?revision=1200183&view=markup#l190
Created attachment 31139
Fix to allow chars in the range 0xa0-0xff

The patch allows characters in the range 0xA0-0xFF (so it continues to exclude control characters, both those below 0x20 and those in the range 0x80-0x9F). I added a test case for a Latin-1 character, and the test suite passes.

To keep it simple, this patch does not attempt to differentiate between quoted and unquoted values. It also does not attempt to deal with values containing UTF-8 encoded data.
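For readers who want to see the range rule in isolation, here is a minimal sketch of the check described above. It is not the code from the attachment: the class and method names are hypothetical, excluding DEL (0x7F) is an assumption, and separator handling (for example `;` and `,`) is deliberately left out.

```java
// Illustrative sketch only; this is not the code from the attached patch.
final class Latin1ValueRange {

    // Hypothetical helper: true if c falls in the range the patch description accepts.
    static boolean isAcceptedChar(char c) {
        // Printable US-ASCII. Excluding DEL (0x7F) is an assumption of this sketch.
        if (c >= 0x20 && c < 0x7f) {
            return true;
        }
        // Upper half of ISO-8859-1. The C1 controls (0x80-0x9F) stay rejected.
        return c >= 0xa0 && c <= 0xff;
    }

    // Mirrors the reported failure mode by throwing IllegalArgumentException.
    static void validate(String value) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (!isAcceptedChar(c)) {
                throw new IllegalArgumentException(
                        "Character 0x" + Integer.toHexString(c) + " not allowed in cookie value");
            }
        }
    }

    public static void main(String[] args) {
        validate("caf\u00e9"); // 0xE9 is accepted, so no exception here
        try {
            validate("a\u0085b"); // 0x85 is a C1 control and is rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that this treats names and values identically, which is the same simplification the patch makes.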
This simple patch is not acceptable, as it does not retain the limitation that cookie names must be tokens. Now might be the time to re-write the cookie parsing using the HttpParser. Given the 'fun' we have had with cookie processing in the past, we need to be very careful about any changes we introduce. It would make sense to do this in 8.0.x first and then back-port it once it is stable.
If we do revisit cookie parsing we should keep RFC6265 in mind as well as the fact that Tomcat moved to a strict adherence to the cookie specs in order to avoid a number of potential security issues.
I agree that this would be a good time for a larger cleanup. To keep things incremental I'll start with refining the patch (against trunk) to handle names and values separately.
Created attachment 31140
Allow 0xa0-0xff in V0 values only

Minimal patch allowing ISO-8859-1 characters in the range 0xa0-0xff for V0 values only. This refactors the check used when processing tokens so that 8-bit characters are allowed just for V0 values. They will still trigger an IllegalArgumentException if they appear in a name or in a V1 unquoted value.

V1 quoted values already support them via a different code path. While looking at that path I discovered a separate issue (#55918): CTLs there do not cause an IllegalArgumentException and appear in the returned value. I've marked the tests for that with @Ignore so it can be resolved in a different fix.
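To make the name/value split concrete, here is a rough sketch of the two rules side by side. It is not the patch itself: the class and method names are hypothetical, the separator list is the "token" definition from RFC2616, and treating V0 values as "anything except semicolon, comma and white space" is an assumption based on the Netscape spec wording.

```java
// Illustrative sketch only; names and structure are hypothetical, not Tomcat's.
final class CookieCharRules {

    // Separators that RFC2616 excludes from "token".
    private static final String SEPARATORS = "()<>@,;:\\\"/[]?={} \t";

    // Names stay restricted to token characters: US-ASCII, no CTLs, no separators.
    static boolean isTokenChar(char c) {
        return c > 0x20 && c < 0x7f && SEPARATORS.indexOf(c) < 0;
    }

    // V0 values additionally accept 0xA0-0xFF. Controls (including 0x80-0x9F),
    // white space, ';' and ',' are still rejected (assumption of this sketch).
    static boolean isV0ValueChar(char c) {
        boolean printableAscii = c > 0x20 && c < 0x7f && c != ';' && c != ',';
        boolean upperLatin1 = c >= 0xa0 && c <= 0xff;
        return printableAscii || upperLatin1;
    }

    // Throws IllegalArgumentException on the first character that breaks a rule.
    static void check(String name, String value) {
        for (int i = 0; i < name.length(); i++) {
            if (!isTokenChar(name.charAt(i))) {
                throw new IllegalArgumentException("Invalid character in cookie name");
            }
        }
        for (int i = 0; i < value.length(); i++) {
            if (!isV0ValueChar(value.charAt(i))) {
                throw new IllegalArgumentException("Invalid character in cookie value");
            }
        }
    }

    public static void main(String[] args) {
        check("lang", "fran\u00e7ais"); // Latin-1 character in the value passes
        try {
            check("l\u00e0ng", "fr"); // the same kind of character in the name fails
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

V1 unquoted values are not modelled here; under the description above they would use the same check as names.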
Patch applied to trunk as r1553187 to be included in release 8.0.0.
The patch for this has been reverted from trunk.
The new RFC6265 cookie parser (that also includes a new RFC2109 parser) correctly handles these values. I don't propose fixing the old parser.