If an XML document is posted through CGI, the CGIServlet prepends the query parameters from the URL to the content. Since the content is expected to be well-formed XML (as declared by "Content-Type: text/xml"), the prepended query parameters cause XML parsing to fail.

To reproduce, assume a test.sh script has been configured for Tomcat which contains:

    java test <&0 >test.out

where test.java simply echoes its input (there is probably an easier way to do this):

    public class test {
        public static void main(String[] args) {
            try {
                // read() returns -1 at end of stream; 0 is a valid byte value
                int c = System.in.read();
                while (c != -1) {
                    // write the raw byte so the output matches the input exactly
                    System.out.write(c);
                    c = System.in.read();
                }
                System.out.flush();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

Then exercise the Tomcat server by opening a telnet session to the Tomcat host and port, such as "telnet localhost 8080", and enter the following:

    POST /test/cgi-bin/test.sh?a=123&b=xyz HTTP/1.1
    Host: localhost:8080
    Content-type: text/xml; charset=UTF-8
    Content-length: 250
    Connection: close

    <?xml version="1.0" encoding="UTF-8" ?>
    <Root>
    <Date>2005-10-27 00:00:00</Date>
    </Root>

The resulting test.out file will contain the submitted XML document preceded by a line containing the query parameters (a=123&b=xyz). Unfortunately, the XML parser receiving this data expects XML only, and the query parameters cause an exception (since they are invalid XML).

Note also that, because the query parameters are included, their size is counted against the provided Content-length value. When Content-length is specified as the size of the XML document, the document is therefore truncated.
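The truncation described above can be modelled in a few lines. This is only a sketch of the symptom, not Tomcat's actual code: the class and method names are hypothetical, and it assumes that the query string is written as a single line ahead of the body and that the script's input is then cut off at Content-length bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical model of the reported symptom; not taken from CGIServlet.
final class PrependedQuerySketch {

    /**
     * Returns what the script would read on stdin if the query line is
     * prepended to the body and the combined stream is limited to
     * contentLength bytes (assumption based on the bug description).
     */
    static String seenByScript(String query, String body, int contentLength) {
        byte[] combined = (query + "\n" + body).getBytes(StandardCharsets.UTF_8);
        int n = Math.min(contentLength, combined.length);
        return new String(Arrays.copyOf(combined, n), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String body = "<Root><Date>2005-10-27 00:00:00</Date></Root>";
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        // Content-length matches the XML, so the tail of the XML is lost.
        System.out.println(seenByScript("a=123&b=xyz", body, length));
    }
}
```

With Content-length equal to the size of the XML, the script loses exactly as many trailing bytes as the prepended line (including its newline) occupies.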
Just so I understand, you're requesting that we modify the CGIServlet to not echo query parameters and not count them in the content-length of the response?
Comparing Tomcat's CGI implementation with Apache 2: when a POST with content type text/xml is sent to Apache, the query parameters are not included as part of the content (the posted XML document). A script could identify and strip off the line containing the query parameters, but this seems unnecessary.
*** Bug 38085 has been marked as a duplicate of this bug. ***
Part of the problem is that CGIRunner.run has to deal with the feature of HttpServlet that does _not_ discriminate between POST and GET. I recall reading in passing somewhere that HttpServlet will parse the POST contents into an internal data structure, for enumeration via getParameterNames and retrieval via getParameterValues, if and only if the CONTENT_TYPE is "application/x-www-form-urlencoded". This will likely drain stdin and lose whatever original content was there. Additionally, the parameters parsed from the GET QUERY_STRING are thrown into the getParameterNames mix.

In run(), when POST is observed:

1. stdin is read into a buffer named content. I speculate that stdin is pre-drained if the content type is urlencoded.
2. The HttpServlet parameters are written to the stream (S) that will be the CGI's stdin.
3. The 'content' buffer is appended to S (the buffer is empty if urlencoded, and holds the whole body if multipart form).

One problem I see is that the content length is not increased when the content buffer is written. Maybe it gets measured elsewhere?

My original foray into this bug is due to the fact that I am deploying a CGI app (YaBB) that does not use standard parameter separators -- its legacy nature has parameters being separated by & or ; . The CGI app was not getting the parameters it needed because HttpServlet encoded or escaped the semicolons and subsequent equals signs to their equivalent %{nn} forms. Since I can't recall the code numbers I'll use %.. For example:

    foo.pl?a=1&b=2;c=3;d=4

got parsed by HttpServlet into the parameters

    a=1
    b=2%..c%..3%..d%..4

Things would be much cleaner if (presumably) HttpServlet did not automatically drain stdin, so that CGIServlet could output the POST contents verbatim to the CGI's stdin and not fool around with looping over parsed-out parameters.
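To illustrate the separator problem, here is a minimal sketch of how a script-side parser that accepts both & and ; might split the raw query string. The class and method names are made up for illustration, and this is neither YaBB's nor Tomcat's code. The point is only that such a split works when the raw string arrives intact, and cannot work once the semicolons have been re-encoded on the way to the script.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not YaBB's or Tomcat's parser.
final class LegacyQuerySplitSketch {

    /**
     * Splits a raw query string on '&' or ';' and then on the first '='.
     * No percent-decoding is done, to keep the sketch short.
     */
    static Map<String, String> split(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("[&;]")) {
            if (pair.isEmpty()) {
                continue; // tolerate doubled separators such as "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(name, value);
        }
        return params;
    }

    public static void main(String[] args) {
        // The raw query string from the example above.
        System.out.println(split("a=1&b=2;c=3;d=4"));
    }
}
```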
Created attachment 17327
Replacement for CGIServlet.java (since I don't know how to create a patch)

Thanks to Mark T. and Martin D.
I have been reviewing this bug and some of the various unofficial CGI specs (if there was an RFC, this would be so much easier) including:

http://cgi-spec.golux.com/
http://hoohoo.ncsa.uiuc.edu/cgi/interface.html

The current parameter handling is somewhat at odds with these specs. I intend to commit a patch over the next few days that will:

- Only provide parameters on the command line for 'indexed' queries
- Always provide the query string via the QUERY_STRING environment variable
- Always provide any POST'd content un-modified via stdin
- Never call getParameters()

This should resolve the issues set out in this bug and more closely align the behaviour of the CGI Servlet with other CGI providers.
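The points above could look roughly like the following. This is a sketch under stated assumptions, not the committed patch: the class and method names are hypothetical, and it relies on the CGI specs' rule that an 'indexed' query is one whose query string contains no unencoded '=', with the arguments obtained by splitting on '+' and URL-decoding each word.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the intended behaviour; not the actual CGIServlet patch.
final class CgiInputSketch {

    /** Command-line arguments are produced only for 'indexed' queries. */
    static List<String> commandLineArgs(String queryString) {
        List<String> args = new ArrayList<>();
        if (queryString == null || queryString.isEmpty() || queryString.contains("=")) {
            return args; // not an indexed query: no command-line arguments
        }
        for (String word : queryString.split("\\+")) {
            try {
                args.add(URLDecoder.decode(word, "UTF-8"));
            } catch (UnsupportedEncodingException e) {
                throw new IllegalStateException("UTF-8 is always available", e);
            }
        }
        return args;
    }

    /** The raw query string always goes into QUERY_STRING, unparsed. */
    static Map<String, String> environment(String queryString) {
        Map<String, String> env = new HashMap<>();
        env.put("QUERY_STRING", queryString == null ? "" : queryString);
        return env;
    }

    /** The POST body is copied to the script's stdin byte for byte. */
    static void copyBody(InputStream requestBody, OutputStream cgiStdin) throws IOException {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = requestBody.read(buffer)) != -1) {
            cgiStdin.write(buffer, 0, n);
        }
        cgiStdin.flush();
    }

    public static void main(String[] args) {
        System.out.println(commandLineArgs("a=123&b=xyz")); // not indexed
        System.out.println(commandLineArgs("foo+bar"));     // indexed
        System.out.println(environment("a=123&b=xyz"));
    }
}
```

Because the body is read from the request's input stream and never through the parameter-parsing methods, the script receives exactly the bytes that were posted, and Content-length keeps its original meaning.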
I have committed a patch that implements the changes outlined in my previous comment. This change fixes this and the related bug.