Bug 37285 - POST of document through CGI
Summary: POST of document through CGI
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 5
Classification: Unclassified
Component: Servlets:CGI
Version: 5.5.12
Hardware: All Linux
Importance: P2 major
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Duplicates: 38085
Depends on:
Blocks:
 
Reported: 2005-10-27 23:30 UTC by bkw
Modified: 2006-07-12 19:08 UTC



Attachments
Replacement for CGIServlet.java (since I don't know how to create a patch) (81.41 KB, text/plain)
2006-01-04 22:12 UTC, Richard A DeVenezia

Description bkw 2005-10-27 23:30:55 UTC
If an XML document is posted through CGI, the CGIServlet prepends the 
content with the query parameters from the URL.  Since the content is expected 
to be well-formed XML (as declared by "Content-Type: text/xml"), the 
prepended query parameters cause the XML parsing to fail.  Assuming a test.sh 
script has been configured for Tomcat which contains:

  java test <&0 >test.out

where test.java simply echoes the input (there is probably an easier way to do 
this) as follows:

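  public class test {
    public static void main(String[] args) {
      try {
        // read() returns -1 at end of stream; 0 is a valid byte value
        int c = System.in.read();
        while(c != -1) {
          // write the raw byte so multi-byte characters pass through unchanged
          System.out.write(c);
          c = System.in.read();
        }
        System.out.flush();
      } catch(Exception e) {e.printStackTrace();}
    }
  }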

Then exercise the Tomcat server by launching a telnet session for the Tomcat 
host and port, such as "telnet localhost 8080", and enter the following:

  POST /test/cgi-bin/test.sh?a=123&b=xyz HTTP/1.1
  Host: localhost:8080
  Content-type: text/xml; charset=UTF-8
  Content-length: 250
  Connection: close

  <?xml version="1.0" encoding="UTF-8" ?>
  <Root>
  <Date>2005-10-27 00:00:00</Date>
  </Root>

The resulting test.out file will contain the submitted XML document prepended 
with a line containing the query parameters (a=123&b=xyz).  Unfortunately, the 
XML parser that is receiving this data is expecting XML, and the query 
parameters cause an exception (since they are invalid XML).  Note also that, 
since the query parameters are included, their size is counted against the 
provided Content-length value, which truncates the document when the 
Content-length is specified as the size of the XML document.
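
The truncation can be illustrated with a small standalone model.  This is not 
Tomcat code: the class and method names are hypothetical, and it assumes the 
query parameters are written as one line (terminated by a newline) ahead of 
the body and that only Content-length bytes reach the script, as observed in 
test.out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative model only; not taken from CGIServlet.
class TruncationDemo {

    // Returns the bytes a CGI script would read if `prefix` is written ahead
    // of `body` and the total is capped at `contentLength` bytes.
    static byte[] whatTheScriptSees(byte[] prefix, byte[] body, int contentLength) {
        byte[] combined = new byte[prefix.length + body.length];
        System.arraycopy(prefix, 0, combined, 0, prefix.length);
        System.arraycopy(body, 0, combined, prefix.length, body.length);
        return Arrays.copyOf(combined, Math.min(contentLength, combined.length));
    }

    public static void main(String[] args) {
        // Assumed form of the prepended line: query string plus a newline.
        byte[] prefix = "a=123&b=xyz\n".getBytes(StandardCharsets.UTF_8);
        byte[] body = ("<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
                + "<Root>\n<Date>2005-10-27 00:00:00</Date>\n</Root>\n")
                .getBytes(StandardCharsets.UTF_8);

        // Content-length set to the exact size of the XML document.
        byte[] seen = whatTheScriptSees(prefix, body, body.length);
        System.out.print(new String(seen, StandardCharsets.UTF_8));

        // The tail of the document is cut off by the size of the prefix.
        int lost = prefix.length + body.length - seen.length;
        System.out.println("\n[bytes lost: " + lost + "]");
    }
}
```

In this model the number of bytes lost from the end of the document equals 
the length of the prepended line.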
Comment 1 Yoav Shapira 2005-11-18 16:56:51 UTC
Just so I understand, you're requesting that we modify the CGIServlet to not
echo query parameters and not count them in the content-length of the response?
Comment 2 bkw 2005-11-18 17:55:54 UTC
Comparing the Tomcat CGI implementation with Apache 2: when a POST of content 
type text/xml is sent to Apache, the query parameters are not included as part 
of the content (the posted XML document).  A script could identify and strip 
off the line containing the query parameters, but this seems unnecessary.
Comment 3 Mark Thomas 2006-01-02 14:54:03 UTC
*** Bug 38085 has been marked as a duplicate of this bug. ***
Comment 4 Richard A DeVenezia 2006-01-02 19:02:50 UTC
Part of the problem is that CGIRunner.run has to deal with the feature of 
HttpServlet that does _not_ discriminate between POST and GET.

I recall reading in passing somewhere that HttpServlet will parse the POST 
contents into an internal data structure for enumeration via getParameterNames 
and retrieval via getParameterValues, if and only if the CONTENT_TYPE 
is "application/x-www-form-urlencoded".  This act will likely drain the stdin 
and lose whatever original content was there.

Additionally, the parameters parsed from the GET QUERY_STRING are thrown into 
the getParameterNames mix.


In run(), when POST is observed:
1. stdin is read into a buffer named content.  I speculate that stdin is 
pre-drained if the content type is urlencoded.
2. the HttpServlet parameters are written to the stream (S) that will be the 
CGI's stdin.
3. the 'content' buffer is appended to S {the buffer is empty if urlencoded 
and the whole thing if multipart form}
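
The steps above can be modelled roughly as follows.  This is a reconstruction 
from the description in this comment, not the actual CGIRunner.run() source; 
the class name, the method name, the name=value&... format, and the newline 
after the parameter line are all assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the POST handling described above.
class CgiStdinModel {

    // Step 2, then step 3: write the parsed parameters to S, then append
    // the raw 'content' buffer that was read from the request in step 1.
    static byte[] buildCgiStdin(Map<String, String> params, byte[] content) {
        ByteArrayOutputStream s = new ByteArrayOutputStream();

        StringBuilder line = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (line.length() > 0) {
                line.append('&');
            }
            line.append(e.getKey()).append('=').append(e.getValue());
        }
        line.append('\n'); // assumed: parameters arrive as a separate line

        byte[] paramBytes = line.toString().getBytes(StandardCharsets.UTF_8);
        s.write(paramBytes, 0, paramBytes.length);
        s.write(content, 0, content.length);
        return s.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("a", "123");
        params.put("b", "xyz");

        byte[] xml = "<Root/>".getBytes(StandardCharsets.UTF_8);
        // For a text/xml POST the parameters end up in front of the document.
        System.out.println(new String(buildCgiStdin(params, xml), StandardCharsets.UTF_8));
    }
}
```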

One problem I see is that content length is not increased when content buffer 
is written.  Maybe it gets measured elsewhere?

My original foray into this bug is due to the fact that I am deploying a cgi 
app (YaBB) that does not use standard parameter separators -- its legacy 
nature has parameters being separated by & or ; .

The cgi app was not getting the parameters it needed because HttpServlet 
encoded or escaped the semis and subsequent equals to their equivalent %{nn} 
forms.  Since I can't recall the code numbers I'll use %.. For example: 
foo.pl?a=1&b=2;c=3;d=4
was parsed by HttpServlet into the parameters
a=1
b=2%..c%..3%..d%..4
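
A small sketch of the effect, assuming the value is re-encoded with 
java.net.URLEncoder (an assumption; which encoder CGIServlet uses is not 
stated here).  The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Shows what happens to ';' and '=' when a parsed value is URL-encoded again.
class SemicolonEncodingDemo {

    static String encodeValue(String rawValue) {
        try {
            return URLEncoder.encode(rawValue, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // With only '&' treated as a separator, everything after "b=" is a
        // single value, so its ';' and '=' characters get percent-encoded.
        System.out.println("a=" + encodeValue("1"));
        System.out.println("b=" + encodeValue("2;c=3;d=4"));
        // prints:
        // a=1
        // b=2%3Bc%3D3%3Bd%3D4
    }
}
```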

Things would be much cleaner if (presumably) HttpServlet did not automatically 
drain stdin, and then CGIServlet could output verbatim the POST contents to the 
CGI's stdin and not fool around with looping over parsed out parameters.
Comment 5 Richard A DeVenezia 2006-01-04 22:12:02 UTC
Created attachment 17327
Replacement for CGIServlet.java (since I don't know how to create a patch)

Thanks to Mark T. and Martin D.
Comment 6 Mark Thomas 2006-07-09 23:37:06 UTC
I have been reviewing this bug and some of the various unofficial CGI specs (if
there was an RFC, this would be so much easier) including:
http://cgi-spec.golux.com/
and
http://hoohoo.ncsa.uiuc.edu/cgi/interface.html

The current parameter handling is somewhat at odds with these specs. I intend to
commit a patch over the next few days that will:
- Only provide parameters on the command line for 'indexed' queries
- Always provide the query string via the QUERY_STRING environment variable
- Always provide any POST'd content un-modified via stdin
- Never call getParameters()

This should resolve the issues set out in this bug and more closely align the
behaviour of the CGI Servlet with other CGI providers.
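
A minimal sketch of the behaviour listed above, for illustration only.  It is 
not the committed patch: the class and method names are hypothetical, and the 
test for an 'indexed' query (a non-empty query string with no '=' in it, with 
search words separated by '+') is one reading of the CGI drafts linked above.  
URL-decoding of the individual words is omitted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names are not taken from CGIServlet.
class CgiInputSketch {

    // Assumed rule: an 'indexed' query has no '=' in the query string.
    static boolean isIndexedQuery(String queryString) {
        return queryString != null && !queryString.isEmpty()
                && queryString.indexOf('=') < 0;
    }

    // Command-line arguments are supplied only for indexed queries.
    static String[] commandLineArgs(String queryString) {
        return isIndexedQuery(queryString) ? queryString.split("\\+") : new String[0];
    }

    // The query string is always exposed through QUERY_STRING.
    static Map<String, String> environment(String queryString) {
        Map<String, String> env = new HashMap<String, String>();
        env.put("QUERY_STRING", queryString == null ? "" : queryString);
        return env;
    }

    // POSTed content is copied to the script's stdin byte for byte,
    // without going through any parameter parsing.
    static void copyBody(InputStream requestBody, OutputStream cgiStdin) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = requestBody.read(buf)) != -1) {
            cgiStdin.write(buf, 0, n);
        }
        cgiStdin.flush();
    }

    public static void main(String[] args) throws IOException {
        String query = "a=123&b=xyz";
        System.out.println("args: " + commandLineArgs(query).length);
        System.out.println("QUERY_STRING=" + environment(query).get("QUERY_STRING"));

        byte[] xml = "<Root/>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream stdin = new ByteArrayOutputStream();
        copyBody(new ByteArrayInputStream(xml), stdin);
        System.out.println("stdin: " + new String(stdin.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

With the request from the description, a script under this model would find 
a=123&b=xyz only in QUERY_STRING and would read the XML document unchanged 
from stdin.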
Comment 7 Mark Thomas 2006-07-13 02:08:16 UTC
I have committed a patch that implements the changes outlined in my previous
comment. This change fixes this and the related bug.