Bug 25015

Summary: CoyoteAdapter is breaking path info
Product: Tomcat 5 Reporter: John Trollinger <jakarta>
Component: Connector:CoyoteAssignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED WONTFIX    
Severity: normal    
Priority: P3    
Version: 5.0.15   
Target Milestone: ---   
Hardware: Other   
OS: other   

Description John Trollinger 2003-11-26 13:57:58 UTC
The CoyoteAdapter removes valid path info in the postParseRequest() method.  

take the following url

http://localhost/appname/servlet-name/extra;path/info;here/hi.jsp

when you call getPathInfo() on the request it returns /extra

this is because the CoyoteAdapters postParseRequest() method finds the 
first ';' and truncates the path.  According to RFC1738 this is a valid path.  

The class org.apache.tomcat.util.http.mapper.Mapper handles the path correctly 
but is never given the right path from the CoyoteAdapter.

thanks,

John
Comment 1 Remy Maucherat 2003-11-26 19:42:34 UTC
I do not agree with your analysis. ';' is a reserved character, and used to
define parameters. Please do not reopen this report.
Comment 2 John Trollinger 2003-11-26 20:22:26 UTC
Remy, 

Yes the ; is for parameters, but parameters are allowed in the path and are 
valid.  Why would it be correct behavior for tomcat to not send all the path 
info because the path had parameters in it?

John
Comment 3 Remy Maucherat 2003-11-27 00:18:04 UTC
The actually relevant RFC is 2396.
I think the parameters do not belong to any of the processed paths returned by
Tomcat.
However, I've read in the RFC that a parameter applies to a path segment (!) as
separated by '/', so the unparameterized URL in your example should likely be:
http://localhost/appname/servlet-name/extra/info/hi.jsp. I'm not to sure about
my understanding of the RFC, and I don't plan to fix this issue (clean patch
welcome, I suppose, though).
If you want to use the parameters, you can get the original request URI, as usual.
Comment 4 John Trollinger 2003-12-03 17:43:24 UTC
Remy, 

Is the way tomcat is now spec complient?  

According to the spec 

PathInfo: The part of the request path that is not part of the Context Path or
the Servlet Path. It is either null if there is no extra path, or is a string 
with a leading ‘/’.

so clearly if my url was 
http://localhost/appname/servletname/extra;a/path;b/info;c
then if /appname is my context
and /servletname is my servlet path
then /extra;a/path;b/info;c should be my path info according to the spec  (see 
SRV.4.4 and table 2)

so if I submit a patch for this would you except this behavior?

John
Comment 5 Remy Maucherat 2003-12-03 17:56:26 UTC
I say no. What if there are path parameters elsewhere ? You remove them ? It
doesn't make any sense.
Comment 6 John Trollinger 2003-12-03 18:04:58 UTC
I dont know how path parms are used normaly (personaly I have never seen them 
used except in our app) so I am not sure how they should be treated, but 
following the spec it seems the should be included in path info.

I would like to get some feedback from others (hopefuly someone else out there 
uses them)
Comment 7 Remy Maucherat 2003-12-03 18:23:10 UTC
getRequestURI will return the full URI, right (including all the parameters) ? I
think if you need them, it is more consistent to work from that. Since
parameters will need to be stripped out for mapping (you find the path info only
after the mapping is done, so you "can't" (= hard) add back the parameters
reliably), I don't see how it can be made to work automagically.
Comment 8 John Trollinger 2003-12-03 18:26:56 UTC
Well the biggest problem right now is 

http://localhost/appname/servletname/extra;a/path;b/info;c
then if /appname is my context
and /servletname is my servlet path

right now tomcat returns only /extra which is not right no matter how you look 
at it

Comment 9 John Trollinger 2003-12-04 16:05:52 UTC
Remy, 

I am trying to make a patch for this and have a question I hope you can answer.

in the CoyoteAdapter the MessageBytes class is used.  This class holds 2 
arrays, one of char and one of byte.  The CoyoteAdapter class modifies the byte 
array and not the char array so the data in the MessageBytes class is out of 
sync.  Is this desired effect or should these arrays hold the same data.

Looking at the comment on line 297 of CoyoteAdapter (rev. 1.14) it thinks that 
the jsessionid has been removed but it has only been removed from the byte[] 
not the char[] (from the methods MessageBytes.getByteChunk() and 
MessageBytes.getByteChunk())

I think the char[] and byte[] inside MessageBytes should be the same.

any help would be great.

also according the the spec requestURI = contextPath + servletPath + pathInfo

so if the requestURI has the path parms the pathInfo HAS to

john
Comment 10 Remy Maucherat 2003-12-04 17:28:26 UTC
I don't agree, they simply didn't think about path parameters (I didn't know you
could put them in the middle of the path). Since path parameters won't be
mapped, what if there are parameters in the servlet path ?
I'd like to point out also that the session id parameter, if present, is always
removed from the request URI. That could possibly imply that other parameters
should also be removed everywhere.
So I don't really agree with handling parameters at this time, because it's a mess.

As for your question, at this time of the processing, the URI has been converted
to chars, so use the CharChunk, as the current algorithm is doing.
Comment 11 John Trollinger 2003-12-04 17:45:21 UTC
Do you not want a patch for this at all, or do you want one that just rips out 
the parms and returns only the path?
Comment 12 John Trollinger 2003-12-04 17:49:37 UTC
> As for your question, at this time of the processing, the URI has been 
> converted to chars, so use the CharChunk, as the current algorithm is doing.

Well as it is now the jsessionid is only being remove from the byte array 
(ByteChunk) the only reason you do not see the jsessionid in the path is the 
code finds the first ';' and drops it and everything after it.

Comment 13 Remy Maucherat 2003-12-04 17:54:25 UTC
At this point, removing path parameters seems to me the way to go, but I don't
know ... I'll mark this as remind.
Comment 14 John Trollinger 2003-12-04 17:55:55 UTC
One last thing, as for you question about context and servlet path, having 
parms in them does not work at all.  (it either does not find the context, or 
it does not find the servlet).  How do you get clarification on the spec for 
these type of things (trust me when I say I dont know why anyone would use 
these either, but I inherited this code that does)

John
Comment 15 Amy Roh 2003-12-10 22:40:24 UTC
Servlet spec folks talked about this (parameters in path and whether getpathinfo
should return them or not), and they couldn't get the consensus. Most people
seem to like that getPathInfo should NOT include the parameters, but we haven't
had a thorough discussion and that's listed as an item for the next version of
the spec. So, for now, it's container-specific but the servlet spec lead
recommends to remove them.
Comment 16 Remy Maucherat 2003-12-11 18:39:18 UTC
Ok, thanks, but should this URI: /appname/servlet-name/extra;path/info;here/hi.jsp
Become: /appname/servlet-name/extra/info/hi.jsp
Or: /appname/servlet-name/extra

I think the URI RFC (that nobody cares about in the HTTP world) specifies the
first one :)
Tomcat currently does the second one.
I'm reopening the bug, as we'll get a resolution, apparently.
Comment 17 Amy Roh 2003-12-11 19:44:43 UTC
I don't think the spec has clearly resolved this issue.

Right now, it somewhat implies that the return value of getPathInfo() include
the things after ";".

SRV.4.1
"Path parameters that are part of a GET request are not exposed by these
APIs.(getParameter, etc.) They must be parsed from the String values returned by
the getRequestURI method or the getPathInfo method."

However, practically, the path parameter can be placed ANYWHERE in the URL and
it is confusing that getPathInfo returns them. 

I think returning "/appnama/servlet-name/extra" is fine.

As long as getRequestURI contains the path params, I don't think it violates the
spec.
Comment 18 John Trollinger 2003-12-15 12:34:00 UTC
Should this work ?

    <servlet-mapping>
        <servlet-name>MyServlet</servlet-name>
        <url-pattern>/test/path/here</url-pattern>
    </servlet-mapping>

invoke servlet -> http://localhost/context/test;a/path;b/here

or is the url-pattern required to have the path parm in it (same with a context 
having a parm in it) because neither of these work now.

Comment 19 Remy Maucherat 2003-12-15 18:22:59 UTC
What Amy said implies: no, this doesn't work.
Comment 20 John Trollinger 2003-12-15 19:08:55 UTC
Amy, I am confused.. the spec says that I should be able to get the parm from 
the getRequestURI or getPathInfo.  are you reading this as the contain must 
provide it in just one of these methods?  It seems to me the spec is saying I 
should be able to retieve it from either method, not just one (which ever your 
vendor supports)

SRV.4.1
"Path parameters that are part of a GET request are not exposed by these
APIs.(getParameter, etc.) They must be parsed from the String values returned by
the getRequestURI method or the getPathInfo method."
Comment 21 Amy Roh 2003-12-15 23:08:00 UTC
Two things:

1. getPathInfo should/shouldn't return ";blah"

   I think it's ok to leave it as it is for now since the spec
   is most likely going to be changed in the next version.
   I asked 154EG spec lead to consider removing the wording of
   "getPathInfo() method" in SRV.4.1 and he said they address
   this issue again and probably do remove as some members
   already mentioned that.

2. matching

  Should this work ?

    <servlet-mapping>
        <servlet-name>MyServlet</servlet-name>
        <url-pattern>/test/path/here</url-pattern>
    </servlet-mapping>

  invoke servlet -> http://localhost/context/test;a/path;b/here

  or is the url-pattern required to have the path parm in it (same with
  a context having a parm in it) because neither of these work now.
  ------------------------
  For the above comment, SRV.11.1 says:
  "The path used for mapping to a servlet is the request URL from the
   request object minus the context path and the path parameters."

  That implies the deployment descriptor above should work
  and the <url-pattern> should not contain the path parameters.
  However, the spec doesn't talk about the case when the path params
  are within the context path. So, it's up to the container, and I
  would think that we can apply the same rule for the context path too.