Bug 36541

Summary: session getAttribute/setAttribute and removeAttribute are NOT Thread safe.
Product: Tomcat 5 Reporter: Leon Rosenberg <struts_user>
Component: CatalinaAssignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED    
Severity: critical CC: bsouther, craig.mcclanahan
Priority: P1    
Version: 5.0.25   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments: compiled fix for 5.0.25
patch for 5.0.19+, source code
bug reproduction

Description Leon Rosenberg 2005-09-07 12:46:44 UTC
I'm not quite sure if it's a bug or spec flaw, but I talked to Craig McClanahan
and he encouraged me to submit it.
The session attribute handling methods in 5.0.x aren't thread safe. The
org.apache.catalina.session.StandardSession and StadardSessionFacade do not
synchronize in get/set/remove Attribute. The result is following:
If you write and read from the session simultaneously with multiple threads the
getter/setter methods corrupt the underlying HashMap. The HashMap's entries got
circularly linked, and any thread using a get() on such a HashMap will spin
forever chasing its tail (quoted from Larry Isaacs). 

Now what Josh Bloch and Co. are saying in the javadoc for HashMap:
 * <p><b>Note that this implementation is not synchronized.</b> If multiple
 * threads access this map concurrently, and at least one of the threads
 * modifies the map structurally, it <i>must</i> be synchronized externally.
 * (A structural modification is any operation that adds or deletes one or
 * more mappings; merely changing the value associated with a key that an
 * instance already contains is not a structural modification.)  This is
 * typically accomplished by synchronizing on some object that naturally
 * encapsulates the map.  If no such object exists, the map should be
 * "wrapped" using the <tt>Collections.synchronizedMap</tt> method.  This is
 * best done at creation time, to prevent accidental unsynchronized access to
 * the map: <pre> Map m = Collections.synchronizedMap(new HashMap(...));
 * </pre>

The bug is quite easy to fix, by making the map synchronized (as stated above)
or explicitely synchronize the places where HashMap.get() or put() is called. 
I could provide a patch if wished.
Comment 1 Tim Funk 2005-09-07 13:12:27 UTC
And to prevent the REOPEN ...

SRV.7.7.1 Threading Issues
Multiple servlets executing request threads may have active access to a single
session object at the same time. *The Developer has the responsibility for
synchronizing access to session resources as appropriate.*
Comment 2 Leon Rosenberg 2005-09-07 15:21:36 UTC
Sorry, but this means that the develop is responsible for the object he puts in
session, not for the session itself.
To quote Graig:
the underlying spec language (SRV.7.7.1) was originally intended to remind
application users that they must make their *own* objects threadsafe if they are
stored as a session attribute. To use that language as an excuse for the
*container* not having to make its own collections threadsafe means that the
language is broken.
So, reopen it now?
Comment 3 Remy Maucherat 2005-09-07 15:39:44 UTC
(In reply to comment #2)
> Sorry, but this means that the develop is responsible for the object he puts in
> session, not for the session itself.
> To quote Graig:
> the underlying spec language (SRV.7.7.1) was originally intended to remind
> application users that they must make their *own* objects threadsafe if they are
> stored as a session attribute. To use that language as an excuse for the
> *container* not having to make its own collections threadsafe means that the
> language is broken.
> So, reopen it now?

Don't bother ;)
Comment 4 Rick Knowles 2005-09-07 15:45:16 UTC
(In reply to comment #2)

Heheh ... I think you answered the question for yourself ... fix the spec 
first, then the container implementations second.
Comment 5 Leon Rosenberg 2005-09-07 16:34:41 UTC
(In reply to comment #4)
> (In reply to comment #2)
> 
> Heheh ... I think you answered the question for yourself ... fix the spec 
> first, then the container implementations second.

Interesting, servlet spec 2.3 has exact the same wording of SRV.7.7.1 Threading
Issues and tomcat 4.1.31, which is the official implementation of the 2.3 spec,
has a SYNCHRONIZED session. 

So tomcat 4.x is buggy, or tomcat 5.x is. You say it's the 4.x, I say it's 5.x.
Comment 6 Rick Knowles 2005-09-07 16:47:05 UTC
(In reply to comment #5)

Actually I'd say neither is buggy, since they both implement the spec as it's 
written. If that's not what was intended, then as you quoted Craig saying: "the
language is broken", and the spec needs to be changed.
Comment 7 Craig McClanahan 2005-09-07 18:08:30 UTC
(In reply to comment #6)
> (In reply to comment #5)
> 
> Actually I'd say neither is buggy, since they both implement the spec as it's 
> written. If that's not what was intended, then as you quoted Craig saying: "the
> language is broken", and the spec needs to be changed.

I would agree that neither implementation is buggy -- it is entirely legal for a
servlet container to avoid letting an application corrupt its interna data
structures.  It's too bad that the current Tomcat developers care more about
performance than they care about reliability.

If you aren't going to change it back to the 4.1 implementation (with
synchronization locks around the accesses), please take my name out of the
@author tag for org.apache.catalina.session.StandardSession -- this code does
*not* represent anything I wish to be associated with.
Comment 8 Remy Maucherat 2005-09-07 18:45:47 UTC
(In reply to comment #7)
> I would agree that neither implementation is buggy -- it is entirely legal for a
> servlet container to avoid letting an application corrupt its interna data
> structures.  It's too bad that the current Tomcat developers care more about
> performance than they care about reliability.

Of course. Thankfully, you guys have glassfish now for good reliability, so you
don't have to deal with these loony Tomcat developers anymore ;)

> If you aren't going to change it back to the 4.1 implementation (with
> synchronization locks around the accesses), please take my name out of the
> @author tag for org.apache.catalina.session.StandardSession -- this code does
> *not* represent anything I wish to be associated with.

Sure, I have no problem with that if it's your wish.

These changes were made as an experiment (the spec allows this to be
non-synced), and included in 5.0.19+ and as a consequence in all the most
popular 5.0.x releases. In the end, it would seem it worked reasonably well.
However, I did get a few reports of corruption a while after this, and I added
back synchrnozation on write operations to the map in the 5.5 branch. I never
ported this back to 5.0.x given the lack of demand. Apparently, you didn't look
at it.

All that got discussed in the past, including the readding of some of the syncs.
Comment 9 Leon Rosenberg 2005-09-07 21:29:36 UTC
After we finish clarifying who's is bigger (and mine is best case average), we
could return to the bug.

1. The bug isn't invalid, so I reopen it. You can set it to WONTFIX, but not to
INVALID.

2. Applications working well with tomcat 4, resin or probably any other servlet
container do not work with tomcat 5.0.x or tomcat 5.5.x. Thus tomcat 5 seems to
be not compatible to the servlet spec (or be the only one compatible), if not in
the language of the specification itself, but in the intension of it. So tomcat
5 can be considered BROKEN.

3. The bug applies to tomcat 5.5.x as well, because 
"... added back synchrnozation on write operations to the map in the 5.5
branch... " doesn't fix anything, since the read operation are the problem and
MUST be synchronized. 

4. The effort to provide a workaround for this problem is enormous compared to
the effort to fix the bug. Each and every framework around there now MUST
provide two versions, the normal version and the tomcat 5 compatible version.
You can't use struts, tapestry, jstl, actually NOTHING existing with tomcat 5.

5. The idea of removing synchronization to gain performance is absurd.
Who needs the performance? People like us, who have 2000-3000 concurrent users
on each webserver. But people who have 2000-3000 concurrent users, have also
many concurrent requests, even from the same user, so instead of gaining
performance we are gaining CRASHES. This bug killed 8 (!!!) webservers yesterday. 

6. Consider the flurry in the tomcat users community, if the above points + your
 refusal to provide a three LOCs fix gonna make tomcat 5 UNUSEABLE.
Comment 10 Wade Chandler 2005-09-07 21:41:36 UTC
Look, the basic problem is that the underlying implementation used to support
attribute holders is being used in error according to it's documentation
(java.util.HashMap documentation).  It doesn't just leave the data in an
inconstistent state, but rather crashes systems.  If the developer has to
synchronize access to the session or application or context every where
attributes are being set then why not change the base application (Tomcat) to do
this already....basically the application, context, and session are useless if
nothing can be stored in them.  So, they have to be synchronized anyways, so
this should be done at the Tomcat level.  5.5.x and 5.0.x really need to be
fixed because it doesn't make sense not to.  Basically I'm getting this from the
conversation (and anyone else should be).....the spec says it's up to the
developer to synchronize........the developer has to synchronize every where the
attributes are set and get.....so what is the difference if Tomcat does it or
the developer does it?  Plus, does the spec also say it's up to the developer to
synchronize request, context, and session attributes?  It's been a while since I
have read the servlet spec, but I have the 2.3 version on disk and will skim
over it again, but even if the spec says it....it doesn't make sense not to
synchronize code that has to be synchronized one way or another anyways.
Comment 11 Wade Chandler 2005-09-07 21:58:02 UTC
Something I'm curious about.  If EL is used on in a JSP and a session object
accessed and manipulated is this synchronized?  What package should I look in
for that to check?
Comment 12 Wade Chandler 2005-09-07 22:08:17 UTC
So I verified that at least the 5.5.9 code only synchronized the setAttribute
and not also the getAttribute....why?  
Comment 13 Leon Rosenberg 2005-09-07 22:22:04 UTC
(In reply to comment #12)
> So I verified that at least the 5.5.9 code only synchronized the setAttribute
> and not also the getAttribute....why?  

to quote  Remy Maucherat:
it was an experiment and he added back synchrnozation on write operations to the
map in the 5.5 branch...

Actually it doesn't make any sense because you need both get and set to be
synchronized, to have the desired effect, so I think we should consider this a
part of the "experiment". 
Comment 14 Wade Chandler 2005-09-07 22:40:08 UTC
Well at least ApplicationContext.java and ApplicationRequest.java are
synchronizing their attributes in both cases put/get.  So, that is one thing not
to worry about, but there are other issues, and I'm betting the EL accessing
code isn't synchronizing because looking at the code for jasper2 and looking at
the runtime file PageContextImpl.java in the scoped getAttribute which calls the
scoped doGetAttribute it does not synchronize the code either.  So, basically
what that says is if a JSP developer were to use jsp:useBean and creates a
session bean then there are going to be problems because as soon as this happens
an unsynchronized call is taking place.  

Thing that makes this debate over whether to change it or not silly is that in
the scoped calls the application and request code is already synchronized at the
class level so synchronizing only on the session at this point seems weird.  

Regardless....if this tiny three line change doesn't take place then there are
more lines in Tomcat that apparently need to be changed, because this bug causes
other issues.
Comment 15 Darryl Miles 2005-09-07 23:23:11 UTC
Why not keep everybody happy, simply make the session management class a
configurable option at both Server and Context level, if the TC developers wish
to "experiment" they can configure the un-synchronized access they so desire.
Comment 16 Richard Dunn 2005-09-07 23:43:25 UTC
I strongly agree with Craig, et al to error on the side of a more robust implemention by using 
synchronization on the side of Tomcat. I think doing otherwise would be doing a high-wire act without a 
net. There are too many places for a developer to miss handling the threading correctly to leave this as is. 
Comment 17 Frank W. Zammetti 2005-09-08 00:36:36 UTC
I've been following this with a lot of interest ever since Leon raised the 
issue yesterday, and I discussed it with some folks at work today a bit.

Looking at this strictly from a developers' point of view, I could care less 
what the specs says, and I could care less why the Tomcat code is the way it 
is.  The bottom-line is that if I call session.getAttribute() or setAttribute
() in a servlet or Struts Action or whatnot, I do not expect there to be any 
chance of the internal data structures getting corrupted, or me getting back 
inconsistent data, and I most definitely do not expect there to be any chance 
of a server hang or crash, and I do not expect to have to do anything myself 
to ensure any of this.  Any other answer is, to me and to those I spoke to, 
just plain wrong.

I am in absolute agreement with those saying this needs to be fixed.  I do not 
recall ever having been bitten by this problem, but it's just subtle enough 
that I might not have known if I did.

I don't think anyone is looking to place blame here.  I don't care what was 
originally in the Tomcat code and what is there now or why it was changed.  
This simply comes down to a legitimate issue that needs to be resolved.  It's 
not even a minor issue frankly, it's a pretty substantial one, regardless of 
the fact that it apparently hasn't caused huge problems for everyone.

If the spec needs to be fixed, no problem, contact who needs to be contacted 
and let them know.  But that DOES NOT mean you serialize and wait for them to 
do their thing.  There is a solution here that, to my understanding, isn't 
contrary to the spec as it exists today anyway, so not following through with 
it is kind of silly.  I'm sure any number of people would be willing to submit 
a patch for this if it's an issue of not having time, but to be arguing about 
whether it should be fixed or not doesn't seem to be reasonable on this one.
Comment 18 Wade Chandler 2005-09-08 00:42:52 UTC
(In reply to comment #17)

> I don't think anyone is looking to place blame here.  I don't care what was 
> originally in the Tomcat code and what is there now or why it was changed.  
> This simply comes down to a legitimate issue that needs to be resolved.  It's 
> not even a minor issue frankly, it's a pretty substantial one, regardless of 
> the fact that it apparently hasn't caused huge problems for everyone.

I'm not even sure if I've been bitten by this either, but I have seen on the
list numerous people speaking of running out of Tomcat threads and setting their
connections to the max.  If this issue were causing problems they might be
having it and not even realize it.

> If the spec needs to be fixed, no problem, contact who needs to be contacted 
> and let them know.  But that DOES NOT mean you serialize and wait for them to 
> do their thing.  There is a solution here that, to my understanding, isn't 
> contrary to the spec as it exists today anyway, so not following through with 
> it is kind of silly.  I'm sure any number of people would be willing to submit 
> a patch for this if it's an issue of not having time, but to be arguing about 
> whether it should be fixed or not doesn't seem to be reasonable on this one.

I agree....synchronizing these calls isn't going to be contrary to the spec in
no way at all.
Comment 19 Jason Lea 2005-09-08 01:08:17 UTC
I wonder if the new java.util.concurrent classes could be used instead 
of simple HashMap?

http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ConcurrentHashMap.html

but that would mean total dependence on j2se 1.5 and that would be a 
problem for supporting j2se 1.4, though a backport is being worked on here

http://www.mathcs.emory.edu/dcl/util/backport-util-concurrent/

other reading here:  
http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html

I think that sort of thing would provide a nice solution for speed + 
reliability.  Using this as the underlaying base would/should fix issues with EL
access too.
Comment 20 Remy Maucherat 2005-09-08 04:27:14 UTC
As discussed on tomcat-dev after I reexamined StandardSession code further,
adequate synchrnoization seems to be in place in the current Tomcat 5.5 code so
that the infinite loop situation described in this issue does not occur. Please
do not reopen the report.
Comment 21 Craig McClanahan 2005-09-08 05:25:01 UTC
(In reply to comment #15)
> Why not keep everybody happy, simply make the session management class a
> configurable option at both Server and Context level, if the TC developers wish
> to "experiment" they can configure the un-synchronized access they so desire.
> 

Actually, the session management code *is* already configurable, albeit not
trivially.  You can include a <Manager> element inside your <Context> element
and create a custom implementation of the Manager interface that returns session
instances (probably subclassed from StandardSession) that do the locking for you.

The key question remains what the default behavior should be, and/or whether
there should be an easy boolean setting to turn this particular capability on or
off.
Comment 22 Craig McClanahan 2005-09-08 05:32:17 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > I would agree that neither implementation is buggy -- it is entirely legal for a
> > servlet container to avoid letting an application corrupt its interna data
> > structures.  It's too bad that the current Tomcat developers care more about
> > performance than they care about reliability.
> 
> Of course. Thankfully, you guys have glassfish now for good reliability, so you
> don't have to deal with these loony Tomcat developers anymore ;)
> 

Apparently it is indeed a good thing :-).  The appropriate patch was just
committed to the Glassfish repository.  If people want an open source
implementation of the servlet spec where the developers listen to users on
issues like this, you might want to browse over to:

    https://glassfish.dev.java.net

and take a look.

> > If you aren't going to change it back to the 4.1 implementation (with
> > synchronization locks around the accesses), please take my name out of the
> > @author tag for org.apache.catalina.session.StandardSession -- this code does
> > *not* represent anything I wish to be associated with.
> 
> Sure, I have no problem with that if it's your wish.
> 

This (removing my name from the @author tag on StandardSession), both here and
everywhere else in the Tomcat code base, would indeed be my wish.
Comment 23 Leon Rosenberg 2005-09-08 08:50:57 UTC
(In reply to comment #20)
> As discussed on tomcat-dev after I reexamined StandardSession code further,
> adequate synchrnoization seems to be in place in the current Tomcat 5.5 code so
> that the infinite loop situation described in this issue does not occur. Please
> do not reopen the report.

Remy, sorry, but you are wrong. The read is the problem, not the write. During
the write process the hashmap is in flux, so if a read occurs at this time and
is not synchronized the thread which reads is killed.
So 5.5.x is as broken as 5.0.x is
Comment 24 Peter Rossbach 2005-09-08 11:01:31 UTC
Hmm, I also thing the spec interpretation from the current
StandardSession inside Tomcat 5.0 and 5.5. is wrong. We had the same problem
at Cluster DeltaSession for year ago. We now sync also the session attribute 
read operations. Without this fix the Cluster setup is useless for
production servers. 

I vote for change the StandardSession implementation to..

   public Object getAttribute(String name) {

        if (!isValid())
            throw new IllegalStateException(sm
                    .getString("standardSession.getAttribute.ise"));

        synchronized (attributes) {
            return (attributes.get(name));
        }

    }

    public Enumeration getAttributeNames() {

        if (!isValid())
            throw new IllegalStateException(sm
                    .getString("standardSession.getAttributeNames.ise"));

        synchronized (attributes) {
            return (new Enumerator(attributes.keySet(), true));
        }

    }
===

Peter
Comment 25 Leon Rosenberg 2005-09-08 11:35:14 UTC
Created attachment 16339 [details]
compiled fix for 5.0.25

a compiled patch for 5.0.19+
Comment 26 Leon Rosenberg 2005-09-08 11:36:45 UTC
Created attachment 16340 [details]
patch for 5.0.19+, source code

patch for 5.0.19+, source code
Comment 27 Leon Rosenberg 2005-09-08 11:42:53 UTC
I submitted a patch for 5.0.19+. To install it:
jar -xf catalina.jar
replace StandardSession.class with one attached.
make a new jar. 
Alternatively you can rebuild complete tomcat, so i supplied the source code too :-)

It would be cool to provide a download location for the patched catalina.jar, i
don't think submitting a 700K file into bugzilla is the correct behaviour, so
I'm not doing it.

Comment 28 Remy Maucherat 2005-09-08 12:23:46 UTC
(In reply to comment #24)
> Hmm, I also thing the spec interpretation from the current
> StandardSession inside Tomcat 5.0 and 5.5. is wrong. We had the same problem
> at Cluster DeltaSession for year ago. We now sync also the session attribute 
> read operations. Without this fix the Cluster setup is useless for
> production servers. 
> 
> I vote for change the StandardSession implementation to..

Ok. I don't know yet, since the repository is inconclusive, and doesn't match
what you wrote. I see DeltaSession (introduced first along with 5.0.15+) has
always had (from revision 1.1) syncs on everything (read/writes), so I don't see
any conclusive information showing that the current 5.5.x code does produce the
bug described here. Obviously, if 5.5.x really is still bad for this issue,
it'll have to be fixed.

(In reply to comment #23)
> Remy, sorry, but you are wrong. The read is the problem, not the write. During
> the write process the hashmap is in flux, so if a read occurs at this time and
> is not synchronized the thread which reads is killed.
> So 5.5.x is as broken as 5.0.x is

This means you've tested with 5.5.x, and reproduced an issue ? Or written a
microbenchmark showing a problem with reads ? All I can see is that you're
saying "issue on read, 5.5.x same problem" like a broken record.

(In reply to comment #22)
> Apparently it is indeed a good thing :-).  The appropriate patch was just
> committed to the Glassfish repository.  If people want an open source
> implementation of the servlet spec where the developers listen to users on
> issues like this, you might want to browse over to:
> 
>     https://glassfish.dev.java.net
> 
> and take a look.

Lol, whatever. The web tier of Glassfish is really an ugly behind-the-back fork.
All that was required to avoid the ill feelings is a bit of respect, being
informed just a little bit, which would have allowed reasonable planning for
development resources. I'm glad you're happy about Glassfish, but personally,
I'll use it in the long run to point out there are problems with large companies
like Sun and IBM, their interactions with the ASF, and how they should not be
trusted (as in, accept whatever they contribute, but there's no need for being
thankful for it - and obviously, don't install them as "despot" on a project
ever, but at the ASF, it's a bit hard to get into this situation).

> This (removing my name from the @author tag on StandardSession), both here and
> everywhere else in the Tomcat code base, would indeed be my wish.

I wasn't really serious. In the end, it's not really my decision, so you should
send a request to the pmc once it is correctly setup.
Comment 29 Mario Ivankovits 2005-09-08 12:26:43 UTC
It should be posssible to install such patches by simply place the class file at
the correct position.

In this case it would be

jakarta-tomcat-5.0.28/server/classes/org/apache/catalina/session
Comment 30 Remy Maucherat 2005-09-08 12:34:37 UTC
(In reply to comment #29)
> It should be posssible to install such patches by simply place the class file at
> the correct position.
> 
> In this case it would be
> 
> jakarta-tomcat-5.0.28/server/classes/org/apache/catalina/session

Yes, the classloader setup has been designed to easily allow such patching, but
the folder structure can get a bit messy (you got it right, though).
Comment 31 Leon Rosenberg 2005-09-08 13:48:06 UTC
(In reply to comment #28)
> This means you've tested with 5.5.x, and reproduced an issue ? Or written a
> microbenchmark showing a problem with reads ? All I can see is that you're
> saying "issue on read, 5.5.x same problem" like a broken record.

No this means that the statement of the HashMap authors:
 * <p><b>Note that this implementation is not synchronized.</b> If multiple
 * threads access this map concurrently, and at least one of the threads
 * modifies the map structurally, it <i>must</i> be synchronized externally.
also applies to x Reader, one Writer, so synchronizing "writes only" is a
misusage of the HashMap according to the documentation. 

On the other hand, I made some measures and calculated how much performance you
gain by removing synchronization from get/set methods. 
I compared the performance of synchronized hashmap access against not
synchronized (singlethreaded, pIV 2.8 Ghz, HT) and calculated that you gain 230
milliseconds on 3,000,000 operations! 
That is 0.00008 milliseconds per operation. Even you would have 100 accesses to
the session from 100 parallel threads, it would cost you additional 8 milliseconds. 
According to alexa.com we have an average response time of 0.8 seconds (for the
user) and are faster then 80% of the net (google's average is 0.7). I don't know
how many sites are faster, lets assume the fastest are at about 0.5 seconds (and
sites making 100 session accesses in one request surely do not belong in this
category). So if your average request duration is 500 millis, how important is
it for you to gain 0.00008 milliseconds, or even 0.8 milliseconds?
Comment 32 Remy Maucherat 2005-09-08 14:08:14 UTC
(In reply to comment #31)
> No this means that the statement of the HashMap authors:
>  * <p><b>Note that this implementation is not synchronized.</b> If multiple
>  * threads access this map concurrently, and at least one of the threads
>  * modifies the map structurally, it <i>must</i> be synchronized externally.
> also applies to x Reader, one Writer, so synchronizing "writes only" is a
> misusage of the HashMap according to the documentation. 

It's cool, but I have not asked for the nth lecture of this portion of the
javadoc. How about trying to answer my question ? It doesn't seem that hard.

> On the other hand, I made some measures and calculated how much performance you
> gain by removing synchronization from get/set methods. 
> I compared the performance of synchronized hashmap access against not
> synchronized (singlethreaded, pIV 2.8 Ghz, HT) and calculated that you gain 230
> milliseconds on 3,000,000 operations! 
> That is 0.00008 milliseconds per operation. Even you would have 100 accesses to
> the session from 100 parallel threads, it would cost you additional 8
milliseconds. 
> According to alexa.com we have an average response time of 0.8 seconds (for the
> user) and are faster then 80% of the net (google's average is 0.7). I don't know
> how many sites are faster, lets assume the fastest are at about 0.5 seconds (and
> sites making 100 session accesses in one request surely do not belong in this
> category). So if your average request duration is 500 millis, how important is
> it for you to gain 0.00008 milliseconds, or even 0.8 milliseconds?

In a microbenchmark, the JIT could be playing tricks on you, so I don't know ;)
Obviously, one single read by itself is not going to cost much. Now multiply
that by the number of reads you could be making during a single request, and
also imagine what it could be if useless syncs were added in plenty other places
inside the container "just to be safe". Syncs should be added wherever needed,
but not more than needed.

If you like microbenchmarks, you could compare (let's say with 1/3 writes, 2/3
reads): HashMap without sync, HashMap with syncs on writes, Hashtable,
ConcurrentHashMap. I think there could be some more tuning being done for the
attributes map (like setting a good initial size).

Besides, this is a bit OT, and doesn't answer my question.

I have just looked at two other popular servers source code, and some don't do
any syncing for this, like Tomcat 5.0.x does. Overall, it means it's not
portable, and the webapp really should plan on syncing on the session externally
wherever needed. What I am willing to provide (this is the intent of the code in
5.5.x right now), by default, is making sure the HashMap cannot get corrupted,
and that the infinite loop described in this report cannot occur.
Comment 33 Mario Ivankovits 2005-09-08 14:18:51 UTC
> Overall, it means it's not
> portable, and the webapp really should plan on syncing on the session externally
> wherever needed.
If you use different techniques this might become pain.
It might be hard to tell everyone to synchronize against session, and in the end
you have the same as synchronize in the base class. Well not really the same as
tomcat can synchronize against the map, we have to synchronize on a wider
context - the session.

> What I am willing to provide (this is the intent of the code in
> 5.5.x right now), by default, is making sure the HashMap cannot get corrupted,
> and that the infinite loop described in this report cannot occur.
I wonder how this can be done?
You might have to introduce your own map implementation, no?

Is it possible to create a thread-safe hash-map without synchronization?
Ok, if two threads put in an element with the same key it might not be
deterministic which of both are really set then, but this is not the problem we
have to solve.
Comment 34 Remy Maucherat 2005-09-08 14:59:26 UTC
(In reply to comment #33)
> > Overall, it means it's not
> > portable, and the webapp really should plan on syncing on the session externally
> > wherever needed.
> If you use different techniques this might become pain.
> It might be hard to tell everyone to synchronize against session, and in the end
> you have the same as synchronize in the base class. Well not really the same as
> tomcat can synchronize against the map, we have to synchronize on a wider
> context - the session.

You indeed have to sync on a wider context in many cases.

> > What I am willing to provide (this is the intent of the code in
> > 5.5.x right now), by default, is making sure the HashMap cannot get corrupted,
> > and that the infinite loop described in this report cannot occur.
> I wonder how this can be done?
> You might have to introduce your own map implementation, no?

No, because at this point I believe making sure that the HashMap does not get
corrupted (using syncs on put and remove) is enough to guarantee that the get
method doesn't enter an infinite loop (by returning null, or the result when
there's a problem - it will be unpredictable, but seems to me equivalent to the
higher level concurrency issues if you mix and match reads/writes in the webapp
for critical data without being careful).

Other than this, it doesn't look that the collection being used with its default
parameters is that optimal.

> Is it possible to create a thread-safe hash-map without synchronization?
> Ok, if two threads put in an element with the same key it might not be
> deterministic which of both are really set then, but this is not the problem we
> have to solve.

Maybe using a fully array based structure rather than a linked list would make
it more robust (ie, reads could be bad, writes could be lost, but the structure
would remain consistent). There is stuff like ConcurrentHashMap, but it of
course does synchronization of its own.

I did agree previously (hence the current code in the 5.5 branch) that
robustness is good, and that the HashMap structure should be protected, as
there's no way to restore it, so there are syncs for put and remove.

Since there's a real demand and you guys are quite persistent, I now agree on
adding an extra configuration parameter on the context allowing to define the
sweet spot for the collection size, as well as its synchronization policy (the
default being the current 5.5 behavior, unless/until it is shown to still be
able to cause the inifinite loop originally described in the report).
Comment 35 Darryl Miles 2005-09-08 15:03:14 UTC
How do we approach the specifications committee for official clarification of
this point.

Who is going to take charge to actively do that?

Here is hoping to provoke all vendors to take a look at the issue and report
back to the committee foir a verdict in the coming months.
Comment 36 Darryl Miles 2005-09-08 15:14:03 UTC
(In reply to comment #34)
> No, because at this point I believe making sure that the HashMap does not get
> corrupted (using syncs on put and remove) is enough to guarantee that the get
> method doesn't enter an infinite loop (by returning null, or the result when
> there's a problem - it will be unpredictable, but seems to me equivalent to the
> higher level concurrency issues if you mix and match reads/writes in the webapp
> for critical data without being careful).

I understand the point that others are making for all access needing the same
sychronization.

I don't understand your logic that read/get don't need syncronization.

The issue with the infinite loop read stems from the fact that two different
threads access the map at the same time.  One for read and one for write, while
the writer is modifying the map it corrupted the pointers the reader is using to
traverse the data structures and thus enters an infinite loop (due to the
HashMap design, other collection classes can detect some situations and throw an
exception for ConcurrentAccessException).

This is because there is no synchrnozation between get/put operations.  Only
between puts.



As I pointed out in the TC user mailing list, it would be possible to use a
ReadWriteLock to allow threading of multiple readers to take place.  But while
there is one writer working on the Map you need to be sure no reader is using
the map too.

Introducing the ReadWriteLock might introduce an unwanted JDK5 dependancy, I
think its a new JDK5 concurrency class ?  It might also be slower than a regular
synchronized lock.  But without benchmarks we wont know.
Comment 37 Darryl Miles 2005-09-08 15:20:37 UTC
(In reply to comment #36)
>  One for read and one for write, while
> the writer is modifying the map it corrupted the pointers the reader is using to
> traverse the data structures and thus enters an infinite loop

Sorry to reply to my own post, but my use of the word "corrupted" is a bad choice.


In the normal operation of a write modification to the map the internal data
structures are altered into a temporary inconsitant state.  This inconsitancy is
part of the normal working of the write operation.  When the write operation
returns to the application the map integrity is always consistant.

The basic contract is true of all object design, unless otherwise stated to be
thread safe.  Which we all agree HashMap is not.


If the read operations happens to bump into this moment of temporary
inconsistacy the infinite loop can occur.
Comment 38 Remy Maucherat 2005-09-08 15:41:16 UTC
(In reply to comment #37)
> If the read operations happens to bump into this moment of temporary
> inconsistacy the infinite loop can occur.

All the entry objects will be mutated. While it may be inconsistent and might
loop for an instant (although I am not convinced this could really be the case;
I think the trouble without any sync could only occur if there was more than one
concurrent unsynced write, and in particular, two "remove"), the pointer value
will be corrected and the loop should exit. That's my interpretation looking at
the code. I think I'll write a small program to test this.
Comment 39 Darryl Miles 2005-09-08 15:54:20 UTC
(In reply to comment #38)
> All the entry objects will be mutated. While it may be inconsistent and might
> loop for an instant (although I am not convinced this could really be the case;
> I think the trouble without any sync could only occur if there was more than one
> concurrent unsynced write, and in particular, two "remove"), the pointer value
> will be corrected and the loop should exit. That's my interpretation looking at
> the code. I think I'll write a small program to test this.
> 

Let me ask you another question then.  Where is the written specification for
the HashMap that states your usage is safe ?  It sounds like you as working on
the presumption that all implementation's won't cause an infinite loop (or set
fire to the computer) but you dont have any API contract to back that
presumption up.



I read the specification to state that some put/remove operations (that modify
the map structurally are explicitly not threadsafe)


You can't call threadsafe and non-threadsafe calls to an API at the same time. 
The threadsafe calls are only threadsafe with respect to other threadsafe calls
on the same API.


My understanding of this:

You can call threadsafe API calls at the same time.

Anytime you want to call a non-threadsafe one you have to serialize it with
respect to the API Interface not with respect to itself or other similar
operations (unless otherwise stated).
Comment 40 Remy Maucherat 2005-09-08 16:27:45 UTC
(In reply to comment #39)
> Let me ask you another question then.  Where is the written specification for
> the HashMap that states your usage is safe ?  It sounds like you as working on
> the presumption that all implementation's won't cause an infinite loop (or set
> fire to the computer) but you dont have any API contract to back that
> presumption up.
> 
> I read the specification to state that some put/remove operations (that modify
> the map structurally are explicitly not threadsafe)
> 
> You can't call threadsafe and non-threadsafe calls to an API at the same time. 
> The threadsafe calls are only threadsafe with respect to other threadsafe calls
> on the same API.
> 
> My understanding of this:
> 
> You can call threadsafe API calls at the same time.
> 
> Anytime you want to call a non-threadsafe one you have to serialize it with
> respect to the API Interface not with respect to itself or other similar
> operations (unless otherwise stated).

I guess if it goes back again to lawyerspeak level rather than logic, then
there's nothing to talk about. It is all related to reasonable reliability and
robustness, and I don't believe the algorithm of a hashmap can become that weird
(the Sun structure is already not particularly safe). I mean, there could even
be bugs in the collection implementation too. 

I'd like to remind you once more that this synchronization in the container is
not mandatory from what I can see, and at least one other popular container
apparently behaves like Tomcat 5.0. It's the end of this discussion thread as
far as I am concerned :)

I'll also add a way to configure size and sync of the collection (assuming I
confirm behavior to be acceptable using a test program).
Comment 41 Wade Chandler 2005-09-08 16:43:49 UTC
(In reply to comment #38)
> (In reply to comment #37)
> > If the read operations happens to bump into this moment of temporary
> > inconsistacy the infinite loop can occur.
> 
> All the entry objects will be mutated. While it may be inconsistent and might
> loop for an instant (although I am not convinced this could really be the case;
> I think the trouble without any sync could only occur if there was more than one
> concurrent unsynced write, and in particular, two "remove"), the pointer value
> will be corrected and the loop should exit. That's my interpretation looking at
> the code. I think I'll write a small program to test this.
> 
You can certinaly get null returned when a valid value was added to the Map
previosly as well because of the inconsistency.  The hash index used to locate
the first Entry is based on the hash and the length of the table (I guess we
should say the chosen bucket).  If the value has been added to the map.  Then a
call for a get occurs and a call for a write and the write makes a resize and
the resize changes the location of the bucket then e could match another Entry
besides the one for the correct hash then you start getting into some looping. 
In a test I had a get interation of 4 times occur when this happened, but
considering all of the possible shifting that can occur I'm not convinced that
an infinite loop could not occur.  Given the bucket index algorithm and all of
the possible values for a hash and a table size I don't think one can safely say
it's not possible without synchronizing.  If there is a way to determine that
the behavior of one search for a bucket with any hash is always going to produce
the same result for any size table then I think you can safely not synchronize
the code.(In reply to comment #38)
> (In reply to comment #37)
> > If the read operations happens to bump into this moment of temporary
> > inconsistacy the infinite loop can occur.
> 
> All the entry objects will be mutated. While it may be inconsistent and might
> loop for an instant (although I am not convinced this could really be the case;
> I think the trouble without any sync could only occur if there was more than one
> concurrent unsynced write, and in particular, two "remove"), the pointer value
> will be corrected and the loop should exit. That's my interpretation looking at
> the code. I think I'll write a small program to test this.
> 

You can certinaly get null returned when a valid value was added to the Map
previosly as well because of the inconsistency.  The hash index used to locate
the first Entry is based on the hash and the length of the table (I guess we
should say the chosen bucket).  If the value has been added to the map.  Then a
call for a get occurs and a call for a write and the write makes a resize and
the resize changes the location of the bucket then e could match another Entry
besides the one for the correct hash then you start getting into some looping. 
In a test I had a get interation of 4 times occur when this happened, but
considering all of the possible shifting that can occur I'm not convinced that
an infinite loop could not occur.  Given the bucket index algorithm and all of
the possible values for a hash and a table size I don't think one can safely say
it's not possible without synchronizing.  If there is a way to determine that
the behavior of one search for a bucket with any hash is always going to produce
the same result for any size table then I think you can safely not synchronize
the code.
Comment 42 Darryl Miles 2005-09-08 16:56:03 UTC
(In reply to comment #40)
> > I read the specification to state that some put/remove operations (that modify
> > the map structurally are explicitly not threadsafe)
> > 
> > You can't call threadsafe and non-threadsafe calls to an API at the same time. 
> > The threadsafe calls are only threadsafe with respect to other threadsafe calls
> > on the same API.


There is nothing weird about anything.  Its basic computer programming design.

If you write a program what uses another API and you make presumtions about how
you can use it then you will come unstuck, not matter what any servlet
specification says.

If however you base it on whats is written into the contract of what the API
presents to your application then you can legitimatly point the finger at that
API when you find a problem.

Maybe you should write your own Map interface to get the bahaviour you want, I
think a generic ReadWriteLock protected Map interface would be a good thing for
the wider Java community to have access to.

Your approach for writing a test case to prove the problem is valid, because its
a concurrency rare case, maybe if you sprinckled Thread.sleep() throughout the
HashMap code to slow it down the you might be able to make the problem 100%
reproducable.

But I don't believe Sun's implementation needs to be fixed, as this problem does
not seem to contradict the written contract the HashMap API presents.


> I'd like to remind you once more that this synchronization in the container is
> not mandatory from what I can see

And I'm not disputing you.  I'm disputing the more fundimental usage problem. 
There is no way for a wep-app to override the Session object in a portable way
across containers.


> I'll also add a way to configure size and sync of the collection (assuming I
> confirm behavior to be acceptable using a test program).

This would be welcomed by me.
Comment 43 Andrew Miehs 2005-09-08 16:58:58 UTC
(In reply to comment #38)
> I think the trouble without any sync could only occur if there was more than one
> concurrent unsynced write, and in particular, two "remove"), the pointer value
> will be corrected and the loop should exit. That's my interpretation looking at

If I understand this statement correctly -
You are saying that syncing the put and remove should be enough to stop the
infinite loop, and that this is why it was "FIXED" in 5.5.

If an example could be provided that an unsynced get can lead to an infinite loop,
would this change anything concerning the status of this problem?

A second question following on from this:

When would anyone want to use a "threadUnsafe" session set/getAttribute?

As we have seen, NOT syncing this causes hangs - so therefore developer
needs to do this (if he uses these methods). Wouldn't it make more sense to
just sync it further down in tomcat, as suggested here, rather than needing
to sync the whole method every single time you use this in your code?
Comment 44 Wade Chandler 2005-09-08 17:00:54 UTC
I guess I should add that it seems logical if one web page calls
Session.setAttribute then it shouldn't be possible later to make a call to
getAttribute and it not be available.  i.e. a minute earlier I made a call to
set then a minute later during my get a resize occurs and I get a null even
though I should not get a null.  This could occur because of the bucket
distribution on certain hash and table size combinations vary.  I had one hash
and table combination yield a 9 index.  Later when the resize occured this index
was moved to 3, and the resize before that it was moved to 11.  So, start at 9,
resize, then to 11, resize, then to 3.  So you can imagine what would happen if
during the time I get the hash from indexFor and then access the array what I
might get....null.  This all in HashMap.get
Comment 45 Wade Chandler 2005-09-08 17:03:08 UTC
(In reply to comment #44)
> I guess I should add that it seems logical if one web page calls
> Session.setAttribute then it shouldn't be possible later to make a call to
> getAttribute and it not be available.  i.e. a minute earlier I made a call to
> set then a minute later during my get a resize occurs and I get a null even
> though I should not get a null.  This could occur because of the bucket
> distribution on certain hash and table size combinations vary.  I had one hash
> and table combination yield a 9 index.  Later when the resize occured this index
> was moved to 3, and the resize before that it was moved to 11.  So, start at 9,
> resize, then to 11, resize, then to 3.  So you can imagine what would happen if
> during the time I get the hash from indexFor and then access the array what I
> might get....null.  This all in HashMap.get

This even without speaking of the infinite loop issue.
Comment 46 Rick Knowles 2005-09-08 17:51:41 UTC
(In reply to comment #43)

> When would anyone want to use a "threadUnsafe" session set/getAttribute?
> As we have seen, NOT syncing this causes hangs - so therefore developer
> needs to do this (if he uses these methods). Wouldn't it make more sense to
> just sync it further down in tomcat, as suggested here, rather than needing
> to sync the whole method every single time you use this in your code?

One possible case is for a really simple dumb cache: where you expect the 
result to be present in the session on every time except the first (eg where 
the read:write ratio is 10000:1). In such a case, if the get is 
unsynchronized, it's safe to say:

        Object cached = session.get(key);
        if (cached == null) {
            synchronized (session) {
                cached = session.get(key);
                if (cached == null) {
                    cached = "blah";
                    session.put(key, cached);
                }
            }
        }

I understand there is potential waste doing two gets (one in and one out of 
synchronized block), but this only gets executed on the first hit, so 
decreases in significance as the read:write ratio goes up. 

Anyway - that's one case I've found where an un-threadsafe get helps 
performance wise, especially on lower spec machines short on memory (only 
through subjective observations though - don't have any numbers).
Comment 47 Rick Knowles 2005-09-08 17:56:22 UTC
I guess the point of the above would be that desired synchronization behaviour 
is usage-dependent, so giving the developer the freedom/responsibility to 
decide it is not necessarily a bad thing. Personally my reading of the spec 
was that this was the reason it was left to the developer: because only the 
developer has a good idea what the desired sync behaviour is.
Comment 48 Wade Chandler 2005-09-08 18:03:23 UTC
(In reply to comment #47)
> I guess the point of the above would be that desired synchronization behaviour 
> is usage-dependent, so giving the developer the freedom/responsibility to 
> decide it is not necessarily a bad thing. Personally my reading of the spec 
> was that this was the reason it was left to the developer: because only the 
> developer has a good idea what the desired sync behaviour is.
> 

It would be fine if all the other places in Tomcat were fixed to allow you to do
that.  Point in case jsp:useBean with scope="session" can't be synchronized..not
that I know of without changing the TC5.0 and 5.5 code.  Also the JSTL
libraries, and many other jakarta libraries.
Comment 49 Andrew Miehs 2005-09-08 18:05:48 UTC
(In reply to comment #46)
> 
> One possible case is for a really simple dumb cache: where you expect the 
> result to be present in the session on every time except the first (eg where 
> the read:write ratio is 10000:1). In such a case, if the get is 
> unsynchronized, it's safe to say:

I understand, but how can you guarantee that the 'second' request does not
overtake the first put request? Not very likely, agreed - but still possible.
Users have a great way at pressing F5 very quickly!
Comment 50 Rick Knowles 2005-09-08 18:10:01 UTC
(In reply to comment #49)
> I understand, but how can you guarantee that the 'second' request does not
> overtake the first put request? Not very likely, agreed - but still possible.
> Users have a great way at pressing F5 very quickly!

That was the purpose of the second get check inside the sync block. The second 
request will, if the cache is null, block on the sync'd session, and only 
proceed into the sync block after the put. The first thing it does is check 
again, which returns non null and it continues happily.
Comment 51 Andrew Miehs 2005-09-08 18:14:01 UTC
(In reply to comment #50)
> That was the purpose of the second get check inside the sync block. The second 
> request will, if the cache is null, block on the sync'd session, and only 
> proceed into the sync block after the put. The first thing it does is check 
> again, which returns non null and it continues happily.

The situation can occur IF the second request overtakes the first that
    Object cached = session.get(key)
will simply not come back and return null, because it hangs in the hashmap
inside the get and so your check for null will never be called
Comment 52 Darryl Miles 2005-09-08 18:14:36 UTC
Bear in mind if you are using clustering, you have to put the modified session
attribute after you have finished modifying the object and wish the new state to
persist for the next request, as clutering replication only takes place at
setAttribute() time.  So the 1000:1 may only be realistic for those users.


Also with the other issue of getting a null back when you expected to see an
object.  Session data does not stay around forever, it expires so all web-apps
must deal with the no session situation.  So I'd agree that so long as your
servlet still holds the exact Session object instance you did a setAttribute()
on you can reasonable expect a getAttribute() on it exist.

But between requests you let go of the instance, and leave it upto the container
session manager to hold onto.

If as comment #44 implies you are talking minutes later, then I read into that
you mean across requests.  Well there is no garuntee your session still exists
so your web-app must deal with that situation.

But you should not need to deal with seeing an old overwritten value turn up
again, that would be a problem.
Comment 53 Craig McClanahan 2005-09-08 18:16:29 UTC
(In reply to comment #35)
> How do we approach the specifications committee for official clarification of
> this point.
> 
> Who is going to take charge to actively do that?
> 
> Here is hoping to provoke all vendors to take a look at the issue and report
> back to the committee foir a verdict in the coming months.
> 

The formal mechanism to do that is send email to the feedback address listed on
the spec (for 2.4, that's <servletapi-feedback AT eng.sun.com>).  I've done
that, and corresponded privately with the spec lead for Servlet 2.5 (Greg
Murray) as well, who will then discuss it with the expert group to see what (if
any) language changes they might want to do in Servlet 2.5.
Comment 54 Wade Chandler 2005-09-08 18:17:42 UTC
(In reply to comment #48)
> (In reply to comment #47)
> > I guess the point of the above would be that desired synchronization behaviour 
> > is usage-dependent, so giving the developer the freedom/responsibility to 
> > decide it is not necessarily a bad thing. Personally my reading of the spec 
> > was that this was the reason it was left to the developer: because only the 
> > developer has a good idea what the desired sync behaviour is.
> > 
> 
> It would be fine if all the other places in Tomcat were fixed to allow you to do
> that.  Point in case jsp:useBean with scope="session" can't be synchronized..not
> that I know of without changing the TC5.0 and 5.5 code.  Also the JSTL
> libraries, and many other jakarta libraries.


Also if you access the PageContext in a jsp and use the scoped methods the ones
accessing APPLICATION and REQUEST scope are already synchronized, yet the
session is not.  Also, now you have to synchronize not only session calls, but
you also have to synchronize the calls to the PageContext in a JSP.  So, instead
of some type of a performance gain you end up with double the monitors for those
objects and then the one for the session.  

I think these are the points people are making.....there is so much to change
now that a patch now and maybe a refactoring could be done and an announcement
that the functionality is changing.  But, before TC decides to do that maybe
they should see how all of the other servers are handling this.  Because if they
are handing this with synchronization and others are writing code not worrying
about synchronizing........they are not going to rewrite to use Tomcat later and
what about code targeted for a container like Oracle or Sun ONE that someone is
using in TC which is not synchronized...point being if the user base gets run
off whats the point, and who will be left donating to the project.  I really
think this is too subtle of a thing to say it's by the spec....at least for the
time being, and maybe it should always and forever remain synchronized depending
on what is going on with the real world usage of the spec in other cases.  There
are other situations where you don't even have access to synchronize the
call...again tags, and I'm sure some other API calls directly manipulating the
session.  I have found no where in other Tomcat code synchronizing the session
access, so there are other issues or fix this one.

I haven't even seen comments on all of the other issues in the TC code accessing
the session or the other jakarta projects from anyone but those arguing this
needs to be fixed.  So, what's the story?  Is this one thing changing or are all
of the others changing or is nothing changing (being fixed)?
Comment 55 Leon Rosenberg 2005-09-08 18:47:59 UTC
(In reply to comment #38)
> All the entry objects will be mutated. While it may be inconsistent and might
> loop for an instant (although I am not convinced this could really be the case;
> I think the trouble without any sync could only occur if there was more than one
> concurrent unsynced write, and in particular, two "remove"), the pointer value
> will be corrected and the loop should exit. That's my interpretation looking at
> the code. I think I'll write a small program to test this.

Ok, I wrote a program which can reproduce the bug even with tomcat 5.5
conditions, with synchronized put/remove and unsynchronized get. I must correct
myself, since it doesn't produce an infinite loop, but just a very LONG LASTING
FINITE loop, and I am talking about hours or days of execution.

I do following: 
Having X writer and X reader threads, which modify (set or remove by 50% chance)
and read 10 mappings concurrently. The abovementioned bug occurs pretty soon
(500.000 operations), the read thread hangs around for 5-10 seconds and gets
fixed by the writer thread some time later. 

I've measured that a read operation lasts on average 0.01-0.02 milliseconds. So
if one of the threads detects that last execution lasted longer then 5 seconds
it is safe to say that this thread was hanging and got fixed by another writer.

Speaking of that, we could probably reproduce it on the production servers by
not restarting them, and leting them run some hours or days instead (maybe as
long as the user is still online, I don't know) and hope they will get fixed.
However this is not really a solution.

The reason I can't reproduce the infinite loop, is that I can't stop the writers
 in time. Performing thousands of writes per second in multiple threads makes it
hard to stop all writers at the same point, before they unintentionally fix the
reader.

Still, I assume, I have a good chance that an infinite loop happens on the
production server if the user leaves the server, and no further write operations
are performed on his session.

Comment 56 Jason Lea 2005-09-08 23:31:29 UTC
Looks like the problem will hard to reproduce as it requires a situation where a
write to the hashmap causes a table resize/reindex to occur while another thread
is reading from the hashmap.

I would have thought it would return a null when it couldn't find the value. 
But there is no guarantee on what happens.

Developers could sync on the session everytime they want to read.  But why
should they, it is adding to the complexity of writing web apps.  As Wade
Chandler points out it may not even be possible with the JSTL tags.

Some have suggested using ReadWriteLock or some other tricky method of reading,
checking if null, and then doing a synced read.  Which is all very good, but
there is a simpler solution (for tomcat 5.5) - use ConcurrentHashMap - it says
that "Special-purpose data structures such as the Concurrent hash tables in this
package have far less overhead, and typically much better performance than
placing ReadWriteLocks around most sequential data structures."  Also, the
ConcurrentHashMap already does the second suggestion of doing a sync after a
null is returned (if you look at the link at the end of this).

Here you have someone who has done all this hard work of figuring out when you
can read/write and deals with locking.  It will give you consistant results, and
takes out the opportunity for developers to make mistakes themselves with
deciding whether to sync when reading from a session or not.  Plus JSTL and
everywhere else will benefit in not getting inconsistant results.

Next stop, Application scope...

some more reading about ConcurrentHashMap internals:
http://www-128.ibm.com/developerworks/java/library/j-jtp08223/
Comment 57 Wade Chandler 2005-09-09 01:03:44 UTC
(In reply to comment #56)
> Looks like the problem will hard to reproduce as it requires a situation where a
> write to the hashmap causes a table resize/reindex to occur while another thread
> is reading from the hashmap.
> 
> I would have thought it would return a null when it couldn't find the value. 
> But there is no guarantee on what happens.
> 
It should return a null when you simply look at it the code.  I can't reproduce
the infinite loop myself, but Leon says he can reproduce the problem, and if
it's so then it's a problem (Murphy's Law), and I definitely see that it is
possible to add an attribute then later while a write is occuring to ... for
instance ... be logged in to some application which is relying on session
variables and the system think you are not then get kicked out because of
concurrency issues in with the HashMap not being synchronized because the system
thinks a session attribute/variable isn't set when it is.  The original set
could have occurred hours before then just when the read is occuring a write
takes place which resizes the HashMap, and then an issue arises which makes web
apps unpredictable in Tomcat....you can predict it by knowing the default size
of the Map and not using more session variables than the Map threshold to size I
suppose, but give me a break.  I can't say for certain, myself, whether the
infinite loop will occur just syncing the writes or not, but the way the get
method works with the index being calculated for the hash bucket (array index)
you can certainly say it's unpredictable and it can definitely cause goofy
errors.  So an INVALID resolution seems irresponsible for this issue.
Comment 58 Andrew Miehs 2005-09-09 01:45:28 UTC
I assume that everyone has read the tomcat mailing list regarding this issue
and seen and read the link pointing to

http://blogs.opensymphony.com/plightbo/archives/000175.html

So are we still arguing whether or not getAttribute can cause an infinite loop?
or not?

If we all agree that getAttribute can cause an infinite loop IF called in a 
multithreaded environment, we (I assume) all agree that the call to it MUST be
synced SOMEWHERE. This also includes the example in comment 46, as it can not
be gaurenteed that getAttribute (inside the session.get) in the first line 
does not happen in the middle of putAttribute in the middle of session.put.

Now the last question is remaining is who's responsibility is it to sync it?

If the end developers need to sync it - they will need to sync much more than 
just the 'hashmap' itself


Comment 59 Wade Chandler 2005-09-09 02:10:31 UTC
(In reply to comment #58)
> I assume that everyone has read the tomcat mailing list regarding this issue
> and seen and read the link pointing to
> 
> http://blogs.opensymphony.com/plightbo/archives/000175.html
> 
> So are we still arguing whether or not getAttribute can cause an infinite loop?
> or not?
> 
> If we all agree that getAttribute can cause an infinite loop IF called in a 
> multithreaded environment, we (I assume) all agree that the call to it MUST be
> synced SOMEWHERE. This also includes the example in comment 46, as it can not
> be gaurenteed that getAttribute (inside the session.get) in the first line 
> does not happen in the middle of putAttribute in the middle of session.put.
> 
> Now the last question is remaining is who's responsibility is it to sync it?
> 
> If the end developers need to sync it - they will need to sync much more than 
> just the 'hashmap' itself
> 
> 
> 

Again though, when you look at all of the places accessing Session.getAttribute
you can't synchronize all of them, so the last question that I see is what is
going to be fixed?  Either all of these other places in TC need to be fixed or
this one does.  But, I think a real world study should take place before all the
 other places are fixed and this one isn't.  I mean, if all other containers are
behaving like TC4.x and synchronizing these calls then it just makes sense for
TC to follow suit.
Comment 60 Andrew Miehs 2005-09-09 02:13:25 UTC
(In reply to comment #58)
> synced SOMEWHERE. This also includes the example in comment 46, as it can not
> be gaurenteed that getAttribute (inside the session.get) in the first line 
> does not happen in the middle of putAttribute in the middle of session.put.

Actually, after looking at this a bit more - I need to correct myself - this 
example works because the hashmap does not need to be resized on the first and 
only putAttribute call.

Still - not very pretty though.
Comment 61 Mario Ivankovits 2005-09-09 07:35:35 UTC
Too much has said alread, but I also think it is not the only option to simply
synchronize.

Most of the time the different webapps/technologies do not utilize the same
content of such a session map. e.g. JSF and your home app mostly wont use the
same keys in this hashmap.
So why introduce a bottleneck? (and I didnt mean a real performance one, its
more  philosophical)
Why should one technique influence another one? And there is an impact, in the
worst case this synchronized map acts like a "sequencer".

What I would like to say is, its somehow reasonable to not simply synchronize,
but to find a hashmap which is able to handle this case.

This hashmap should be able to insert keys in concurrent. I one insert the same
key in parallel the result wont be deterministic, ok - fine.
And for sure, such a map should not lock (and not even wait for some "free me"
event) on get.

Isnt there a "cool" java collection developer here?
Comment 62 Jason Lea 2005-09-09 10:05:32 UTC
(In reply to comment #61)

java.util.ConcurrentHashMap
Comment 63 Dominik Drzewiecki 2005-09-09 10:14:10 UTC
(In reply to comment #62)
> (In reply to comment #61)
> 
> java.util.ConcurrentHashMap

You meant java.util.concurrent.ConcurrentHashMap, didn't you? 
Too bad it has been introduced in java 5.0
Comment 64 Leon Rosenberg 2005-09-09 12:26:20 UTC
Another question:

http://java.sun.com/j2se/javadoc/writingdoccomments/index.html

According to sun's javadoc principals (above link)
#  The Java Platform API Specification is a contract between callers and
implementations.

The Specification describes all aspects of the behavior of each method on which
a caller can rely. It does not describe implementation details, such as whether
the method is native or synchronized. The specification should describe
(textually) the thread-safety guarantees provided by a given object. In the
absence of explicit indication to the contrary, all objects are assumed to be
"thread-safe" (i.e., it is permissible for multiple threads to access them
concurrently). It is recognized that current specifications don't always live up
to this ideal.

Since the javadoc of the HTTPSession
(http://java.sun.com/j2ee/1.4/docs/api/index.html) doesn't mention anything
about session being not thread-safe, should not the developer be able to rely on
the fact that the implementation is thread safe?





Comment 65 Darryl Miles 2005-09-09 12:55:40 UTC
(In reply to comment #64)
> Another question:
> 
> http://java.sun.com/j2se/javadoc/writingdoccomments/index.html
> 
> According to sun's javadoc principals (above link)
> #  The Java Platform API Specification is a contract between callers and
> implementations.

Leon,

Can you publish your TC+HashMap finding you based your comment #55 ?


I've always tried to program for the worst case scenario, assume thread-unsafe
unless otherwise stated.  Probably my 'C' language background thinking here.  So
it interests me to hear this stated in reverse for Java (not a bad thing at
all).  Where within the reference is that implicit thread-safe notion stated ?


Is a HashMap really needed for session attributes, exactly how many attributes
does an average application have set ?  Replacing the collection with a bespoke
TC internal class based on something as silly as a linked list or fixed hash
bucket redirect into linked list should serve most users well.

Then making the exact collection implementation class a configurable item so
those users that would benifit from a safe HashMap for scalabilty of key count
would be happy to fix their performance with a config change.


It would please me greatly to see a more proper collection class for the job
with a safe write operation whilst allowing simultenous mutiple reads.  But
trying to bend the HashMap API into something its not is wrong.

Java Moto #1: "You program to interfaces"
Comment 66 Darryl Miles 2005-09-09 13:10:00 UTC
(In reply to comment #65)
> all).  Where within the reference is that implicit thread-safe notion stated ?

http://java.sun.com/j2se/javadoc/writingapispecs/index.html#top-level
Comment 67 Darryl Miles 2005-09-09 13:28:29 UTC
(In reply to comment #66)
> http://java.sun.com/j2se/javadoc/writingapispecs/index.html#top-level

That reference is invalid in the context of this bug report.  I read it to be a
hyperthetical example of what to put inside javadoc information.

Nothing concrete to under pin all Java programming design.


My comments fallback to what is written (and not written) into the documentation
for the HashMap implementation that TC has elected to use for the job.  It does
not say you call get() while you are also doing a put() (when the put() may
cause internal re-arangement), it explicitly warns you of this problem and as we
have no way knowing if the put will cause a rearrangment to achieve TCs goal of
robustness we have to synchronize our usage.
Comment 68 Leon Rosenberg 2005-09-09 14:39:16 UTC
Created attachment 16347 [details]
bug reproduction

eclipse ready project. 
to start: 
jar -xf synchtest.jar
cd synchtest
java -cp classes leon.synch.StartTest
Comment 69 Leon Rosenberg 2005-09-09 14:49:29 UTC
(In reply to comment #65)
> Leon,
> 
> Can you publish your TC+HashMap finding you based your comment #55 ?

It's a simple program, which actually shows the bug (on synchronized put/remove
and unsychronized get). I am to stupid to program the writers to stop as soon as
the loop occurs, so they fix the problem some time after it occurs (thats why a
long loop instead of infinite loop). 

After each execution the reader measures how long the execution lasted. If the
execution lasted longer then 5 seconds it cries. The mid time of the execution
is 0.05 milliseconds. So we can safely assume, that executions which last 5
seconds or more aren't normal. In fact I had it all night running with 714 "too
long executions", longest of them being 70 seconds. If I add synchronized(map)
in the getAttribute method of the Storage class it doesn't occure anymore.




Comment 70 Andrew Miehs 2005-09-09 15:12:54 UTC
(In reply to comment #67)
> (In reply to comment #66)
> > http://java.sun.com/j2se/javadoc/writingapispecs/index.html#top-level
> 
> That reference is invalid in the context of this bug report.  I read it to be a
> hyperthetical example of what to put inside javadoc information.
> 
> Nothing concrete to under pin all Java programming design.

I think what leon is trying to say is that 'this is what should be done'.

So either - the documentation for HTTPsession has a problem, that it is missing
the information regarding it not being thread safe - or the implementation
in TC is wrong.

According to what I have read here, it has been decided that TC says that HTTPsession
is NOT threadsafe, meaning that all developers who use HTTPsession will
- should they call 'put' - need to ensure that they sync the session - otherwise
nasty things can happen.

This means that the documentation should definitely be changed to warn developers,
and that all 3rd party projects fix their code so that it does not crash/ hang tomcat 5.0 .

(I haven't had a proper look to check if the synced puts in 5.5 are enough to stop the
loops)...

IMHO - Anyone who doesn't care whether or not the data is consistent - really
doesn't need the data, and shouldn't bother calling the routine.
Comment 71 Allistair Crossley 2005-09-09 15:16:50 UTC
Guys please .. Remy already said way back in #40 that he would look into 
allowing this to be configurable :)
Comment 72 Wade Chandler 2005-09-09 17:37:31 UTC
(In reply to comment #71)
> Guys please .. Remy already said way back in #40 that he would look into 
> allowing this to be configurable :)

I don't really see anything that says the real issue will be addressed.  An out
of the box Tomcat has to be configured not to have a bug?  Plus, he said he
would look into it and all this and that and that's fine, but no where does it
reflect he believes there is an issue, and right now a bug caused by an
uninformed decision remains.  One, how do you synchronize calls with jsp:useBean
with scope of session?  Where else is session accessed in the Tomcat code
without synchronizing that developers should be aware of?  Is JSESSIONID stored
in the session as well and where is it accessed?  We need to make sure that
access is synchronized.  If we are going to talk about specifications and
responsibilities lets talk about them, but logically and truthfully.  There is
more to this issue than can be addressed by: "It's the developers
responsibility.".  Do we need to file bugs for all of the other issues if this
is going to remain invalid?  I say that must be the case if this is to remain
invalid.

Personally I don't like the idea that "this was done as an experiment"...
written in one of the comments above... in a system said to be "production
quality" (Tomcat home page).  The point is this: if it were an experiment it
should have used an option to turn it on, but should not have been a default by
any means.  If a system has an apparent flaw that reduces it's stability and
reliability and  especially when it can be solved with a couple of lines of
code.....a patch is a no brainer.  This entire issue makes Tomcat currently
invalid as there are sections of standard usage which can not be safe guarded
against concurrency issues.  If I have an issue where customers can get kicked
out of the site after being logged in and the client asks me to fix it....I
can't without changing the entire application, and by that I mean not using
jsp:useBean calls in my JSP pages without wrapping them with other calls, so
what's the point of using the tags in JSP if they can't be safe guarded without
more work.  I thought the point of the tags, EL, and other things is to make
development quicker and to provide short cuts.

My opinion:

1) Before a decision was made to change the synchronization of something which
was currently synchronized when documentation for the underlying implementation
explains there are issues with not synchronizing should not have only been
discussed as to whether it was safe, logical, and correct, but should have been
proved.

2) If it were to be an experiment, then it should have used an option to turn it
on, and the default been the more correct implementation for the given situation.

3) When a problem is identified in a piece of software being promoted as
production ready it should be handled.  Not shaken off as INVALID.  Where is the
proof that unsychronized reads don't have issues.  Examine the HashMap source
code and write a few tests.  You'll see with the bouncing index that a valid
value can exist in the Map for a given key and that Entry not be located because
null is returned.  Thus a program relying on a session variable to be set will
not be able to retrieve that session variable and take an entirely different
path because of this....it's very simple........just read the code:
    public Object get(Object key) {
        Object k = maskNull(key);
        int hash = hash(k);
        int i = indexFor(hash, table.length);
        Entry e = table[i]; 
        while (true) {
            if (e == null)
                return e;
            if (e.hash == hash && eq(k, e.key)) 
                return e.value;
            e = e.next;
        }
    }

between the time i is set from a call to indexFor if table is resized with all
values null or all items are transfered and the given index for i is null then
null is returned as soon as we step into the while loop.  What that means is: If
between the time i is set from indexFor and the HashMap is resized because of a
write then there is a concurrency issue which affects the consistency of the
application.

How can this issue be worked around currently using Tags?  If we need to file
bugs on all the other places please let us know.  If you would like some help
with the other issues I would be willing to help, but my feelings are this
should be changed back for the time being and we need a new release based on the
latest release with only this change to the source.  In the mean time all the
other issues can be worked on as all of that will take more time than this
change.  Once those issues are dealt with then change it back and make sure the
issue is documented on the front page of the Tomcat documentation.
Comment 73 Gregory Block 2005-09-14 01:10:24 UTC
Ok, I'm going to *explain* why I'm so upset about this.

"There is heavily used public API in this heavily multithreaded application that are not thread safe and 
not documented clearly as such."

You might as well put it on the front page - you've got a page of comments from various ASF people 
trying to explain to the world how that sentence (which is what it all boils down to) makes a hair of 
sense.

Any other parts of the most commonly used API in the system, those things which are not only used on 
multithreaded applications but happen to be *heavily* used parts of the API, with a high risk of 
threading problems, happen to be?  Because I'm looking at your response and wondering WTF you 
expect of developers - to walk through the spec and GUESS at which things you've decided to willfully 
misinterpret for the sake of shaving off a few hundred milliseconds that cause my live applications hell?

I've been wondering what's caused this for ages now, and just *happened* to stumble across this.  How 
dare *ANYONE* at the ASF claim a holier-than-thou attitude about a fork when something as simple 
and basic as this gets explained away, marked invalid, and ignored.

You're experimenting with people's live applications, whom everyone's been encouraging to trust 
software written by authors who think a few hundred milliseconds on Joe's web app is more important 
than stability.  Plain and simple.  WTF are you doing to Tomcat?  Is there anything more important than 
its reputation as a stable platform?

If the ASF worked so bloody well, we wouldn't be seeing someone asking us to misread sections of the 
spec.  There is *no* defence for this kind of behaviour in a server like this; there's loads of defence for 
the existence of the bug, to be sure - but none for this kind of response...  So all the pointless crying 
about someone forking off looks a lot more, now, like you're getting exactly what you deserve for 
decisions which quite clearly diverge from the sane by several kilometers.  Even rereading this, I can't 
get over it - a major bug in the support for multithreading, being ignored by its developers for a range 
of differently pointless reasons, resulting in a "hey, we'll make it configurable".

Configurable?  What, now I *want* to selectively cause data corruption?  I didn't realise it was such a 
beneficial option to have on hand.  Sure, some of our readers get 'teh suck', but hey, it's 8 milliseconds 
faster for the other guy.

This is one of those bugs that goes out in e-mail, to just about everyone I know, with a note that says 
maybe we should be looking at alternative solutions to Tomcat in our live environment - because if this 
is the kind of judgement being used to build the application we're depending on to serve content, you 
can keep it.  This all works so long as I don't have to question fundamental decisions about whether the 
software is well made, and up until now, I've never even considered the possibility of whether or not I 
should trust what comes out of the Tomcat dev team.

Now I find out that a major bug that's been affecting our platforms was someone's little 'experiment'.  I 
haven't got two fingers big enough to stick up at the moron who thought *that* explanation was a good 
idea.

Fix it.  Resolved my arse.  And while you're at it, tell me what other areas of the "spec" I should be 
misinterpreting, and working around in my code.  Crying about a fork?  Reading this, you practically 
*deserve* one.
Comment 74 Frank W. Zammetti 2005-09-14 01:32:06 UTC
I reopened this ticket because it pretty clearly should not be closed, or at 
least not with an INVALID resolution.  I would have thought others would have 
done this (again, it seems it has been closed and reopened already), but so be 
it, I'll do it.

* 18 people voted for this bug, which alone indicates it is not INVALID.

* There continues to be comments made on it which reflect people wanting this 
addressed in some way other than sweeping it under the rug (see the rather 
heated comment right before this one... that is not the first time I have 
personally heard similar sentiments from someone recently).  Even if all the 
committers vehemently disagree with that thinking (and it appears to mainly be 
one frankly) it is improper to close this as INVALID because that does not 
accurately reflect reality.

* There continues to be discussion on the mailing lists, and elsewhere, about 
this.  That too indicates the INVALID resolution is not correct.

If someone wants to verbally slap me for doing this, go for it.  I for one do 
not believe this whole situation is being handled properly and this is my act 
of civil disobedience in the matter.  If you want to close it again, at least 
DO NOT do the Tomcat user community the disservice as marking it as INVALID 
because that resolution will do more to hurt people than not fixing it (i.e., 
some may conclude there really is nothing to worry about and get burnt in 
production).  Mark it as what it truly is: WONTFIX... which would *still* be 
the wrong answer IMO, but will at least provide the proper information to 
anyone looking at this ticket after the fact, scratching their head asking 
what the hell is going on?!?
Comment 75 Niall Pemberton 2005-09-14 04:09:12 UTC
(In reply to comment #73)

The great thing about the ASF is its open source and you can fix it yourself. I 
would prefer it fixed in Tomcat, but I think your rant, Wade, at a Tomcat 
developer isn't going to make that more likely - this is a volunteer 
organization.

I've created a patch, based on Craig's idea (see comment#21), that can 
be "plugged" in for Tomcat 5.0.x which creates "thread safe" sessions - 
available <a 
href="http://www.niallp.pwp.blueyonder.co.uk/TomcatBug36541.html">here</a>
Comment 76 Jay Burgess 2005-09-14 04:28:19 UTC
Just realize that some of us work for companies that don't allow us to create
our own patched version--we're required to use the binary, versioned downloads.
 So while a patch may be sufficient for some of the userbase, it's not the
answer for everyone. 

So let's just fix it.

Jay
Comment 77 Niall Pemberton 2005-09-14 04:35:24 UTC
(In reply to comment #76)

OK "patch" is probably mis-leading - its really a "plug-in". You still use the 
Tomcat binary download unchanged. Its just a case of dropping in an extra jar 
and configuring Tomcat to use it.

(In reply to comment #75)
> (In reply to comment #73)
> would prefer it fixed in Tomcat, but I think your rant, Wade, at a Tomcat 

Sorry Wade, meant Gregory :-(
Comment 78 Yoav Shapira 2005-09-14 04:55:06 UTC
Guys, we're in active discussion with the Servlet Spec expert group about
resolving this.  It's an open issue in the center of everyone's radar and
highest on the priority list.  It'll get fixed in the best way possible as soon
as possible.  Let's try to stay professional about it.
Comment 79 Yoav Shapira 2005-09-14 04:55:30 UTC
Guys, we're in active discussion with the Servlet Spec expert group about
resolving this.  It's an open issue in the center of everyone's radar and
highest on the priority list.  It'll get fixed in the best way possible as soon
as possible.  Let's try to stay professional about it.
Comment 80 Wade Chandler 2005-09-14 05:24:29 UTC
(In reply to comment #75)
> (In reply to comment #73)
> 
> The great thing about the ASF is its open source and you can fix it yourself. I 
> would prefer it fixed in Tomcat, but I think your rant, Wade, at a Tomcat 
> developer isn't going to make that more likely - this is a volunteer 
> organization.
> 
> I've created a patch, based on Craig's idea (see comment#21), that can 
> be "plugged" in for Tomcat 5.0.x which creates "thread safe" sessions - 
> available <a 
> href="http://www.niallp.pwp.blueyonder.co.uk/TomcatBug36541.html">here</a>

That is the great thing about open source software.  Unfortunately I didn't
write comment #73.  It certainly is a volunteer organization.  I have tried to
help and give advice on the Tomcat list for years.  I have even tried to help
out with a few open source projects with some of my own time: iReport on
sourceforge.net is one of them.  

When I work on any open source project I accept others critism and am conscious
of the fact that I did not write all of the source code in the project nor am I
the only person using and relying on the software.  I am also aware that because
an issue isn't affecting me directly it doesn't mean it isn't an issue, and
these types of problems require being looked into when clear and relevant
evidence is provided to display a real and valid problem exists.

The last paragraph is written becuase, I have not been bitten by this specific
bug as of yet.  I saw the issue come across the Tomcat users list.  I then
evaluated the situation.  As I was evaluating the situation I noticed that while
I didn't see how an infinite loop could occur I definitely could see an issue
because not synchronizing the reads can allow incorrect returns from a call to
getAttribute (null when the key clearly points to a relevant value) and the
while loop can be entered and iterated over a different number of times before
returning.  

I even went as far to take java.util.HashMap, copy it into it's own class so I
could access it's internals from code, then wrote tests to validate what I
thought could happen.  I then noticed areas in the Tomcat source code which are
accessing the session which are not synchronized which developers can not
directly synchronize.  So, clearly there is a real issue which needs to be
addressed a the release level.  A patch is fine, and thank you for it, but it
doesn't solve the scenario where a new user who doesn't know about this issue,
the patch, or anything else associated with it downloads and installs Tomcat. 
So, when they run into it we'll have this same thing over and over again until
it's fixed.  I have a feeling it's being worked on though.

Also, I think some of the more hateful comments could have been avoided had
there not been antagonizing language used by certain folks.  If you will notice
the comment you are referring to posted by another user, they seem agitated by
some of the responses to the issue.  So, yes it's a volunteer organization, and
ASF should be respected by both users and developers alike.  It should be
considered a privilege to be allowed to be a commiter here IMO.  I have always
felt that way about anything I have been able to be a part even when I may feel
like debating an issue or getting a little heated.

So, I have written what I just wrote.  I would like to send a sincere thank you
to all of the Apache developers for their efforts and time: no hard feelings
from me.  Really, I have worked in this business long enough to have been in
heated debates time and time again with co-workers and fellow developers about
programming issues.  Sorry for the confusion, and I want to reiterate another
frustrated individual besides myself wrote comment #73.  I don't want to argue.
 I just want to contribute to software, and make it the best it can be, and
certainly as good as it should be.  So, I'll continue to point out things I see
as issues, and try to grin and bear it when they get pointed out to me.
Comment 81 Wade Chandler 2005-09-14 05:30:49 UTC
(In reply to comment #77)
> (In reply to comment #76)
> 
> OK "patch" is probably mis-leading - its really a "plug-in". You still use the 
> Tomcat binary download unchanged. Its just a case of dropping in an extra jar 
> and configuring Tomcat to use it.
> 
> (In reply to comment #75)
> > (In reply to comment #73)
> > would prefer it fixed in Tomcat, but I think your rant, Wade, at a Tomcat 
> 
> Sorry Wade, meant Gregory :-(

Thanks for the apology Niall.

Wade
Comment 82 Leon Rosenberg 2005-09-14 07:54:59 UTC
just to give a short update on the patch (for those who doubt or/and are interested)

we installed the patch on friday morning, so it's in the production (live)
system for 5 nearly days now. 
Since that we haven't had a single tomcat crash or hangup. Same period a week
ago it were about 15 servers. We haven't changed anything else, and the system
has served about 500.000.000 (real) requests with the patch, so I assume it can
be safely said, that the patch works and solves the bug.
Comment 83 Remy Maucherat 2005-09-14 07:59:26 UTC
(In reply to comment #73)
> Fix it.  Resolved my arse.  And while you're at it, tell me what other areas
of the "spec" I should be 
> misinterpreting, and working around in my code.  Crying about a fork?  Reading
this, you practically 
> *deserve* one.

Yes, unfortunately, you need to find yourself a replacement brain really fast so
you could actually comprehend the issue. Syncing reads in addition to writes
(which already prevents data corruption, and is done again in the entire 5.5
branch) provides very little, since your reads are going to be dirty anyway
unless you sync yourself at the application level. Basically, the
ConcurrentHashMap, which uses the same algorithm as the regular HashMap, just
does an additional synced read if the result was null to the initial unsynced read.

The problem should be solved soon, howver, and you'll be able to stop writing
pointless rants, as the spec wording is going to be heavily changed.

BTW, I am not crying about the Glassfish fork, I am mentionning that the way it
was done was, as usual with companies doing forks, innappropriate.

> its reputation as a stable platform

Lol, most of the time, people (like you) say it's crap :D
Comment 84 Leon Rosenberg 2005-09-14 08:04:39 UTC
(In reply to comment #78)
> Guys, we're in active discussion with the Servlet Spec expert group about
> resolving this.  It's an open issue in the center of everyone's radar and
> highest on the priority list.  It'll get fixed in the best way possible as soon
> as possible.  Let's try to stay professional about it.

Thanx Yoav, great news! 
I hope this issue will get solved, and we all get our "I've found a bug in the
spec and all I got was this lousy t-shirt" t-shirts, and, most important, stable
production systems soon.

regards
Leon
Comment 85 Gregory Block 2005-09-14 08:33:35 UTC
1) Take the rant for what it is:  A demand that pointless excuses for why it is marked invalid end.
2) We'll all feel a lot better now that there's a fix on the cards instead of a lot of excuses.
3) "people like me" quietly use and support Tomcat in our community, and don't show up in forums 
because, up until reading this, we didn't even have issues like this *crossing our mind* to bring up.

Just get it fixed.  Don't care how - it's *not* an invalid bug, and there's no excuse for that.
Comment 86 Gregory Block 2005-09-14 10:29:21 UTC
Ok, so completely innocently here (despite what people may happen to think - I'm *still* incensed by 
reading the history on this bug, and still feel well and truly PO'd...)

Why does the following not represent an issue in DeltaSession 1.35?
http://cvs.apache.org/viewcvs.cgi/jakarta-tomcat-catalina/modules/cluster/src/share/org/apache/
catalina/cluster/session/DeltaSession.java?rev=1.35&view=markup

Access to attributes.put() remains unsynchronized; synchronizing the remainder, especially when other 
methods are synchronized, strikes me as an oversight - it should cause the exact same problem (albeit 
less often, because reads are being serialized) as a put can still cause the hashmap to resize, and 
therefore still hit the problem; those puts are also allowed during retrieval of attributes.keyset() and 
other long-running actions as a result of this.  removeAttributeInternal() also fails to synchronize the 
removal of the attribute from the map, and thus also creates the issue; attribute removal and insert can 
both cause a right-timed reader to fail.

So it's not just limited to StandardSession, which...
http://cvs.apache.org/viewcvs.cgi/jakarta-tomcat-catalina/catalina/src/share/org/apache/catalina/
session/StandardSession.java?rev=1.60&view=markup

also fails to synchronize its session data in the current 5.0.11 branch - although it, too, does half the 
job by performing unsynchronized reads, synchronizing writes, failing to synchronize keyset retrieval, 
etc.

So how is this not a 5.5 bug?  StandardSession will definitely exhibit the behavior a lot more often, but 
the bug is still fundamentally *there* on both.
Comment 87 Peter Rossbach 2005-09-14 14:50:53 UTC
(In reply to comment #86)

I have fix the unsync write session attribute operation at DeltaSession.

Peter
Comment 88 Wade Chandler 2005-09-14 16:10:32 UTC
(In reply to comment #87)
> (In reply to comment #86)
> 
> I have fix the unsync write session attribute operation at DeltaSession.
> 
> Peter
There are other places that have this issue I noticed while going through the
Tomcat code such as the StandardSession get/setNote methods which are not
synchronized and are used throughout the code without being synchronized.  They
are used in the authenticators I know and I'm sure they are used in other places
because of their nature.  Should I file a separate bug for each, or is it enough
to give a generic bug which states there are a lot of cases of unsynchronized
HashMap in Tomcat so they can be tracked down?  I would work on a patch for it
as long as I'm not working on it at the same time as anyone else.  Are all
instances being looked at as a result of issue 36541?
Comment 89 Vit Timchishin 2005-09-14 16:45:53 UTC
I am sorry if I am too late to push (the issue is already fixed), but in case
it's not I have something to say:
1)SRV 15.1.7 says nothing about HttpSession being not thread-safe. And AFAIR
this means the interface should be thread-safe.
2)SRV 7.7.1 says about synchronizing access to "session resources", but it says
nothing about session object itself. If one thinks session itself was meant in
it, he should agree that more exact method of synchronizing then "as
appropriate" should be used in the specs: the Developer can choose any way it
wishes to synchronize access to it's own resources but the method of
synchronizing access to a object (HttpSession) that is used by different
Developers (if such a synchronization is needed) MUST be defined.
If you read "session resources" as "session", this means anyone can choose any
appropriate method of synchronization of session object, that is totally incorrect.
As for me, this means that neither 7.7.1 nor any other point of specs does NOT
say that session object is unsafe and this means that in any implementation it
MUST be safe.
Comment 90 Will Pugh 2005-09-15 16:57:16 UTC
Syncing reads in addition to writes
> (which already prevents data corruption, and is done again in the entire 5.5
> branch) provides very little, since your reads are going to be dirty anyway
> unless you sync yourself at the application level. Basically, the
> ConcurrentHashMap, which uses the same algorithm as the regular HashMap, just
> does an additional synced read if the result was null to the initial unsynced
read.

I don't want to flog a dead horse, but I do not think the above is correct. 
Normally "Dirty Read" means that you can read an uncommitted transaction.  In
this case, it means that you can get an incorrect read when looking at a peice
of data that no one else is touching.  E.g.  If someone is adding Item X, you
may be prevented from seeing Item Y.  This is mostly caused by the resize issue
alluded to above, and is addressed by ConcurrentHashMap much differently than it
is addressed in HashMap.

If you look at the Get code for ConcurrentHashMap:
     V get(Object key, int hash) {
           if (count != 0) { // read-volatile
               HashEntry<K,V> e = getFirst(hash);
               while (e != null) {
                   if (e.hash == hash && key.equals(e.key)) {
                       V v = e.value;
                       if (v != null)
                           return v;
                       return readValueUnderLock(e); // recheck
                   }
                   e = e.next;
               }
           }
           return null;
       }

   HashEntry<K,V> getFirst(int hash) {
           HashEntry[] tab = table;
           return (HashEntry<K,V>) tab[hash & (tab.length - 1)];
       }

The code in getFirst first sets the table to a local variable, this is probabaly
meant to get around the issue brought up by issue#72 where the table can change
between gettting the table length and then accessing an index in that table.

In addition to this, there are a lot of other differences between
ConcurrentHashMap and Hashmap that address a number of these subtle issues.
Look at the differences between ConcurrentHashMap$Segment.rehash() and
HashMap.transfer().  Hashmap.transfer() simply transfers all the entries over to
the new list.  ConcurrentHashMap$Segment.rehash() needs to determine which
entries can be moved over and which need to be copied so that it can keep the
current table used by the ConcurrentHashMap valid.

The overall point here is that concurrency issues are subtle, and using data
structures that are not meant to be used concurrently in a concurrent manner can
be dnagerous.
Comment 91 Will Pugh 2005-09-15 23:18:52 UTC
Actually, for completeness, I thought I would mention some of the other more
subtle differences between HashMap and ConcurrentHashMap that make this dangerous.

In both HashMap and ConcurrentHashMap adding a new entry comes down to a line
that looks something like:

  table[bucketIndex] = new Entry<K,V>(hash, key, value, e);

Looks safe, right?  Problem is that Entry<K,V>.next is not final, so it need not
be actually flushed to memory by the time table[bucketIndex] is set.  For
ConcurrentHashMap, HashEntry<K,V>.next is declared final, so it must be flushed
to memory before the constructor is finished.

This means that for the HashMap case, when doing a get, you can get this
partially created instance and not look down the rest of the chain, and thus not
find your element.  In the ConcurrentHashMap case, this cannot happen because of
the above mentioned "final" variable.  For more info on this read section 17.5
of the JVM spec as it is revised for 1.5

http://java.sun.com/docs/books/jls/third_edition/html/memory.html#17.5
Comment 92 Remy Maucherat 2005-09-19 13:47:03 UTC
As mandated in the next Servlet specification (the change seems to be accepted),
the session internal maps are now fully synchronized, by using a Hashtable.
Comment 93 Leon Rosenberg 2005-09-19 14:23:11 UTC
could you post a link to the new spec? 
I can't find any changes under:
http://www.jcp.org/aboutJava/communityprocess/maintenance/jsr154/index3.html 
Comment 94 Andreas Schildbach 2005-09-19 22:56:56 UTC
If session attributes are now going to be synchronized as per spec, what about
servlet context attributes? They will face the same issue.
Comment 95 Craig McClanahan 2005-09-20 00:06:46 UTC
(In reply to comment #94)
> If session attributes are now going to be synchronized as per spec, what about
> servlet context attributes? They will face the same issue.

The proposed (and now committed) change protects the attribute collection
*inside* the HttpSession object, eliminating the opportunity for an application
to cause that collection to become corrupted.  But the fact that a session
attribute can be accessed from multiple threads, and therefore might need
internal thread safety checks internal to itself, is still the application
designer's issue to solve.  Indeed, it's not anything a container could do
anything about, even if it wanted to.

The attribute collection inside Tomcat's ServletContext implementation was
already synchronized, and not subject to corruption.  Only the technique used to
represent *session* attributes was in question here.
Comment 96 Andreas Schildbach 2005-09-21 20:44:16 UTC
(In reply to comment #95)

> The attribute collection inside Tomcat's ServletContext implementation was
> already synchronized, and not subject to corruption.

Yeah, but what does the spec say (now)? If it is about as unclear as with the
session attribute collection, history will repeat itself: Tomcat or some other
container will implement ServletContext attribute collection unsynchronized
(because its faster), some people will whine about lockups, spec needs to be
changed one more time, container needs to be changed in order to be (new) spec
compliant.