A user was asking questions on the tomcat users mailing list about TC and how it handled concurrency in sessions and session passivation/object caching of sessions when they are inactive etc. I thought, surely this is using locks etc, so had responded as such. Then I got curious. I started looking in the code and found that indeed TC has a concurrency issue when the session is to pushed to disk if it is inactive or the max number of live sessions is too high. The problem is a user could have a session getting ready to become inactive. The server is beginning to write the session out to disk. The user then comes in on a request and gets the current session. They set an attribute. The web application and the user think all is OK. But in reality the data they just put into the session will be lost. The server just dumped their session to disk containing the old data as they updated the copy. So, the next time they come in the session will be pulled from disk with the old values and anything they put into it will have been lost. Depending on the application this could be very bad. To see the issue go to the file: java/org/apache/catalina/session/PersistentManagerBase.java methods: swapIn swapOut writeSession findSession (other related) then the different stores load and save methods. There is nothing keeping this from happening. If a lock is on a per session level this should keep it snappy for all other requests etc. Either the session could be used as the lock or an object instance variable on the session instance. I noticed there is a comment in the source code for PersistentManagerBase.processMaxIdleSwaps which reads: // Swap out all sessions idle longer than maxIdleSwap // FIXME: What's preventing us from mangling a session during // a request? So, apparently someone thought of this. Nothing is keeping it from mangling a session currently. What might be a fix is a session is given an instance variable which can be used for a synchronization lock at the session level. In PersistentManagerBase.findSession, this lock would be used before the session can be returned. After the session map is accessed it should then return null if the session had been passivating before it was asked to be found. The session would then be loaded from storage again before it was ever given back to be accessed had it been passivating. It would also be used in PersistentManagerBase.* which call swapOut. The code would look like this (might explain it better): StandardSession: /** Used for locking the session during persistence operations. **/ Integer persistence_locker = new Integer(0); PersistentManagerBase: /** * Return the active Session, associated with this Manager, with the * specified session id (if any); otherwise return <code>null</code>. * This method checks the persistence store if persistence is enabled, * otherwise just uses the functionality from ManagerBase. * * @param id The session id for the session to be returned * * @exception IllegalStateException if a new session cannot be * instantiated for any reason * @exception IOException if an input/output error occurs while * processing this request */ public Session findSession(String id) throws IOException { Session session = super.findSession(id); //OK, at this point, we're not sure if another thread is trying to //remove the session or not so the only way around this is to lock //it (or attempt to) and then try to get it by this session id again. //If the other code ran swapOut, then we should get a null back during //this run, and if not, then by doing this we lock it out and then can //access the session safely and will call access on it to update the //access time and hopefully keep the processes from running swapOut //so, we're adding two accesses to the hashmap instead of one for //each request. But, we know we get the session or null back after //the other process has had a chance to remove it or not. if(session!=null{ synchronized(session.persistence_locker){ session = super.findSession(session.getIdInternal()); if(session!=null){ //we need to do this here //to keep any external calling code from messing up the //concurrency. session.access(); } } } if (session != null) return (session); // See if the Session is in the Store session = swapIn(id); return (session); } /** * Swap idle sessions out to Store if they are idle too long. */ protected void processMaxIdleSwaps() { if (!isStarted() || maxIdleSwap < 0) return; Session sessions[] = findSessions(); long timeNow = System.currentTimeMillis(); // Swap out all sessions idle longer than maxIdleSwap // FIXME: What's preventing us from mangling a session during // a request? if (maxIdleSwap >= 0) { for (int i = 0; i < sessions.length; i++) { StandardSession session = (StandardSession) sessions[i]; synchronized(session.persistence_locker){ if (!session.isValid()) continue; int timeIdle = // Truncate, do not round up (int) ((timeNow - session.getLastAccessedTime()) / 1000L); if (timeIdle > maxIdleSwap && timeIdle > minIdleSwap) { if (log.isDebugEnabled()) log.debug(sm.getString ("persistentManager.swapMaxIdle", session.getIdInternal(), new Integer(timeIdle))); try { swapOut(session); } catch (IOException e) { ; // This is logged in writeSession() } } } } } } /** * Swap idle sessions out to Store if too many are active */ protected void processMaxActiveSwaps() { if (!isStarted() || getMaxActiveSessions() < 0) return; Session sessions[] = findSessions(); // FIXME: Smarter algorithm (LRU) if (getMaxActiveSessions() >= sessions.length) return; if(log.isDebugEnabled()) log.debug(sm.getString ("persistentManager.tooManyActive", new Integer(sessions.length))); int toswap = sessions.length - getMaxActiveSessions(); long timeNow = System.currentTimeMillis(); for (int i = 0; i < sessions.length && toswap > 0; i++) { synchronized(sessions[i].persistence_locker){ int timeIdle = // Truncate, do not round up (int) ((timeNow - sessions[i].getLastAccessedTime()) / 1000L); if (timeIdle > minIdleSwap) { if(log.isDebugEnabled()) log.debug(sm.getString ("persistentManager.swapTooManyActive", sessions[i].getIdInternal(), new Integer(timeIdle))); try { swapOut(sessions[i]); } catch (IOException e) { ; // This is logged in writeSession() } toswap--; } } } } /** * Back up idle sessions. */ protected void processMaxIdleBackups() { if (!isStarted() || maxIdleBackup < 0) return; Session sessions[] = findSessions(); long timeNow = System.currentTimeMillis(); // Back up all sessions idle longer than maxIdleBackup if (maxIdleBackup >= 0) { for (int i = 0; i < sessions.length; i++) { StandardSession session = (StandardSession) sessions[i]; synchronized(session.persistence_locker){ if (!session.isValid()) continue; int timeIdle = // Truncate, do not round up (int) ((timeNow - session.getLastAccessedTime()) / 1000L); if (timeIdle > maxIdleBackup) { if (log.isDebugEnabled()) log.debug(sm.getString ("persistentManager.backupMaxIdle", session.getIdInternal(), new Integer(timeIdle))); try { writeSession(session); } catch (IOException e) { ; // This is logged in writeSession() } } } } } } So, currently one could very easily have a session and have some issues. Your user would have a time when they thought they set something or your code could think this then immediately the changes are lost. Anyways, that is what looks to be the issue and a workable fix. The only place I saw other issues was inside of: java/org/apache/catalina/valves/PersistentValve.java where it incorrectly grabs the store from the PersistentManager and uses it directly instead of using the manager API. To me this is bad in that the manager is not able to be the manager and this other logic is accessing the store directly and should never happen...unless it is used only in test cases etc. The only way you could possibly *sort of* workaround this issue would be to use a HttpSessionActivationListener, but even then it would be hard to code up a 100% reliable solution. You can see how this would *sort of* be a way to work around the issue by looking at the class StandardSession method passivate , and see the events will be thrown to alert code the session will passivate, but it would be pretty complicated and ugly to protect the session completely.
I thought about marking this as a P1 as it could cause some very hard to track down data loss in a web application. What is the difference between P1 and P2 for the Tomcat project?
Using a plain Object to synchronize on rather than an Integer(0) would be very slightly more efficient.
Any reason not to use the Session object itself as the monitor?
No real reason I can find. We might want to use an instance variable as the lock just in case there are ever any other synchronized methods or sections added to the session where the entire session is locked. That would help if those synchronized session methods would need to be used by other threads and could be accessed even during passivation. I only find one place where the session itself is locked and that is in the expires method where a synchronized (this) is used. So, using the session itself would be OK for now, and it may logically work out to be the case that no other action should ever be able to actually be made or ever wanted on the session while it is being cached/passivated. I mentioned it could be used as the lock in the my description some where. So, at least for now, or always if deemed to always be wanted, I don't see any reason why it itself can not be the lock.
I have applied a patch to trunk based on the discussion here and proposed it for 6.0.x.
The patch has been applied to 6.0.x and will be in 6.0.17 onwards. Thanks for the suggested patch.
Created attachment 23592 [details] Reproducer, part 1: web application Web application for TC 5.5, 6.0 to reproduce the issue
Created attachment 23593 [details] Reproducer, part 2: client HTTPClient. commons-httpclient-3.1.jar and its dependencies are not included. To run: 1. Deploy bug43343.war on a web server 2. Unpack this bug43343_client.zip and put the following libraries into lib subfolder: commons-httpclient-3.1.jar commons-codec-1.3.jar commons-logging-1.1.1.jar 3. Run client.bat if you are on Windows The client creates several (100) instances of HttpClient that connect to http://localhost:8080/bug43343/index.jsp?number=nn&loop=ll Each HttpClient remembers its cookies, and thus connects to a separate session. The number parameter in URL is unique for each HttpClient instance, and loop starts with 0 and increments by 1 for each request from the same client. The context.xml file of the web application configures PersistentManager with some low values, to that this issue is more visible. The requests are being sent serially, with an interval of at least 4 seconds between request of the same client. The client prints out the response of web application, and terminates if status code 500 is received. Expected result: Successful run of the client. That is, 150 loops, which needs about 10 minutes to complete.
I am reopening this issue, because: 1. The fix was not applied to TC 5.5, and the issue can be reproduced with TC 5.5.27. 2. The fix [1], as applied to 6.0 is not complete, though one may argue. I am reducing severity to "normal". The issue does not apply to the default configuration, where StandardManager is used. [1] http://svn.apache.org/viewvc?rev=656751&view=rev On the issue: When PersistentManagerBase swaps out a session it performs two actions (see its implementation of swapOut()): 1. writing session data out 2. calling session.recycle() which invalidates the session When session that is being used by an active request is being swapped out at the same time (and what is the need for doing that? - you will not be able to free the memory, because there is a reference), two races can occur: Case A: request writes data to session at the same time as it is being written out and before the recycle() call. Result: the update is lost, next request will see the old values. This is the case that is fixed by PersistentManagerBase patch in rev=656751 [1]. Case B: request accesses session after session.recycle(), e.g. after swapOut() completes. Result: java.lang.IllegalStateException: setAttribute: Session already invalidated, though the session was not invalidated, but just swapped out. --------------------------------------------- Sample output from the client when Case A is observed (with TC 5.5.27): Method failed: HTTP/1.1 500 Internal Server Error Error (number=-1509748992, loop=140): java.lang.IllegalStateException: Attribute loop is not equal to the expected val ue. Expected: 139 Actual: 138 --------------------------------------------- Case B can be reproduced more easily if you replace s/ if (false) { / if (true) { / on line 57 in file index.jsp of the web application. Sample output from the client when Case B is observed (current tc6.0.x, with the mentioned change of line 57): Method failed: HTTP/1.1 500 Internal Server Error Error (number=-1309785265, loop=0): java.lang.IllegalStateException: setAttribute: Session already invalidated I have also seen the following NullPointerException when running against TC 5.5.27, and I think that it is the Case B: The server encountered an internal error () that prevented it from fulfilling this request. exception: org.apache.jasper.JasperException org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServle tWrapper.java:460) org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.ja va:373) org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:329) org.apache.jasper.servlet.JspServlet.service(JspServlet.java:265) javax.servlet.http.HttpServlet.service(HttpServlet.java:729) root cause: java.lang.NullPointerException org.apache.catalina.session.StandardSession.setAttribute(StandardSession .java:1309) org.apache.catalina.session.StandardSession.setAttribute(StandardSession .java:1248) org.apache.catalina.session.StandardSessionFacade.setAttribute(StandardS essionFacade.java:130) org.apache.jsp.index_jsp._jspService(index_jsp.java:88) org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:98) javax.servlet.http.HttpServlet.service(HttpServlet.java:729) org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.ja va:331) org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:329) org.apache.jasper.servlet.JspServlet.service(JspServlet.java:265) javax.servlet.http.HttpServlet.service(HttpServlet.java:729) --------------------------------------------- Case B can be solved by avoiding swapping out sessions that are owned by a request. Synchronizing on a session is not sufficient. A workaround is to avoid enabling IdleSwap in the configuration (it is disabled by default).
Case B fixed in trunk and proposed for 6.0.x Complete fix proposed for 5.5.x.
This has been fixed for 6.0.x and will be included in 6.0.21 onwards. Transferring to TC5 so it can be used to track the fix there.
This has been fixed in 5.5.x and will be included in 5.5.28 onwards.