Summary: | [websocket] Stability issues when concurrently sending large messages | ||
---|---|---|---|
Product: | Tomcat 8 | Reporter: | Konstantin Preißer <kpreisser> |
Component: | WebSocket | Assignee: | Tomcat Developers Mailing List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | P2 | ||
Version: | 8.0.x-trunk | ||
Target Milestone: | ---- | ||
Hardware: | PC | ||
OS: | All | ||
Attachments: |
Test case (modifications to the Chat example)
Screenshot of FF getting a raw websocket respone (1) Screenshot of FF getting a raw websocket respone (2) Updated Test case (modifications to the Chat example) Sample project to replicate regression |
Description
Konstantin Preißer
2013-11-20 00:01:34 UTC
Created attachment 31056 [details]
Screenshot of FF getting a raw websocket respone (1)
Created attachment 31057 [details]
Screenshot of FF getting a raw websocket respone (2)
Note: When using Google Chrome and when the WebSocket connection fails, Chrome reports one of the following errors in the console: WebSocket connection to 'ws://localhost:8080/examples/websocket/chat' failed: Status line contains embedded null WebSocket connection to 'ws://localhost:8080/examples/websocket/chat' failed: Received unexpected continuation frame. I can see one problem immediately. While the code does check for concurrent messages (which will trigger an IllegalStateException) text messages are converted to bytes before that check in a single buffer. Therefore, multiple threads could access that buffer and that will cause corruption. I'll take a look. I've hopefully fixed this in trunk. If you can provide some feedback on the fix that would be great. I'm leaving this open as I have a little clean-up still to do in trunk for this fix and I also want to back-port the changes to 7.0.x. Created attachment 31060 [details] Updated Test case (modifications to the Chat example) Hi Mark, I updated my working copy of Tomcat 8 trunk to r1543817 and have the changes applied from the attached patch. It modifies the Chat example in the same way as the previous test case (and it changes the catch (IOException e) to catch (Exception e) to also catch ISE, and do not remove a client in this catch block - instead wait for the onClose method to be called). Unfortunately, even with the fix and the added synchronized() block in the Chat's broadcast() method, I still see the same problems as before (WebSocket connection is closed soon after it is opened, Chrome displays "Unexpected continuation frame" error, browser receives raw websocket response instead of the requested chat.xhtml file). I also got following exceptions (Note: One of the NPEs is probably the same that I once observed with the DrawBoard example and the NIO connector when repeatedly pressing F5): Exception in thread "WebSocketServer-localhost-/examples-1" java.lang.IllegalStateException: Unexpected state. Please report a bug. Message will not be sent because the WebSocket session is currently sending another message at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.startMessage(WsRemoteEndpointImplBase.java:272) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$TextMessageSendHandler.write(WsRemoteEndpointImplBase.java:643) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$TextMessageSendHandler.onResult(WsRemoteEndpointImplBase.java:653) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.endMessage(WsRemoteEndpointImplBase.java:305) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$EndMessageHandler.onResult(WsRemoteEndpointImplBase.java:453) at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer$OnResultRunnable.run(WsRemoteEndpointImplServer.java:234) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) 20-Nov-2013 15:25:04.759 SEVERE [http-nio-8080-exec-4] websocket.chat.ChatAnnotation.onError Chat Error: java.lang.NullPointerException java.lang.NullPointerException at org.apache.coyote.http11.upgrade.AbstractServletOutputStream.onWritePossible(AbstractServletOutputStream.java:134) at org.apache.coyote.http11.upgrade.AbstractProcessor.upgradeDispatch(AbstractProcessor.java:97) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:640) at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:223) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1597) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1551) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) 20-Nov-2013 15:30:49.345 SEVERE [http-nio-8080-ClientPoller-1] org.apache.tomcat.util.net.NioEndpoint.processSocket Error allocating socket processor java.lang.NullPointerException at org.apache.tomcat.util.net.NioEndpoint.processSocket(NioEndpoint.java:624) at org.apache.tomcat.util.net.NioEndpoint$Poller.processKey(NioEndpoint.java:1165) at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1122) at java.lang.Thread.run(Thread.java:744) 20-Nov-2013 15:32:33.070 SEVERE [http-nio-8080-exec-5] org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun java.lang.NullPointerException at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:593) at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:223) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1597) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1555) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) 20-Nov-2013 15:32:53.378 SEVERE [http-nio-8080-exec-23] org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process Error reading request, ignored java.lang.IllegalStateException: Unexpected state. Please report a bug. Message will not be sent because the WebSocket session is currently sending another message at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.startMessage(WsRemoteEndpointImplBase.java:272) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$TextMessageSendHandler.write(WsRemoteEndpointImplBase.java:643) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$TextMessageSendHandler.onResult(WsRemoteEndpointImplBase.java:653) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.endMessage(WsRemoteEndpointImplBase.java:305) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$EndMessageHandler.onResult(WsRemoteEndpointImplBase.java:453) at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.clearHandler(WsRemoteEndpointImplServer.java:206) at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.onWritePossible(WsRemoteEndpointImplServer.java:102) at org.apache.tomcat.websocket.server.WsHttpUpgradeHandler$WsWriteListener.onWritePossible(WsHttpUpgradeHandler.java:238) at org.apache.coyote.http11.upgrade.AbstractServletOutputStream.onWritePossible(AbstractServletOutputStream.java:167) at org.apache.coyote.http11.upgrade.AbstractProcessor.upgradeDispatch(AbstractProcessor.java:97) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:640) at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:223) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1597) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1551) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) 20-Nov-2013 15:24:55.951 SEVERE [http-nio-8080-exec-9] websocket.chat.ChatAnnotation.onError Chat Error: java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Key must be cancelled java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Key must be cancelled at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.startMessageBlock(WsRemoteEndpointImplBase.java:245) at org.apache.tomcat.websocket.WsSession.sendCloseMessage(WsSession.java:476) at org.apache.tomcat.websocket.WsSession.onClose(WsSession.java:439) at org.apache.tomcat.websocket.WsFrameBase.processDataControl(WsFrameBase.java:316) at org.apache.tomcat.websocket.WsFrameBase.processData(WsFrameBase.java:270) at org.apache.tomcat.websocket.WsFrameBase.processInputBuffer(WsFrameBase.java:116) at org.apache.tomcat.websocket.server.WsFrameServer.onDataAvailable(WsFrameServer.java:55) at org.apache.tomcat.websocket.server.WsHttpUpgradeHandler$WsReadListener.onDataAvailable(WsHttpUpgradeHandler.java:192) at org.apache.coyote.http11.upgrade.AbstractServletInputStream.onDataAvailable(AbstractServletInputStream.java:180) at org.apache.coyote.http11.upgrade.AbstractProcessor.upgradeDispatch(AbstractProcessor.java:95) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:640) at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:223) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1597) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1555) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Key must be cancelled at org.apache.tomcat.websocket.FutureToSendHandler.get(FutureToSendHandler.java:102) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.startMessageBlock(WsRemoteEndpointImplBase.java:241) ... 16 more Caused by: java.io.IOException: Key must be cancelled at org.apache.coyote.http11.upgrade.NioServletOutputStream.doWriteInternal(NioServletOutputStream.java:83) at org.apache.coyote.http11.upgrade.NioServletOutputStream.doWrite(NioServletOutputStream.java:60) at org.apache.coyote.http11.upgrade.AbstractServletOutputStream.writeInternal(AbstractServletOutputStream.java:118) at org.apache.coyote.http11.upgrade.AbstractServletOutputStream.write(AbstractServletOutputStream.java:85) at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.onWritePossible(WsRemoteEndpointImplServer.java:94) at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.doWrite(WsRemoteEndpointImplServer.java:81) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.writeMessagePart(WsRemoteEndpointImplBase.java:378) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.startMessage(WsRemoteEndpointImplBase.java:279) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.startMessageBlock(WsRemoteEndpointImplBase.java:236) ... 16 more [Probably expected, when the browser aborts the connection:] 20-Nov-2013 15:25:24.777 SEVERE [http-nio-8080-exec-8] websocket.chat.ChatAnnotation.onError Chat Error: java.io.IOException: java.util.concurrent.ExecutionException: java.nio.channels.ClosedChannelException java.io.IOException: java.util.concurrent.ExecutionException: java.nio.channels.ClosedChannelException at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.startMessageBlock(WsRemoteEndpointImplBase.java:245) at org.apache.tomcat.websocket.WsSession.sendCloseMessage(WsSession.java:476) at org.apache.tomcat.websocket.WsSession.onClose(WsSession.java:439) at org.apache.tomcat.websocket.server.WsHttpUpgradeHandler.close(WsHttpUpgradeHandler.java:172) at org.apache.tomcat.websocket.server.WsHttpUpgradeHandler.access$200(WsHttpUpgradeHandler.java:45) at org.apache.tomcat.websocket.server.WsHttpUpgradeHandler$WsReadListener.onDataAvailable(WsHttpUpgradeHandler.java:194) at org.apache.coyote.http11.upgrade.AbstractServletInputStream.onDataAvailable(AbstractServletInputStream.java:180) at org.apache.coyote.http11.upgrade.AbstractProcessor.upgradeDispatch(AbstractProcessor.java:95) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:640) at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:223) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1597) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1555) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.util.concurrent.ExecutionException: java.nio.channels.ClosedChannelException at org.apache.tomcat.websocket.FutureToSendHandler.get(FutureToSendHandler.java:102) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.startMessageBlock(WsRemoteEndpointImplBase.java:241) ... 14 more Caused by: java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:265) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:474) at org.apache.tomcat.util.net.NioChannel.write(NioChannel.java:123) at org.apache.tomcat.util.net.NioSelectorPool.write(NioSelectorPool.java:185) at org.apache.coyote.http11.upgrade.NioServletOutputStream.doWriteInternal(NioServletOutputStream.java:93) at org.apache.coyote.http11.upgrade.NioServletOutputStream.doWrite(NioServletOutputStream.java:60) at org.apache.coyote.http11.upgrade.AbstractServletOutputStream.writeInternal(AbstractServletOutputStream.java:118) at org.apache.coyote.http11.upgrade.AbstractServletOutputStream.write(AbstractServletOutputStream.java:85) at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.onWritePossible(WsRemoteEndpointImplServer.java:94) at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.doWrite(WsRemoteEndpointImplServer.java:81) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.writeMessagePart(WsRemoteEndpointImplBase.java:378) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.startMessage(WsRemoteEndpointImplBase.java:279) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.startMessageBlock(WsRemoteEndpointImplBase.java:236) ... 14 more I'm not seeing the errors you are seeing but I did spot a possible failure mode which I have now fixed. Could you re-test 8.0.x please. Hi Mark, (In reply to Mark Thomas from comment #7) > I'm not seeing the errors you are seeing but I did spot a possible failure > mode which I have now fixed. Could you re-test 8.0.x please. Thanks a lot. I updated to r1543948 and cannot reproduce the main problems any more (like websocket connection suddenly closes, wrong response sent to the browser). I think it is fixed now. I also don't get NPEs or IllegalArgumentExceptions any more. I still do see IOExceptions ("Software caused connection abort" / "Key must be cancelled") and ClosedChannelExceptions reported through the ChatAnnotation's onError method, but these are probably expected due to the browser aborting the connection when repeatedly pressing F5. I also still see the following exception: Exception in thread "WebSocketServer-localhost-/examples-14" java.lang.IllegalStateException: Message will not be sent because the WebSocket session has been closed at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.writeMessagePart(WsRemoteEndpointImplBase.java:306) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.startMessage(WsRemoteEndpointImplBase.java:273) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$TextMessageSendHandler.write(WsRemoteEndpointImplBase.java:637) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$TextMessageSendHandler.onResult(WsRemoteEndpointImplBase.java:649) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.endMessage(WsRemoteEndpointImplBase.java:299) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$EndMessageHandler.onResult(WsRemoteEndpointImplBase.java:447) at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer$OnResultRunnable.run(WsRemoteEndpointImplServer.java:234) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) (In reply to Konstantin Preißer from comment #8) > I also still see the following exception: Actually I meant the following exception here (but the other one also occurs): Exception in thread "WebSocketServer-localhost-/examples-1" java.lang.IllegalStateException: Unexpected state. Please report a bug. Message will not be sent because the WebSocket session is currently sending another message at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.startMessage(WsRemoteEndpointImplBase.java:266) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$TextMessageSendHandler.write(WsRemoteEndpointImplBase.java:637) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$TextMessageSendHandler.onResult(WsRemoteEndpointImplBase.java:649) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.endMessage(WsRemoteEndpointImplBase.java:299) at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$EndMessageHandler.onResult(WsRemoteEndpointImplBase.java:447) at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer$OnResultRunnable.run(WsRemoteEndpointImplServer.java:234) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) (In reply to Konstantin Preißer from comment #8) > Thanks a lot. I updated to r1543948 and cannot reproduce the main problems > any more (like websocket connection suddenly closes, wrong response sent to > the browser). I think it is fixed now. After some additional testing, the problem that the browser could not establish a WebSocket connection / Websocket connection is closed a few seconds after opening it happened again on Google Chrome, the console displayed the following: WebSocket connection to 'ws://localhost:8080/examples/websocket/chat' failed: Received new data frame but previous continuous frame is unfinished. OK. Getting better but maybe not quite there yet. Because WebSocket has the potential for multiple threads to be working with a socket concurrently (one read, one write) there is a world of opportunity for weird and wonderful failure modes. It is possible that that is what we are seeing here but I'd like to at least understand each class of exceptions you are seeing. The IOExceptions and ClosedChannelExceptions are expected when the client drops the connection. I'm not worried about these. The IllegalStateException (closed) might be OK but I need to confirm. The IllegalStateException (report a bug) shouldn't happen so I want to look at this some more. The "Received new data frame but previous continuous frame is unfinished." definitely shouldn't happen. Need to look into that as well. (In reply to Mark Thomas from comment #11) > The IllegalStateException (closed) might be OK but I need to confirm. It is OK but could be handled better so I have improved the handling of this. > The IllegalStateException (report a bug) shouldn't happen so I want to look > at this some more. The logic was wrong here so it has been fixed. > The "Received new data frame but previous continuous frame is unfinished." > definitely shouldn't happen. Need to look into that as well. I suspect that these were a result of the logic errors. 8.0.x has been updated. As I still can't reproduce this I'd again appreciate your help with testing. Hi Mark, (In reply to Mark Thomas from comment #12) > 8.0.x has been updated. As I still can't reproduce this I'd again appreciate > your help with testing. I updated to r1544165 and I'm now unable to reproduce the Exceptions (besides the expected IOException) and the frame error that Google Chrome's console had shown, so I think it is now fixed. Thanks! (The only problem I got was that it took several minutes for Tomcat to shutdown after testing - it did not close the Websocket Connections and even after closing all browsers it took some minutes for Tomcat to complete shutdown.) Thanks for the testing. The slow shutdown could be caused by not correctly handling an IOException. That is something to follow-up in a separate BZ issue. REOPENING, there is a regression See thread on users@, "IllegalStateException sending WebSocket message that worked a few months ago" http://tomcat.markmail.org/thread/75uisvwve2l2u3n2 Apparently using an OutputStream obtained via WsRemoteEndpointImplBase.getSendStream() is broken. There is a test case for a writer, TestWsRemoteEndpoint.doTestWriter(..). It seems that there is no test case for a stream. There is a small difference in implementation between doWrite() methods of WsRemoteEndpointImplBase$WsWriter and WsRemoteEndpointImplBase$WsOutputStream: The writer calls a package-visible method that does not perform a state check. The stream calls a public method that performs a state check and fails. I've fixed the regression and added a test case for Stream messages to trunk and 7.0.x. Created attachment 31078 [details] Sample project to replicate regression Something still isn't quite right. I've attached a sample project. It's silly-looking, I know, but it's simple and it demonstrates the problem reliably every time. Steps: 1. Compile it and deploy to /cluster. 2. Go to http://localhost:8080/cluster/clusterNode1. The browser should display "OK" and there should be some diagnostic output in stdout. 3. Go to http://localhost:8080/cluster/clusterNode2. The browser should display "OK" and there should be a lot more diagnostic output in stdout. -- Step 3 is where it failed two days ok. You never saw "OK" and the diagnostic output. You just got the exception in the browser. After Mark's changes last night / this morning, step 3 succeeds. 4. Go back to http://localhost:8080/cluster/clusterNode1. The exception is back. 5. Go back to http://localhost:8080/cluster/clusterNode2. The exception is back there, tee. Should be fixed now. Test case appears to work. (No errors reported) Confirmed. Looks like it is working now. Thanks! |