The current implementation uses the blocking SslSocketConnector, whose default maxIdleTime is 200 seconds. We noticed in our cluster that when users access the WebHDFS REST APIs over HTTPS with a custom client, the client can block all 250 handler threads in the NN Jetty server, severely degrading access to WebHDFS and the NN web UI. The attached screenshots (blocking_1.png and blocking_2.png) show that with SslSocketConnector, the Jetty handler threads are not released until the 200-second maxIdleTime has passed. With a sufficient number of SSL connections, this issue can make the NN HttpServer entirely unresponsive.
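To make the failure mode concrete, here is a minimal, self-contained sketch of handler-pool starvation. It does not use Jetty; it is a simplified model that assumes a blocking connector pins one handler thread per open connection until the idle timeout fires. The class name HandlerPoolStarvation, the method isStarved, and the latch standing in for maxIdleTime are all hypothetical names chosen for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Simplified model only: a fixed pool stands in for the Jetty handler threads,
// and a latch stands in for the maxIdleTime expiring. Not Jetty's actual code.
class HandlerPoolStarvation {

    /**
     * Returns true if a new request cannot be served while
     * idleConnections idle connections each hold one of poolSize threads.
     */
    static boolean isStarved(int poolSize, int idleConnections) throws Exception {
        ExecutorService handlers = Executors.newFixedThreadPool(poolSize);
        CountDownLatch idleTimeout = new CountDownLatch(1);
        try {
            // Each idle connection blocks a handler thread until the timeout.
            for (int i = 0; i < idleConnections; i++) {
                handlers.submit(() -> {
                    try {
                        idleTimeout.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // A new request arrives; it needs a free handler thread to run.
            Future<String> probe = handlers.submit(() -> "ok");
            try {
                probe.get(200, TimeUnit.MILLISECONDS);
                return false; // a handler thread was available
            } catch (TimeoutException e) {
                return true;  // every handler thread is pinned by an idle connection
            }
        } finally {
            idleTimeout.countDown(); // simulate the idle timeout firing
            handlers.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("pool=4, idle=4 starved: " + isStarved(4, 4));
        System.out.println("pool=4, idle=3 starved: " + isStarved(4, 3));
    }
}
```

In this model, the pool is starved as soon as the number of idle connections reaches the pool size. With the values from the report, that would correspond to 250 idle SSL connections, each holding a thread for up to 200 seconds.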
We propose switching to the non-blocking SslSelectChannelConnector as a fix. We have deployed the attached patch in our cluster and have seen significant improvement. The attached screenshot (unblocking.png) illustrates the behavior of the NN Jetty server after switching to SslSelectChannelConnector.
The patch also disables the SSLv3 protocol on the server side to preserve the spirit of