Created attachment 36251: Figures 1 & 2

We are using the Tomcat APR connector in our application to perform client-certificate validation with OCSP checks. We've noticed a gradual increase in the memory consumed by the Java process until the system runs out of memory and the OOM-killer we configured kills and restarts the process. The application is queried often (every second by two simultaneous clients).

We have tested this with two types of client certificates from two different root CAs: PKIoverheid (the root certificate of the Dutch national government) and Comodo, both containing OCSP URLs. We first noticed the problem with the PKIoverheid certificates, which are larger than the Comodo certificates. Figure 1, showing the available server memory, shows that with these larger PKIoverheid certificates the server runs out of memory every 2.5 to 3 hours. Afterwards we tried the same thing with the smaller Comodo certificates (see figure 2), which gave the same result but took longer (15 hours).

When we turned off client-certificate validation, either by commenting out the call to X509_verify_cert in OpenSSL (which in turn calls Tomcat Native's SSL_callback_SSL_verify, which performs the OCSP checks) or by setting SSLVerifyClient to "none" and clientAuth to "false" in the APR connector, the server did not run out of memory and the graph of available memory flatlined.

I have tested this with the Apache Native Library v1.2.17, Tomcat v9.0.12, APR v1.5.2 and JDK v1.8.0_181 running on an Ubuntu 16.04.5 server. On the JBoss Jira I spotted a similar issue where somebody used different versions but had the same problem: https://issues.jboss.org/browse/JWS-1140.
I have further isolated the issue by replacing the verify_cb function 'SSL_callback_SSL_verify' (from the Tomcat Native Library) with a no-op function. When I do this, the available memory remains constant: our test server didn't run out of memory all weekend with the same polling frequency as before.
Replacing SSL_callback_SSL_verify() with a no-op disables all the OCSP checks, which is probably not what you want to do... But yes, that shows that the leak is somewhere in SSL_callback_SSL_verify().
You are correct jfclere, I indeed only tried this in an attempt to isolate the cause of the leak. I should have been more clear in my previous comment :-)
The problem is with OCSP_parse_url(): we forgot to free the strings it allocates for us: OPENSSL_free(hostname); OPENSSL_free(c_port); OPENSSL_free(path); I will commit the fix tomorrow, I am testing it now.
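To illustrate the kind of leak described here, below is a minimal sketch in plain C. It does not call OpenSSL: parse_url_sketch(), dup_n(), release(), lookup_once() and leaked_after() are hypothetical stand-ins invented for this example (with malloc/free in place of the OpenSSL allocator), and the URL is made up. The only thing it is meant to show is the ownership rule: the parser allocates three strings per call, so the caller has to free all three on every exit path, otherwise each request leaks them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Blocks handed out and not yet released; a crude stand-in for a leak checker. */
static int live_allocs = 0;

/* Hypothetical helper: copy the first n bytes of s into a new string. */
char *dup_n(const char *s, size_t n)
{
    char *p = malloc(n + 1);
    if (p == NULL)
        return NULL;
    memcpy(p, s, n);
    p[n] = '\0';
    live_allocs++;
    return p;
}

/* Hypothetical stand-in for OPENSSL_free(); releasing NULL is a no-op. */
void release(char *p)
{
    if (p != NULL) {
        live_allocs--;
        free(p);
    }
}

/*
 * Hypothetical stand-in for OCSP_parse_url(): splits "http://host:port/path"
 * into three newly allocated strings that the CALLER owns.
 * Returns 1 on success, 0 on failure (outputs are then NULL).
 */
int parse_url_sketch(const char *url, char **host, char **port, char **path)
{
    const char *prefix = "http://";
    const char *h, *colon, *slash;

    assert(url != NULL);
    *host = *port = *path = NULL;
    if (strncmp(url, prefix, strlen(prefix)) != 0)
        return 0;
    h = url + strlen(prefix);
    colon = strchr(h, ':');
    slash = strchr(h, '/');
    if (colon == NULL || slash == NULL || colon > slash)
        return 0;
    *host = dup_n(h, (size_t)(colon - h));
    *port = dup_n(colon + 1, (size_t)(slash - colon - 1));
    *path = dup_n(slash, strlen(slash));
    if (*host == NULL || *port == NULL || *path == NULL) {
        release(*host);
        release(*port);
        release(*path);
        *host = *port = *path = NULL;
        return 0;
    }
    return 1;
}

/* One simulated lookup; free_strings == 0 reproduces the forgotten frees. */
int lookup_once(const char *url, int free_strings)
{
    char *host = NULL, *port = NULL, *path = NULL;
    int ok = 0;

    if (!parse_url_sketch(url, &host, &port, &path))
        goto end;
    /* The real code would connect, send the request and read the reply here. */
    ok = (strlen(host) > 0);

end:
    /* Pointers start as NULL, so this cleanup is safe on every exit path. */
    if (free_strings) {
        release(host);
        release(port);
        release(path);
    }
    return ok;
}

/* Runs n lookups and returns how many allocations are still outstanding. */
int leaked_after(int n, int free_strings)
{
    int i;

    live_allocs = 0;
    for (i = 0; i < n; i++)
        lookup_once("http://ocsp.example.test:80/status", free_strings);
    return live_allocs;
}
```

Calling leaked_after(1000, 0) leaves three strings behind per lookup, while leaked_after(1000, 1) leaves none. Initialising the pointers to NULL and freeing them at the last label is one way to make sure early exits are covered as well; whether that matches the layout of the real function is an assumption.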
Try with r1846499. I still have another memory leak but can't find where it is.
I was getting errors in the Python build script when running the buildconf file:

    ImportError: No module named 'ConfigParser'

And I tried to run the buildcheck.sh file, which reported I didn't have Python installed, but I do:

    sander:/tmp$ python
    Python 2.7.12 (default, Dec 4 2017, 14:50:18)

So for now I've applied the patch you suggested to the downloaded sources of Tomcat Native Library 1.2.17 I was using. Thank you for the fix! I will report back in a few hours.
    ImportError: No module named 'ConfigParser'

That is because of the Python version being used... You need an APR version that supports python3, or use python2.
OK, I will try again to build the code from SVN and see if it makes a difference, but right now the server still runs out of memory. I have added the three lines of code you suggested in this place:

    free_bio:
        BIO_free(bio_req);
    free_req:
        if(apr_sock && ok) /* if ok == 0 we have already closed the socket */
            apr_socket_close(apr_sock);
        apr_pool_destroy(mp);
        sk_OCSP_CERTID_free(ids);
        OCSP_REQUEST_free(ocsp_req);
        // Manually added code
        OPENSSL_free(hostname);
        OPENSSL_free(c_port);
        OPENSSL_free(path);
        // End manually added code
    end:
        return ocsp_resp;

It seems that this does have a positive effect on the memory usage: it now took 4.5 hours to run out of memory rather than 3, but the end result is still the same. I will report back when I've tried the exact commit in SVN.
OK I know that adding: OPENSSL_free(hostname); OPENSSL_free(c_port); OPENSSL_free(path); is not enough, but I am happy it helps ;-)
Try with http://svn.apache.org/viewvc?rev=1846593&view=rev
I think I have fixed all the leaks now.
It worked! I updated the code yesterday and the server still hasn't run out of memory. In the graph I can see that it stabilises nicely. Thank you so much jfclere! :-)