Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.1, 7.0
    • Component/s: None
    • Labels:
      None

      Description

      In parent issue SOLR-5776, NullSecureRandom was introduced and SSLTestConfig was refactored so that both client & server would use it to prevent blocked threads waiting for entropy.

      Since those commits to master & branch_6x, all Solaris Jenkins builds have failed at the same spots in TestMiniSolrCloudClusterSSL.testSslAndNoClientAuth. Judging by the logs, the root cause appears to be intranode communication failures due to "javax.crypto.BadPaddingException".

      Initial speculation was that perhaps the Solaris SSL impl has bugs in its padding code that are tickled when the SecureRandom instance returns long strings of null bytes, but we subsequently got reports of similar, less frequently occurring bugs on other OSs (see SOLR-9082).

      1. SOLR-9068.Lucene-Solr-6.x-Solaris_110.log
        2.10 MB
        Hoss Man
      2. SOLR-9068.Lucene-Solr-master-Solaris_558.log
        4.74 MB
        Hoss Man
      3. SOLR-9068.patch
        2 kB
        Hoss Man
      4. SOLR-9068.patch
        2 kB
        Hoss Man

          Activity

          hossman Hoss Man added a comment -

          Attaching Jenkins failure logs...

          http://jenkins.thetaphi.de/job/Lucene-Solr-master-Solaris/558/consoleText
          http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/110/consoleText

          Interesting bits...

             [junit4]   2> 1664862 ERROR (OverseerThreadFactory-5652-thread-2-processing-n:127.0.0.1:55264_solr) [n:127.0.0.1:55264_solr    ] o.a.s.c.OverseerCollectionMessageHandler Error from shard: https://127.0.0.1:55219/solr
             [junit4]   2> org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://127.0.0.1:55219/solr
             [junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:620)
             [junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:259)
             [junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
             [junit4]   2> 	at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
             [junit4]   2> 	at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:195)
             [junit4]   2> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             [junit4]   2> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             [junit4]   2> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             [junit4]   2> 	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
             [junit4]   2> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
             [junit4]   2> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
             [junit4]   2> 	at java.lang.Thread.run(Thread.java:745)
             [junit4]   2> Caused by: javax.net.ssl.SSLHandshakeException: Invalid TLS padding data
             [junit4]   2> 	at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
             [junit4]   2> 	at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
             [junit4]   2> 	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1020)
             [junit4]   2> 	at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
             [junit4]   2> 	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
             [junit4]   2> 	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
             [junit4]   2> 	at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:394)
             [junit4]   2> 	at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:353)
             [junit4]   2> 	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:134)
             [junit4]   2> 	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
             [junit4]   2> 	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
             [junit4]   2> 	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
             [junit4]   2> 	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
             [junit4]   2> 	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
             [junit4]   2> 	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
             [junit4]   2> 	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
             [junit4]   2> 	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
             [junit4]   2> 	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
             [junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:511)
             [junit4]   2> 	... 11 more
             [junit4]   2> Caused by: javax.crypto.BadPaddingException: Invalid TLS padding data
             [junit4]   2> 	at sun.security.ssl.CipherBox.removePadding(CipherBox.java:751)
             [junit4]   2> 	at sun.security.ssl.CipherBox.decrypt(CipherBox.java:491)
             [junit4]   2> 	at sun.security.ssl.InputRecord.decrypt(InputRecord.java:172)
             [junit4]   2> 	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1015)
             [junit4]   2> 	... 27 more
          

          And...

             [junit4]   2> 1655925 ERROR (OverseerThreadFactory-3737-thread-2-processing-n:127.0.0.1:34220_solr) [n:127.0.0.1:34220_solr    ] o.a.s.c.OverseerCollectionMessageHandler Error from shard: https://127.0.0.1:43535/solr
             [junit4]   2> org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://127.0.0.1:43535/solr
             [junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:604)
             [junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:259)
             [junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
             [junit4]   2> 	at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
             [junit4]   2> 	at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:195)
             [junit4]   2> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             [junit4]   2> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             [junit4]   2> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             [junit4]   2> 	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
             [junit4]   2> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
             [junit4]   2> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
             [junit4]   2> 	at java.lang.Thread.run(Thread.java:745)
             [junit4]   2> Caused by: javax.net.ssl.SSLHandshakeException: Invalid Padding length: 162
             [junit4]   2> 	at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
             [junit4]   2> 	at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
             [junit4]   2> 	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1020)
             [junit4]   2> 	at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
             [junit4]   2> 	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
             [junit4]   2> 	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
             [junit4]   2> 	at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:543)
             [junit4]   2> 	at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:409)
             [junit4]   2> 	at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:177)
             [junit4]   2> 	at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304)
             [junit4]   2> 	at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:611)
             [junit4]   2> 	at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:446)
             [junit4]   2> 	at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
             [junit4]   2> 	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
             [junit4]   2> 	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
             [junit4]   2> 	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
             [junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:495)
             [junit4]   2> 	... 11 more
             [junit4]   2> Caused by: javax.crypto.BadPaddingException: Invalid Padding length: 162
             [junit4]   2> 	at sun.security.ssl.CipherBox.removePadding(CipherBox.java:743)
             [junit4]   2> 	at sun.security.ssl.CipherBox.decrypt(CipherBox.java:491)
             [junit4]   2> 	at sun.security.ssl.InputRecord.decrypt(InputRecord.java:172)
             [junit4]   2> 	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1015)
             [junit4]   2> 	... 25 more
          
          hossman Hoss Man added a comment -

          On the theory that the Solaris SSL impl has bugs when the SecureRandom returns nothing but NUL bytes, I drafted this patch to always fill any byte[] with Byte.MAX_VALUE.

          Uwe Schindler: I'd love to know if you can (semi-)reliably reproduce the failures your Jenkins machines were getting on your Solaris box, and if this patch fixes those bugs.

          (I'm not seeing any measurable slowdown when using this patch, but it obviously involves some extra cycles, so it's not really worth committing unless it solves the problem.)
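
          For illustration only (this is not the attached patch): a minimal sketch of a test-only SecureRandom that fills every requested array with Byte.MAX_VALUE. The class names FixedByteSecureRandom and FixedByteSpi are hypothetical; the sketch assumes nothing beyond the JDK's java.security.SecureRandomSpi extension point, to which SecureRandom.nextBytes delegates.

```java
import java.security.SecureRandom;
import java.security.SecureRandomSpi;
import java.util.Arrays;

/** Hypothetical test-only SecureRandom: every requested byte is Byte.MAX_VALUE. NOT secure. */
class FixedByteSecureRandom extends SecureRandom {

  /** SPI that does the actual work; SecureRandom.nextBytes delegates to it. */
  private static final class FixedByteSpi extends SecureRandomSpi {
    @Override
    protected void engineSetSeed(byte[] seed) {
      // Seeds are ignored: the output never depends on them.
    }

    @Override
    protected void engineNextBytes(byte[] bytes) {
      // Constant, non-zero fill instead of leaving the array as NUL bytes.
      Arrays.fill(bytes, Byte.MAX_VALUE);
    }

    @Override
    protected byte[] engineGenerateSeed(int numBytes) {
      byte[] seed = new byte[numBytes];
      Arrays.fill(seed, Byte.MAX_VALUE);
      return seed;
    }
  }

  FixedByteSecureRandom() {
    // No provider is needed for a throwaway test instance.
    super(new FixedByteSpi(), null);
  }

  public static void main(String[] args) {
    byte[] buf = new byte[8];
    new FixedByteSecureRandom().nextBytes(buf);
    System.out.println(Arrays.toString(buf));
  }
}
```

          Like the all-NUL variant, this never touches the system entropy pool, so it cannot block; the only difference is the constant value written into the array.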

          thetaphi Uwe Schindler added a comment -

          I can try this out. I will apply the patch and "ant beast" one of the tests, and I will report here. If this proves to be the right thing, we should maybe open a bug report at Oracle.

          If this works I see no problem with the patch, because it is used during tests only. Right?

          thetaphi Uwe Schindler added a comment -

          Hi Hoss,

          Unfortunately, your patch did not change anything. I ran the test with and without it; in both cases it failed the first time, whether beasting or running standalone.
          Maybe we should open a bug report at Oracle and for now disable the tests with assumeFalse(Constants.SUN_OS).

          If you like, I can give you an account on the Solaris machine to try it yourself (keep in mind that it has neither Git nor Ant installed; it is totally blank, and everything is provided by Jenkins).

          hossman Hoss Man added a comment -

          If this works I see no problem with the patch, because it is used during tests only. Right?

          Correct, this is only a question of what SecureRandom source we use during tests (the idea being to prevent low-entropy Jenkins machines from blocking when randomizing SSL testing).

          ... and for now disable the tests with assumeFalse(Constants.SUN_OS).

          While this one test in particular seems to always trigger some padding-related problem in the SSLEngine, the underlying problem could affect any SSL test. (Note that even with this test, the Jenkins failures show different padding-related exceptions on master and 6x, presumably because some small amount of information in the Solr request/response payload differs slightly between branches.) So if we do ultimately need special-case logic for Constants.SUN_OS, it shouldn't be specific to this test class/method; it should be part of SSLTestConfig so we don't get confusing failures from any other test that might randomize SSL.

          I've uploaded a new quick & dirty patch that uses a java.util.Random inside our NullSecureRandom.
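
          A minimal sketch of that idea, for illustration only (it is not the uploaded patch): a SecureRandom whose bytes come from a fixed-seed java.util.Random. The names SeededNotSecureRandom and SeededSpi are hypothetical, and the seed 42 is the one mentioned later in this thread.

```java
import java.security.SecureRandom;
import java.security.SecureRandomSpi;
import java.util.Arrays;
import java.util.Random;

/** Hypothetical test-only SecureRandom backed by a fixed-seed java.util.Random. NOT secure. */
class SeededNotSecureRandom extends SecureRandom {

  private static final class SeededSpi extends SecureRandomSpi {
    // Fixed seed: cheap, repeatable, and never waits on system entropy.
    private final Random random = new Random(42);

    @Override
    protected void engineSetSeed(byte[] seed) {
      // Ignored so callers cannot change the deterministic sequence.
    }

    @Override
    protected void engineNextBytes(byte[] bytes) {
      random.nextBytes(bytes);
    }

    @Override
    protected byte[] engineGenerateSeed(int numBytes) {
      byte[] seed = new byte[numBytes];
      random.nextBytes(seed);
      return seed;
    }
  }

  SeededNotSecureRandom() {
    // No provider is needed for a throwaway test instance.
    super(new SeededSpi(), null);
  }

  public static void main(String[] args) {
    byte[] first = new byte[16];
    byte[] second = new byte[16];
    new SeededNotSecureRandom().nextBytes(first);
    new SeededNotSecureRandom().nextBytes(second);
    // Two fresh instances share the seed, so they produce the same bytes.
    System.out.println(Arrays.equals(first, second));
  }
}
```

          The output varies from byte to byte, unlike a constant fill, but costs only a few arithmetic operations per call and is obviously unsuitable outside tests.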

          Uwe Schindler: can you please try this new patch out?

          • If this patch solves the problem, I can come up with a better final fix that includes two different "mock" SecureRandom instances and picks which one we use in SSLTestConfig depending on Constants.SUN_OS.
          • If this patch doesn't solve the problem, then there is something more fundamentally odd going on on Solaris (maybe our custom SecureRandomSpi is tickling some assumption in the JVM?) and I'll give up and just change SSLTestConfig to simply use the platform default SecureRandom on that OS.
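
          The two outcomes above can be sketched as a small selection routine. Everything here is hypothetical (the class TestSecureRandomChooser, the Strategy enum, and the method names); it also assumes that Solaris JVMs report an os.name starting with "SunOS", which is how Constants.SUN_OS is understood to be derived. It is not the logic that was eventually committed.

```java
/** Hypothetical sketch of choosing a test SecureRandom strategy per operating system. */
class TestSecureRandomChooser {

  /** The three candidate sources discussed in this thread. */
  enum Strategy { NULL_BYTES, PSEUDO_RANDOM, PLATFORM_DEFAULT }

  /** Assumption: Solaris JVMs report an os.name that starts with "SunOS". */
  static boolean isSunOs(String osName) {
    return osName != null && osName.startsWith("SunOS");
  }

  /** First bullet: the pseudo-random mock on Solaris, the no-op mock elsewhere. */
  static Strategy ifPatchHelps(String osName) {
    return isSunOs(osName) ? Strategy.PSEUDO_RANDOM : Strategy.NULL_BYTES;
  }

  /** Second bullet: fall back to the platform default on Solaris only. */
  static Strategy ifPatchDoesNotHelp(String osName) {
    return isSunOs(osName) ? Strategy.PLATFORM_DEFAULT : Strategy.NULL_BYTES;
  }

  public static void main(String[] args) {
    String osName = System.getProperty("os.name");
    System.out.println(osName + " -> " + ifPatchHelps(osName));
  }
}
```

          Keeping the decision in one place, rather than in individual tests, matches the point made above about putting any special casing in SSLTestConfig.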

          If you like, I can give you an account on the Solaris machine to try it yourself (keep in mind that it has neither Git nor Ant installed; it is totally blank, and everything is provided by Jenkins).

          No thank you, that sounds terrible. This is/should-be the last patch I'll ask you to manually try on Solaris.

          Maybe we should open a bug report at Oracle ...

          Probably, but from what I've seen you have to deal with in the past, I don't have the time or patience to try and deal with their process. If you want to file one, by all means go ahead, but you might want to wait until we figure out if using java.util.Random under the covers works as a workaround, or if there is just some fundamental bug when using custom SecureRandom instances.

          thetaphi Uwe Schindler added a comment - - edited

          Hi,
          Sorry for the delay: the patch with the answer to the Ultimate Question of Life, the Universe, and Everything looks good. It is currently running with ant beast -Dbeast.iters=100 -Dtestcase=TestMiniSolrCloudClusterSSL (this single test), with no failures so far (already 8 rounds through).

          So I think this works. Maybe the padding code in the JDK has a bug where it expects the bytes to look random and breaks when all bytes are equal. Why not use this patch for non-Solaris as well?

          thetaphi Uwe Schindler added a comment -

          OK after 20 rounds I would say: new Random(42) WORKS

          hossman Hoss Man added a comment -

          Why not use this patch also for non-Solaris?

          Well, as miller put it in the parent issue once: mainly because the goal here is to keep the SSL code as fast as possible, since we don't actually care about the "correctness" of the SSL. We just care that Solr is using SSL and doesn't have any hardcoded http assumptions that break when SSL is enabled. So if we can avoid wasting CPU cycles on (pseudo)randomness by having a bunch of no-op methods, then we might as well.

          jira-bot ASF subversion and git services added a comment -

          Commit 7e2f9f506dd3a94c9df0514bf0e22624a8cb0f92 in lucene-solr's branch refs/heads/branch_6x from Chris Hostetter (Unused)
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7e2f9f5 ]

          SOLR-9068 / SOLR-5776: Alternate (psuedo random) NullSecureRandom for Constants.SUN_OS

          (cherry picked from commit a5586d29b23f7d032e6d8f0cf8758e56b09e0208)

          Conflicts:
          solr/test-framework/src/java/org/apache/solr/util/SSLTestConfig.java

          jira-bot ASF subversion and git services added a comment -

          Commit a5586d29b23f7d032e6d8f0cf8758e56b09e0208 in lucene-solr's branch refs/heads/master from Chris Hostetter (Unused)
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a5586d2 ]

          SOLR-9068 / SOLR-5776: Alternate (psuedo random) NullSecureRandom for Constants.SUN_OS

          hossman Hoss Man added a comment -

          Revised summary & description based on new evidence of this popping up on other operating systems (see SOLR-9082) ... although much less often than on Solaris.

          I plan to roll back the conditional logic I added in my last commit and just completely replace "NullSecureRandom" with the code Uwe already beasted for me, renaming it "NotSecurePsuedoRandom" (since NullSecureRandom as a name really won't apply anymore).

          jira-bot ASF subversion and git services added a comment -

          Commit 7144984e164e10a6ba2a7c89ffa748af1310cc50 in lucene-solr's branch refs/heads/branch_6x from Chris Hostetter (Unused)
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7144984 ]

          SOLR-9068 / SOLR-5776: replace NullSecureRandom w/ NotSecurePsuedoRandom

          (cherry picked from commit ac0e73a521a66fc37638e884ab386b0173f79b0f)

          jira-bot ASF subversion and git services added a comment -

          Commit ac0e73a521a66fc37638e884ab386b0173f79b0f in lucene-solr's branch refs/heads/master from Chris Hostetter (Unused)
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ac0e73a ]

          SOLR-9068 / SOLR-5776: replace NullSecureRandom w/ NotSecurePsuedoRandom

          thetaphi Uwe Schindler added a comment -

          Thanks Hoss for fixing!
          The remaining question is still: why did it sometimes not like the bytes we return? Theoretically, our random could also return all 0-bytes - would it fail then?
          As a quick fix this looks good, but I would like to understand it!

          hossman Hoss Man added a comment -

          Haven't seen this fail since the latest fix, so I'm calling this resolved.

          "The remaining question is still: why did it sometimes not like the bytes we return? Theoretically, our random could also return all 0-bytes - would it fail then? As a quick fix this looks good, but I would like to understand it!"

          I would like to understand it too, but I don't have the time/patience to figure out how to write a non-Solr/non-Jetty test case to submit to Oracle. If you do have the time, feel free to do so and give me all the credit.

          jira-bot ASF subversion and git services added a comment -

          Commit d866ae79db42c28c99aa7efd58848418b9d2e6a6 in lucene-solr's branch refs/heads/branch_6_0 from Chris Hostetter
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d866ae7 ]

          SOLR-9068 / SOLR-5776: Alternate (psuedo random) NullSecureRandom for Constants.SUN_OS

          (cherry picked from commit a5586d29b23f7d032e6d8f0cf8758e56b09e0208)

          Conflicts:
          solr/test-framework/src/java/org/apache/solr/util/SSLTestConfig.java

          jira-bot ASF subversion and git services added a comment -

          Commit fb9b7dcfbdb1ecf57cb0dfc3d2d722a96b471874 in lucene-solr's branch refs/heads/branch_6_0 from Chris Hostetter
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=fb9b7dc ]

          SOLR-9068 / SOLR-5776: replace NullSecureRandom w/ NotSecurePsuedoRandom

          (cherry picked from commit ac0e73a521a66fc37638e884ab386b0173f79b0f)


            People

            • Assignee:
              Unassigned
              Reporter:
              hossman Hoss Man