Solr / SOLR-5776

Look at speeding up using SSL with tests.

    Details

    • Type: Test
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.9, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      We have to disable SSL on a bunch of tests now because it appears to sometimes be ridiculously slow - especially in slow envs (I never see timeouts on my machine).

      I was talking to Robert about this, and he mentioned that there might be some settings we could change to speed it up.

      1. SOLR-5776.patch
        9 kB
        Hoss Man
      2. SOLR-5776.patch
        4 kB
        Steve Davids
      3. SOLR-5776.patch
        1 kB
        Mark Miller

          Activity

          Uwe Schindler added a comment -

          Just spawn not so many threads and jetties? To test distributed stuff with SSL, you need 2 or 3 jetties, but not 50!

          Robert Muir added a comment -

          My first thought is: ensure JettySolrRunner connectors/threadpools are set up correctly for SSL.

          Second thought is: examine SSL settings. We don't need to be using military grade stuff here like 4096bit RSA or whatever, for tests we should use "low grade" crypto.

          Mark Miller added a comment - - edited

          Just spawn not so many threads and jetties? To test distributed stuff with SSL, you need 2 or 3 jetties

          The tests that fail spawn like 1-4 jetties, so that won't help.

          ASF subversion and git services added a comment -

          Commit 1572275 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1572275 ]

          SOLR-5776: Suppress SSL

          ASF subversion and git services added a comment -

          Commit 1572295 from Mark Miller in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1572295 ]

          SOLR-5776: Suppress SSL

          ASF subversion and git services added a comment -

          Commit 1572391 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1572391 ]

          SOLR-5776: Suppress SSL

          ASF subversion and git services added a comment -

          Commit 1572393 from Mark Miller in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1572393 ]

          SOLR-5776: Suppress SSL

          ASF subversion and git services added a comment -

          Commit 1572408 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1572408 ]

          SOLR-5776: Suppress SSL

          ASF subversion and git services added a comment -

          Commit 1572410 from Mark Miller in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1572410 ]

          SOLR-5776: Suppress SSL

          ASF subversion and git services added a comment -

          Commit 1572974 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1572974 ]

          SOLR-5776: Suppress SSL

          ASF subversion and git services added a comment -

          Commit 1572976 from Mark Miller in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1572976 ]

          SOLR-5776: Suppress SSL

          Mark Miller added a comment -

          On a tip from Robert, I started looking at SecureRandom as the source of this problem.

          It seems that at least on Linux, the default SecureRandom algorithm will get data from /dev/random, which can block once it exhausts entropy.

          Some testing with a custom java.security.egd file seems to bear this out as the problem.

          I'm still trying to work out the best solution.
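
          To see which settings a given test JVM is actually running with, something like the following could be used. This is only a sketch: `EntropySettings` is a hypothetical helper, not a class in Solr or the patches on this issue. It reads the `securerandom.source` security property and the `java.security.egd` system property mentioned above, and reports which algorithm `new SecureRandom()` resolves to.

```java
import java.security.SecureRandom;
import java.security.Security;

// Hypothetical helper (not part of Solr): reports the entropy-related
// settings discussed in this thread for the current JVM.
final class EntropySettings {

  /** Value of securerandom.source from the java.security file, or null if unset. */
  static String configuredSource() {
    return Security.getProperty("securerandom.source");
  }

  /** Value of the java.security.egd system property, or null if it was not set. */
  static String egdOverride() {
    return System.getProperty("java.security.egd");
  }

  public static void main(String[] args) {
    System.out.println("securerandom.source = " + configuredSource());
    System.out.println("java.security.egd   = " + egdOverride());
    // Reports which algorithm the no-arg constructor picks on this JVM.
    System.out.println("default algorithm   = " + new SecureRandom().getAlgorithm());
  }
}
```

          Running it once with and once without `-Djava.security.egd=...` shows whether the override is visible to the JVM; it does not by itself prove which seed generator ends up being used.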

          Mark Miller added a comment -

          Attached patch appears to be a working workaround.

          Mark Miller added a comment - - edited

          Bah, it seems to be much less frequent, but it can still happen. I think the issue is that if you don't specify the seed, it will still read from /dev/random for that.

          I had looked into a custom SecureRandom via SPI, but it's my first foray into SPI, and while it seems relatively straightforward, I have not yet figured out how to plug a custom SecureRandomSpi class into tests. Even when that's done, the impl is not so straightforward. From what I can tell, you cannot extend the standard SecureRandom to fix this, the OpenJDK code is Oracle GPL, and the Harmony code is fairly different and would require some hacking to get in. Otherwise we would need to come up with a clean room impl that was still about as decent as Random.

          Hoss Man added a comment -

          Random uninformed comment from someone whose only knowledge of this is googling things i saw in your patch: did you also try setting "securerandom.source" to "file:/dev/./urandom" in the security manager settings?

          Mark Miller added a comment -

          Supposedly, java.security.egd is the system property that overrides securerandom.source.

          Hoss Man added a comment -

          Supposedly, java.security.egd is the system property that overrides securerandom.source.

          Hmmm, yeah ....

          I just spent a bit of time reading this: http://moi.vonos.net/java/securerandom/ before i realized that apparently i died and am now in hell. If, for the sake of argument, we assume i'm wrong – that i really am alive, and this is not hell – then from what i can tell:

          • The only way to guarantee that arbitrary code using a SecureRandom (maybe just calling "new" maybe calling "generateSeed" - who knows it's arbitrary and out of our control) won't block indefinitely is to ensure that we are using the SHA1PRNG algorithm with ThreadedSeedGenerator
          • the way to ensure that you get the SHA1PRNG algorithm with ThreadedSeedGenerator is to ensure that the effective value of "securerandom.source" is unset.
          • "securerandom.source" is explicitly configured in the "java.security" file that ships with the JRE/JDK
          • the "java.security.egd" system property can be used at runtime to override the "securerandom.source" value configured in "java.security"
          • it is apparently not possible to unset the value of "securerandom.source" just by using a system property - not even using -Djava.security.egd="" (which seems to just be ignored and you still get "NativePRNG")

          One thing i didn't see discussed in that article is what happens if you use: -Djava.security.egd="/bogus/file/that/does/not/exist" – that clearly triggers the use of SHA1PRNG, but it's not clear what happens when the URLSeedGenerator can't open the file.

          My gut tells me that it may then be defaulting to the (guaranteed to never block) ThreadedSeedGenerator because it definitely behaves differently on my machine using the code below – in particular, it's a bit slow to seed, and I theorize it's because it's running the ThreadedSeedGenerator (but i can't figure out how to ask the SecureRandom and/or Provider which SeedGenerator is in use)...

          import java.security.SecureRandom;
          public final class Random {
            public static final void main(String[] args) {
              for (int i= 0; i< 10; i++) {
                SecureRandom r = new SecureRandom();
                System.out.println( r.getAlgorithm() + "::" + r.getProvider().toString() + "::" + r.nextInt());
              }
            }
          }
          
          hossman@frisbee:~/tmp$ javac Random.java && java -Djava.security.egd="" Random 
          NativePRNG::SUN version 1.7::-197797452
          ...
          hossman@frisbee:~/tmp$ javac Random.java && java -Djava.security.egd="/does/not/exist" Random 
          SHA1PRNG::SUN version 1.7::-926688095
          ...
          
          Mark Miller added a comment -

          it is apparently not possible to unset the value of "securerandom.source" just by using a system property

          Yeah, I spent a fair amount of time trying to do that - fun times.

          but it's not clear what happens when the URLSeedGenerator can't open the file.

          I had a fair amount of runs with this just by accident - I'm pretty sure I still saw issues, but I suppose I can give it another whirl. I tried a lot of things based on various google reading.

          But yeah, I've been running down a similar set of items and some others - it was a sad and unproductive bunch of hours with a fair amount of SOL (swear out loud?) moments.

          Steve Davids added a comment -

          Attached a patch that tells Jetty to use the SHA1PRNG secure random algorithm for tests.

          Mark Miller added a comment -

          Nice - did not know we could actually set the secure random jetty will use. I'm not sure if setting it to SHA1PRNG is enough - pretty sure I tried that a fair amount by system property - does that also give us the ThreadedSeedGenerator?

          Steve Davids added a comment -

          Jetty allows you to set the algorithm name, but if further customization is necessary you can set the SSLContext (Jetty will then ignore all of the configured truststores/keystores/etc., since those are configured in the SSLContext). The SSLContext builder (see SSLTestConfig) allows you to set your own SecureRandom object - this gives you the hooks to provide your own SecureRandomSpi implementation, which is used to generate the random values (we can probably just create a simple no-op implementation).
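
          For reference, the JDK side of "set your own SecureRandom on the SSLContext" might look roughly like the sketch below. It uses only `javax.net.ssl.SSLContext` and `java.security.SecureRandom`; the class and method names (`TestSslContexts`, `seededRandom`, `withRandom`) are made up for illustration, and how the resulting context is handed to Jetty or HttpClient is left out because that wiring lives in SSLTestConfig and JettySolrRunner.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.net.ssl.SSLContext;

// Hypothetical helper for illustration only; not the code from the attached patches.
final class TestSslContexts {

  /** SHA1PRNG seeded up front with a fixed value. Test-only: this is NOT secure. */
  static SecureRandom seededRandom(long seed) {
    try {
      SecureRandom random = SecureRandom.getInstance("SHA1PRNG");
      random.setSeed(seed);
      return random;
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Builds a TLS context that uses the caller's SecureRandom. */
  static SSLContext withRandom(SecureRandom random) {
    try {
      SSLContext context = SSLContext.getInstance("TLS");
      // null key managers and trust managers select the JSSE defaults;
      // only the source of randomness is replaced here.
      context.init(null, null, random);
      return context;
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(withRandom(seededRandom(42L)).getProtocol());
  }
}
```

          A real test setup would pass its own key and trust managers instead of `null`; the point here is only the third argument to `init`.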

          Mark Miller added a comment -

          Yeah, this seems like a very promising development. I'll try and run some tests later today or tomorrow.

          ASF subversion and git services added a comment -

          Commit 1588388 from markrmiller@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1588388 ]

          SOLR-5776: Enabled SSL tests can easily exhaust random generator entropy and block.

          Steve Davids added a comment -

          Looks like there are a few issues with the previous checkin:

          1) SolrTestCaseJ4 has hard coded "trySsl = true" vice the random boolean value
          2) JettySolrRunner is trying to configure the SecureRandomAlgorithm but that value isn't set in the SSLTestConfig thus that algorithm isn't actually being set in Jetty. You may want to make the buildSSLContext method public and set the Jetty SSLContext with the return value which will then use the NullSecureRandom.

          Mark Miller added a comment -

          Thanks for reviewing!

          1) Yeah, I'll flip that back when I commit to 4x - I want to see how all the various jenkins jobs handle full SSL for a few runs first.

          2) Yeah, I kept the support for setting the algorithm in JettySolrRunner - it seems like a nice addition. But I don't set it in SSLTestConfig because we don't need to set it for tests?

          Hoss Man added a comment - - edited

          Mark Miller...

          1) I don't think you need to override nextBytes with a No-Op method .. as long as you override generateSeed we should be fine
          2) there's a potential AIOOB error with your implementation of generateSeed – the contract says that the byte[] returned will have whatever length was specified in the argument, so this could break spectacularly in a future version of jetty/java
          3) I'm not sure that you need NullSecureRandom at all – if you go back to Steve's patch, and just ensure that you call setSeed(new byte[] {'f','u'}) on the result of SecureRandom.getInstance("SHA1PRNG") before letting jetty have it, then generateSeed should never, ever be called.
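
          Points 1 and 2 can be illustrated with a sketch of a non-blocking SPI that keeps producing varying bytes and honors the generateSeed length contract. This is not the NullSecureRandom from the commit; `NonBlockingRandomSpi` and `NonBlockingSecureRandom` are hypothetical names, and backing them with `java.util.Random` is an assumption made for the example. It is deliberately insecure and only meant for tests.

```java
import java.security.SecureRandom;
import java.security.SecureRandomSpi;
import java.util.Random;

/** Hypothetical test-only SPI: never reads /dev/random. NOT cryptographically secure. */
final class NonBlockingRandomSpi extends SecureRandomSpi {
  private final Random delegate;

  NonBlockingRandomSpi(long seed) {
    this.delegate = new Random(seed);
  }

  @Override
  protected void engineSetSeed(byte[] seed) {
    // Ignored: the sequence is fixed by the constructor seed.
  }

  @Override
  protected void engineNextBytes(byte[] bytes) {
    delegate.nextBytes(bytes);
  }

  @Override
  protected byte[] engineGenerateSeed(int numBytes) {
    // Contract: the returned array has exactly numBytes elements.
    byte[] seed = new byte[numBytes];
    delegate.nextBytes(seed);
    return seed;
  }
}

/** Exposes the SPI through the protected SecureRandom(SecureRandomSpi, Provider) constructor. */
final class NonBlockingSecureRandom extends SecureRandom {
  NonBlockingSecureRandom(long seed) {
    super(new NonBlockingRandomSpi(seed), null);
  }
}
```

          Because the array is allocated from the requested length, a caller asking for a larger seed in some future jetty/java version gets what it asked for instead of an out-of-bounds error.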

          EDIT: mid-air-collision, deleted things miller addressed in his previous comment

          Mark Miller added a comment -

          1, 3) Since we need no security with SSL, doesn't it make sense to have all this just be a no op and have the best performance? Is there any benefit to running more code?

          2) Thanks, I'll address.

          Hoss Man added a comment -

          Since we need no security with SSL, doesn't it make sense to have all this just be a no op and have the best performance?

          I don't think so?

          We still want some pseudo-randomness in the SSL, otherwise we might encounter some bug where it doesn't work, but we don't know because the SSL layer is always going down a certain code path that bypasses the bug because its (Null)SecureRandom is always returning 0/NUL/false from nextInt/nextBytes/nextBoolean

          We want to bypass the expensive entropy step, but we shouldn't bypass the inherent pseudo-randomness of SSL ... otherwise why not bypass all of the lucene randomization framework?

          (if anything, SSLTestConfig should probably seed SecureRandom.getInstance("SHA1PRNG") with bytes it gets from LuceneTestFramework.random())
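
          The idea in the parenthetical could be sketched as below. `SeededSha1Prng.fromTestRandom` is a hypothetical helper, and a plain `java.util.Random` stands in for the test framework's random source. The sketch relies on the SUN provider's SHA1PRNG behavior that an instance seeded via `setSeed` before its first `nextBytes` call uses only that seed, which is an assumption worth verifying on the JVMs the tests run on.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Random;

// Hypothetical helper; the real hook would presumably live in SSLTestConfig.
final class SeededSha1Prng {

  /** Returns a SHA1PRNG seeded from the given (reproducible) test random. */
  static SecureRandom fromTestRandom(Random testRandom) {
    byte[] seed = new byte[16];
    testRandom.nextBytes(seed);
    try {
      SecureRandom random = SecureRandom.getInstance("SHA1PRNG");
      // Seed before first use so the PRNG has no reason to seed itself
      // from the system entropy source.
      random.setSeed(seed);
      return random;
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Same test seed in, same pseudo-random stream out.
    System.out.println(fromTestRandom(new Random(5L)).nextLong()
        == fromTestRandom(new Random(5L)).nextLong());
  }
}
```

          This keeps the SSL layer pseudo-random while tying it to the test seed, so a failure would be reproducible in the same way as the rest of the randomized tests.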

          Steve Davids added a comment -

          It is important to note that the SSLTestConfig buildSSLContext is currently only being used in the context of building HttpClient on the client side, not on the Jetty side. So you will still be susceptible to the NativePRNG SecureRandom instance on the server side (and thus the long pauses).

          Mark Miller added a comment -

          We will see how the jenkins cluster takes it - in my testing, I went from common failures when ssl was fully true to no failures after many, many, runs (I've been running it locally all day).

          Mark Miller added a comment -

          We still want some pseudo-randomness in the SSL, otherwise we might encounter some bug where it doesn't work

          But we are not really interested in testing SSL at that level are we? All I'm really concerned about is that we are using https and http in the right places and rough communication can take place - how jetty SSL behaves with a securerandom vs no random seems beyond what we should care about testing for proper SSL support?

          Mark Miller added a comment -

          So you will still be susceptible to the NativePRNG SecureRandom

          Just to be clear, I ran into the same issues when specifying "SHA1PRNG" on the server side - though perhaps because I was not taking care of the issue on the client side - which seems to be where the real trouble point is?

          I'll bring back setting "SHA1PRNG" on the server side, since at a minimum, it should be better than NativePRNG.

          Steve Davids added a comment -

          ...perhaps because I was not taking care of the issue on the client side - which seems to be where the real trouble point is?

          That seems strange, I thought it should be a problem on both sides, though I'm definitely not an expert on this subject, nor have I come across the problem of trying to generate a SecureRandom value out in the wild.

          Mark Miller added a comment -

          And the first jenkins fail pops right away. Of course. I'll try adding "SHA1PRNG" on server side.

          Mark Miller added a comment -

          Tagged that wrong:

          Commit 1588402 from markrmiller@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1588402 ]
          SOLR-5980: Set the server side to SHA1PRNG as in Steve's original patch.

          That should leave us still getting some urandom, but perhaps it minimizes things to something reasonable. I'm not super optimistic. Beyond that, I'm not sure what we can do other than use spi.

          Mark Miller added a comment -

          Bah, just saw a local fail after all that, even with the SHA1PRNG change.

          I'll revert for now - I can't spend more time on this near term.

          ASF subversion and git services added a comment -

          Commit 1588406 from markrmiller@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1588406 ]

          SOLR-5776: sigh Revert this for now.

          Shalin Shekhar Mangar added a comment -

          According to the following StackOverflow discussion, setting System.setProperty("java.security.egd", "file:/dev/./urandom"); (notice the extra dot) seems to fool the JVM and use the SHA generator. My local testing seems to confirm this as well. We should add this in the test base class.

          http://stackoverflow.com/questions/137212/how-to-solve-performance-problem-with-java-securerandom
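
          A sketch of what "add this in the test base class" could look like follows. `NonBlockingEntropyTestBase` is a hypothetical class, not Solr's actual base class. One caveat, stated as an assumption: the JDK likely reads this property only once, when its seed source is first initialized, so setting it from code may be too late if anything has already touched SecureRandom. Passing `-Djava.security.egd=file:/dev/./urandom` to the test JVM avoids that ordering question.

```java
// Hypothetical base class for illustration; not Solr's actual test base class.
abstract class NonBlockingEntropyTestBase {
  static final String EGD_PROPERTY = "java.security.egd";
  // Note the extra "/./" in the path, as described in the comment above.
  static final String URANDOM_WITH_DOT = "file:/dev/./urandom";

  static {
    // Respect an explicit -Djava.security.egd=... from the command line.
    if (System.getProperty(EGD_PROPERTY) == null) {
      System.setProperty(EGD_PROPERTY, URANDOM_WITH_DOT);
    }
  }

  /** The value the JVM will see for java.security.egd after class initialization. */
  static String effectiveEgd() {
    return System.getProperty(EGD_PROPERTY);
  }
}
```

          Given the later reports in this thread that the setting only appeared to work locally, this should be treated as something to verify on the slow Jenkins machines rather than as a confirmed fix.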

          Mark Miller added a comment -

          I tried that too - like many things I tried locally, felt like it worked locally, but just ended up being some random luck and it still was a problem.

          Feel free to try again, like the fsync issue, though. You might have some better luck.

          Mark Miller added a comment -

          Not sure how you're testing, by the way, but the best way is to turn on all the disabled ssl tests and then turn off random ssl and just have all of them use ssl - I have enough local random entropy to have that work as is sometimes, but it will fail more often at least. Our Jenkins machines seem to have much less entropy and fail much, much more easily.
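To compare how much entropy different machines report, a small probe along these lines could be run on each box. This is only a sketch under stated assumptions: it reads the Linux kernel's /proc/sys/kernel/random/entropy_avail file, so it returns nothing on FreeBSD, OS X, or Windows, and the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalInt;

// Hypothetical diagnostic helper; not part of the Solr test framework.
final class EntropyProbeSketch {
  // Linux-only kernel interface reporting the estimated entropy pool size in bits.
  static final Path ENTROPY_AVAIL = Paths.get("/proc/sys/kernel/random/entropy_avail");

  /** Returns the kernel's entropy estimate, or empty if it cannot be read on this OS. */
  static OptionalInt availableEntropyBits() {
    if (!Files.isReadable(ENTROPY_AVAIL)) {
      return OptionalInt.empty();
    }
    try {
      String text = new String(Files.readAllBytes(ENTROPY_AVAIL), StandardCharsets.US_ASCII).trim();
      return OptionalInt.of(Integer.parseInt(text));
    } catch (IOException | NumberFormatException e) {
      // Treat an unreadable or unexpected value as "unknown" rather than failing.
      return OptionalInt.empty();
    }
  }

  public static void main(String[] args) {
    OptionalInt bits = availableEntropyBits();
    System.out.println(bits.isPresent()
        ? "entropy_avail=" + bits.getAsInt()
        : "entropy estimate not available on this platform");
  }
}
```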

          Shalin Shekhar Mangar added a comment -

          Yeah, I have been running tests manually but I'm thinking of setting up jenkins on an old and slow box that I have lying around. Do you have a jenkins config that I can copy?

          Mark Miller added a comment -

          I run a variety of jobs - nightly, chaosmonkey, only, etc, but here is the basic config I'm using for a std trunk job. https://paste.apache.org/4mOT

          Uwe Schindler added a comment -

          Shalin Shekhar Mangar: I can assign a password to your policeman jenkins account, so you can look into the jobs. Basically it is very simple: Freestyle build with: checkout using Subversion plugin and then Ant builder plugin, advanced options, set target to execute and build properties (-D parameters)

          Shalin Shekhar Mangar added a comment -

          Thanks Mark and Uwe!

          Uwe Schindler - Access to policeman jenkins will be a huge help.

          Steve Rowe added a comment -

          Haveged is designed to solve the problem of low-entropy hosts (e.g. headless servers), though it appears to be Linux-only.

          Hoss Man added a comment -

          One of the comments steve made when opening SOLR-6254...

          I found some info about /dev/random problems on FreeBSD here: https://wiki.freebsd.org/201308DevSummit/Security/DevRandom, which led me to /etc/rc.d/initrandom, which gets around the limited entropy by cat'ing a bunch of shit to /dev/random:
          ...
          I think we should try the same strategy in a crontab every X minutes, to see if that addresses the test failures.

          miller's response to that specific suggestion...

          I think it's fine as a short term workaround, but not a great solution. We probably should just disable SSL unless we can address it in a portable way.

          Here's my straw man counter proposal:

          • update the solr tests so that:
            • SSL randomization only happens if a "tests.randomssl" sys prop is set - default is false
              • NOTE: would mean updates to the "reproduce with" line formatting
              • should be updated in test-help as well
              • could be used in lucene/replicator module as well – it already has a "tests.jettySSL" (doh! ... not included in the reproduce line!)
            • sanity check that we have at least some basic coverage of Solr w/SSL that is not randomized (ie: SSLMigrationTest and at least one new test that always uses SSL to bring up a few nodes, index a few docs, do a query, and shuts down)
            • remove most of the @SuppressSSL annotations currently in place (should only be used for tests that truly need to suppress SSL because of the nature of the test: ie explicitly verifying something about non-ssl mode)
          • update the jenkins boxes to:
            • have cron like steve suggests
            • set "tests.randomssl" to true when running builds

          The end result, if everything works properly should be:

          • no matter who runs the tests, some basic sanity checking of SSL is done
          • on our jenkins builds, we do extensive randomized testing of SSL with all the cloud (and lucene/replicator) functionality
          • users who have enough entropy on their system can run -Dtests.randomssl=true if they choose.

          Obviously though, before putting any work into the tests framework to support something like "tests.randomssl" as a first class sysprop, the first baby step to see if this plan is even viable would be the cron steve mentioned to create lots of entropy – if that doesn't work, then the whole plan is moot.
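The "tests.randomssl" gating in the proposal above could be sketched roughly as follows. Everything here is an assumption for illustration: the class, the SslChoice holder, and the suppressSsl flag (standing in for a test annotated with @SuppressSSL) are hypothetical, and this is not how the Solr test framework actually wires up its SSL configuration.

```java
import java.util.Random;

// Hypothetical sketch of the proposed sysprop gate; names are illustrative only.
final class RandomSslSketch {
  static final String RANDOM_SSL_PROP = "tests.randomssl";

  /** Simple holder for the two randomized decisions. */
  static final class SslChoice {
    final boolean useSsl;
    final boolean useClientAuth;

    SslChoice(boolean useSsl, boolean useClientAuth) {
      this.useSsl = useSsl;
      this.useClientAuth = useClientAuth;
    }
  }

  /**
   * Randomizes SSL only when -Dtests.randomssl=true is set and the test has
   * not opted out; otherwise SSL stays off (the proposed default).
   */
  static SslChoice chooseSsl(Random random, boolean suppressSsl) {
    if (suppressSsl || !Boolean.getBoolean(RANDOM_SSL_PROP)) {
      return new SslChoice(false, false);
    }
    boolean useSsl = random.nextBoolean();
    // Client auth only makes sense when SSL itself is enabled.
    boolean useClientAuth = useSsl && random.nextBoolean();
    return new SslChoice(useSsl, useClientAuth);
  }

  public static void main(String[] args) {
    SslChoice choice = chooseSsl(new Random(), false);
    System.out.println("ssl=" + choice.useSsl + " clientAuth=" + choice.useClientAuth);
  }
}
```

With this shape, a Jenkins job would pass -Dtests.randomssl=true, and the property would also need to appear in the "reproduce with" line for failures to be reproducible.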

          Mark Miller added a comment -

          +1, sounds good to me.

          Steve Rowe added a comment - - edited

          edit: prepended /sbin/ to sysctl and dmesg in the crontab, since /sbin/ isn't in the PATH under cron

          I logged into ASF FreeBSD Jenkins's lucene.zones.apache.org and ran sudo su - hudson and crontab -e to put in place a cron job to run every minute. Here's the result (from user hudson's crontab -l):

          # Stolen from /etc/rc.d/initrandom to unblock /dev/random
          * * * * * ( ps -fauxww; /sbin/sysctl -a; date; df -ib; /sbin/dmesg; ps -fauxww ; cat /bin/ls ) | dd of=/dev/random bs=8k 2>/dev/null
          

          When I time the above command, it takes about 0.2 seconds to run, so running this every minute shouldn't overwhelm the system. Maybe it doesn't need to run every minute, I don't know, we can try dialling it back if this works.

          I'll re-enable SSL for a couple tests on trunk that previously failed regularly on ASF FreeBSD Jenkins, to see if this change allows those to pass.

          Steve Rowe added a comment -

          I'll re-enable SSL for a couple tests on trunk that previously failed regularly on ASF FreeBSD Jenkins, to see if this change allows those to pass.

          Hmm, I didn't do this, because looking back at failures for TestCloudSchemaless before I added the @SuppressSSL annotation, they were all on Policeman Jenkins' MacOSX VM, not on ASF Jenkins.... Uwe Schindler, can you add a similar cron job (or whatever the equivalent is for launchd/launchctl, never used it myself)?

          Uwe Schindler added a comment -

          Steve Rowe: I have no idea what the cron-job is doing! Can somebody explain it and why this helps?

          You can log into MacOSX jenkins (if it's running): ssh jenkins@jenkins-mac.thetaphi.de (has IPv4 and IPv6 address), for the password send me a note. You can try out whatever you want. Please note that the virtual machine gets reset to "clean and empty state" on every update, so once you have found a good cron-job, I can persist it on the VM snapshot.

          Is there anything to change in Windows or the Linux Jenkins?

          Do I really need such a cronjob? I can also explicitly pass tests.randomssl=false in the jenkins config? I don't like such crazy stuff going on on the machines. I am also not sure about good randomness on virtual machines...

          Hoss Man added a comment -

          Do I really need such a cronjob? I can also explicitly pass tests.randomssl=false in the jenkins config?...

          the goal was to do some experiments with such a cron in place to see if it helped out the situation – if it did, then we could move forward with the work to add support to the test framework for a general "tests.randomssl" (that would be included in the reproduce line, etc...) so that by default SSL randomization isn't used, but on jenkins boxes that have enough entropy to really hammer it the prop would be available.

          ie: we wanted to see if the cron would even help on these low entropy machines before putting in a bunch of work on the test framework to enable this sys prop.

          ASF subversion and git services added a comment -

          Commit 1614774 from Steve Rowe in branch 'dev/trunk'
          [ https://svn.apache.org/r1614774 ]

          SOLR-5776: Re-enable SSL for this test, to see if attempts to increase the entropy pool on ASF FreeBSD Jenkins and Policeman MacOSX Jenkins are helping at all.

          ASF subversion and git services added a comment -

          Commit 1614775 from Steve Rowe in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1614775 ]

          SOLR-5776: Re-enable SSL for this test, to see if attempts to increase the entropy pool on ASF FreeBSD Jenkins and Policeman MacOSX Jenkins are helping at all. (merged trunk r1614774)

          Steve Rowe added a comment - - edited

          Steve Rowe: I have no idea what the cron-job is doing! Can somebody explain and why this helps?

          The cron job is feeding the entropy pool by writing to /dev/random, which should unblock reads from /dev/random, assuming there is some randomness in what gets fed in, and/or that side-effect I/O timings feed the pool.

          I don't know much about this stuff, but here are some things I've read recently about it:

          • https://we.riseup.net/debian/entropy (Linux random entropy, some Debian specifics, with a discussion of other OSs)
          • http://security.stackexchange.com/questions/42952/how-can-i-measure-and-increase-entropy-on-mac-os-x (Q/A about OS X entropy)
          • http://en.wikipedia.org/wiki/Entropy_(computing) (Brief coverage of various OSs' handling of entropy)
          • http://en.wikipedia.org/?title=/dev/random (random number generation on various OSs)
          • https://wiki.freebsd.org/201308DevSummit/Security/DevRandom (FreeBSD /dev/random design discussion)

          Is there anything to change in Windows or the Linux Jenkins?

          I don't know; as Hoss says above, we want to run some experiments to see if re-enabling SSL in tests that have had trouble in the past will cause trouble again. So we should know in short order if these need changes.

          You can log into MacOSX jenkins (if it's running): ssh jenkins@jenkins-mac.thetaphi.de (has IPv4 and IPv6 address), for the password send me a note. You can try out whatever you want. Please note, that the virtual machine gets reset to "clean and empty state" on every update, so once you found a good cron-job, I can persist it on the VM snapshot.

          Done - I only did two things: sudo touch /etc/crontab to enable cron (apparently on OSX cron doesn't do anything unless this file exists, and it doesn't exist until you create it); and added a crontab for the jenkins user via crontab -e - here's the result (via crontab -l):

          # Stolen from FreeBSD's /etc/rc.d/initrandom to unblock /dev/random
          * * * * * ( ps -faxww ; /usr/sbin/sysctl -a ; date ; df -ib ; ps -faxww ; cat /bin/ls ) | dd of=/dev/random bs=8k 2>/dev/null

          I re-enabled SSL on TestCloudSchemaless, and I'll monitor Jenkins to see if it starts failing.

          Steve Rowe added a comment -

          I re-enabled SSL on TestCloudSchemaless, and I'll monitor Jenkins to see if it starts failing.

          It's definitely started failing.

          AFAICT, the experiment to feed the entropy pool using a regularly-run cron job to write to /dev/random has failed: TestCloudSchemaless fails regularly on the two VMs where I set up the cron job (ASF FreeBSD and Policeman OS X).

          I'll go disable SSL for this test now.

          ASF subversion and git services added a comment -

          Commit 1620176 from Steve Rowe in branch 'dev/trunk'
          [ https://svn.apache.org/r1620176 ]

          SOLR-5776: suppress ssl for this test

          ASF subversion and git services added a comment -

          Commit 1620177 from Steve Rowe in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1620177 ]

          SOLR-5776: suppress ssl for this test (merged trunk r1620176)

          Uwe Schindler added a comment -

          AFAICT, the experiment to feed the entropy pool using a regularly-run cron job to write to /dev/random has failed: TestCloudSchemaless fails regularly on the two VMs where I set up the cron job (ASF FreeBSD and Policeman OS X).

          I think the crons are already disabled:

          • MacOSX was reverted to VM snapshot, so the crontab was disabled
          • FreeBSD: The user name was changed Hudson -> Jenkins, so I think the crontab got lost, too.
          ASF subversion and git services added a comment -

          Commit 1647147 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1647147 ]

          SOLR-5776: Use less SSL in a test run.

          ASF subversion and git services added a comment -

          Commit 1647148 from Mark Miller in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1647148 ]

          SOLR-5776: Use less SSL in a test run.

          Mark Miller added a comment -

          FYI - the above commit seems to have mostly resolved this. We use SSL in tests far less. The root problem remains though.

          Hoss Man added a comment -

          Having spent the past week+ looking into bugs in the SSL test setup, and getting a lot more familiar with the various Java SSL APIs, I started rethinking about this overall issue. Knowing what i know now, I went back and re-read every comment, and re-reviewed the commits, here's a few things i noticed...

          • NullSecureRandom was only ever used in the SSLContext registered with clients ...
            • Jetty instances were configured to use an SslContextFactory using the string "SHA1PRNG"
              • SHA1PRNG is supposed to be faster than NativePRNG on linux, but (IIUC) still requires true entropy for seeding
              • It is however possible to initialize jetty with an explicit SSLContext object such that it should never even attempt to initialize its own SecureRandom. So just like with the client code path, we can provide the explicit (Null)SecureRandom object for jetty to use.
          • NullSecureRandom implemented some methods as No-Op but...
            • it had no constructor, meaning it implicitly used SecureRandom()
              • This means the Platform default RNG Provider was loaded, and it may have its own entropy blocking code on init
              • if we define our own NullSecureRandom constructor that explicitly calls super(SecureRandomSpi,Provider) we should be able to bypass the loading of the system default (and any init it entails) and use our own NullSecureRandomSpi
            • it overrode setSeed(byte[] seed) but not setSeed(long seed)
              • if any caller code (in jetty, or in the JVM) tried using setSeed(long), that also could have resulted in using some entropy blocking method in the default (super class) RNG provider
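The points in the list above can be illustrated with a rough sketch. This is not the attached patch: the class names are hypothetical, and the code only shows the general shape of a SecureRandom subclass that passes its own SPI to the protected super(SecureRandomSpi, Provider) constructor, overrides setSeed(long), and is handed explicitly to SSLContext.init so the context never creates a default SecureRandom. It produces all-zero "random" bytes, so it is suitable only for tests.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.SecureRandomSpi;
import javax.net.ssl.SSLContext;

// Hypothetical, test-only sketch; NOT secure and not the actual Solr patch.
final class NullSecureRandomSketch extends SecureRandom {

  /** SPI whose methods never consult any entropy source. */
  private static final class NullSpi extends SecureRandomSpi {
    @Override protected void engineSetSeed(byte[] seed) { /* no-op */ }
    @Override protected void engineNextBytes(byte[] bytes) { /* leaves bytes untouched */ }
    @Override protected byte[] engineGenerateSeed(int numBytes) { return new byte[numBytes]; }
  }

  NullSecureRandomSketch() {
    // Supplying the SPI directly means the platform default provider is never
    // looked up; the provider argument is left null in this sketch.
    super(new NullSpi(), null);
  }

  // Covers the long overload as well, so no seeding path reaches a real RNG.
  @Override public void setSeed(long seed) { /* no-op */ }

  /** Builds an SSLContext that is given the null RNG explicitly. */
  static SSLContext newTestSslContext() {
    try {
      SSLContext ctx = SSLContext.getInstance("TLS");
      // Null key/trust managers select the JVM defaults; a real test would
      // pass managers backed by its test keystore instead.
      ctx.init(null, null, new NullSecureRandomSketch());
      return ctx;
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("could not build test SSLContext", e);
    }
  }

  public static void main(String[] args) {
    SSLContext ctx = newTestSslContext();
    System.out.println("protocol=" + ctx.getProtocol());
  }
}
```

A context built this way could then be given to whatever client or server code accepts a preconfigured SSLContext, which is the approach the comment describes for both the client path and jetty.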

          With this in mind, I set out to see if I could revive the old patch w/improvements, which i'm now attaching.

          I was never personally able to reproduce any of the "Tests take so long they crash because of lack of entropy" type problems that seemed to plague the jetty machines, but I did do some rough timings on my laptop that seem to suggest that this patch definitely reduces the overhead/time of the tests...

          • w/o patch...
            ant jar && cd solr/core && ant test -Dtests.seed=DEADBEEF
            ...
            Total time: 33 minutes 15 seconds
            
          • w/ patch...
            ant jar && cd solr/core && ant test -Dtests.seed=DEADBEEF
            ...
            Total time: 20 minutes 39 seconds
            
          • w/patch + override SSL randomization to always use SSL+clientAuth...
            // log.info("Randomized ssl ({}) and clientAuth ({})", trySsl, trySslClientAuth);
            // return new SSLTestConfig(trySsl, trySslClientAuth);
            log.info("nocommit: forcing SSL on test that does not have @SuppressSSL");
            return new SSLTestConfig(true, true);
            
            ant jar && cd solr/core && ant test -Dtests.seed=DEADBEEF
            ...
            Total time: 34 minutes 39 seconds
            

          ...admittedly there were two (reproducible) OOMs in that last case (when forcing SSL+clientAuth) that i'm still looking into, but I suspect these may just be because the responses are very large, and the SSL overhead pushes them over the edge - I've definitely seen the TestDistributedSearch OOM from jenkins not too long ago when it randomly selected SSL+clientAuth...

             [junit4] Tests with failures [seed: DEADBEEF]:
             [junit4]   - org.apache.solr.handler.component.TestDistributedStatsComponentCardinality.test
             [junit4]   - org.apache.solr.TestDistributedSearch.test
          

          ...in any case, this patch definitely seems like it helps in terms of test performance. Even if we don't want to increase the randomization factors, this seems like it would help.

          Mark Miller added a comment -

          Nice, thanks for looking into this!

          Mark Miller added a comment -

          I was never personally able to reproduce any of the "Tests take so long they crash because of lack of entropy" type problems that seemed to plague the jetty machines

          It often took me running the tests many times in a row before I started seeing issues locally.

          ASF subversion and git services added a comment -

          Commit 98b0da47ad3ec8c8ecaa8b1a121d3a89c22684a6 in lucene-solr's branch refs/heads/branch_6x from Chris Hostetter (Unused)
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=98b0da4 ]

          SOLR-5776: refactor SSLConfig so that SSLTestConfig can provide SSLContexts using a NullSecureRandom to prevent SSL tests from blocking on entropy starved machines

          (cherry picked from commit f45bd03ca2cc301dcec4e68c49d961c306d8f434)

          Conflicts:
          solr/test-framework/src/java/org/apache/solr/util/SSLTestConfig.java

          ASF subversion and git services added a comment -

          Commit 9677e2c54bbc66283f3f4341a4e1166006069fc3 in lucene-solr's branch refs/heads/master from Chris Hostetter (Unused)
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9677e2c ]

          SOLR-5776: refactor SSLConfig so that SSLTestConfig can provide SSLContexts using a NullSecureRandom to prevent SSL tests from blocking on entropy starved machines

          Hoss Man added a comment -

          It often took me running the tests many times in a row before I started seeing issues locally.

          Hmm, ok. Well, at this point I'm not even sure what to look for.

          For now my changes are on master & 6x, so we should see jenkins jobs start to take significantly less time in total. I'll leave it up to you if you want to start removing any of the @SuppressSSL annotations you added to tests in the past.


          At some point in the future, after all this soaks, we should consider increasing the odds of using SSL, and perhaps even add a new annotation (or replace @SuppressSSL) with a param to help control the odds of using SSL / clientAuth on a per-class basis, i.e.:

            @UseSSL(false) // same as @SuppressSSL
            @UseSSL() // same as default if no annotation: SolrTestCaseJ4 picks SSL / clientAuth using LuceneTestCase.rarely
            @UseSSL(ssl=0.75,clientAuth=0.25) // fine control of odds of using SSL & clientAuth
          

          ...some tests, like TestSSLRandomization, should ideally have much higher odds of using SSL than other tests, and if we had an easy way to say "these handful of simple cloud tests should use SSL very frequently" then it wouldn't matter so much if other really 'expensive' tests only used SSL once in a blue moon.
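
          A minimal sketch of how such a per-class annotation could drive the randomization. Everything here is hypothetical and only illustrates the proposal: the annotation name and attributes follow the example above, while `UseSSLSketch`, `Choice`, `choose`, and the 0.1 default odds are invented for the sketch. For simplicity, `ssl = 0.0` plays the role of `@UseSSL(false)`.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Random;

public class UseSSLSketch {

  /** Hypothetical default odds used when a class has no annotation. */
  public static final double DEFAULT_ODDS = 0.1;

  /** Per-class odds (0.0 to 1.0) of enabling SSL and clientAuth. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  public @interface UseSSL {
    double ssl() default DEFAULT_ODDS;
    double clientAuth() default DEFAULT_ODDS;
  }

  /** Result of one randomized decision. */
  public static final class Choice {
    public final boolean ssl;
    public final boolean clientAuth;

    Choice(boolean ssl, boolean clientAuth) {
      this.ssl = ssl;
      this.clientAuth = clientAuth;
    }
  }

  /** Reads the annotation (if any) from the test class and rolls the dice. */
  public static Choice choose(Class<?> testClass, Random random) {
    UseSSL cfg = testClass.getAnnotation(UseSSL.class);
    double sslOdds = (cfg == null) ? DEFAULT_ODDS : cfg.ssl();
    double clientAuthOdds = (cfg == null) ? DEFAULT_ODDS : cfg.clientAuth();

    // nextDouble() is in [0.0, 1.0), so odds of 1.0 always win and 0.0 never do.
    boolean ssl = random.nextDouble() < sslOdds;
    // clientAuth only makes sense when SSL itself is enabled.
    boolean clientAuth = ssl && random.nextDouble() < clientAuthOdds;
    return new Choice(ssl, clientAuth);
  }

  /** Example: a cheap test that should always exercise SSL, never clientAuth. */
  @UseSSL(ssl = 1.0, clientAuth = 0.0)
  public static class AlwaysSsl {}

  /** Example: equivalent to suppressing SSL for this class. */
  @UseSSL(ssl = 0.0)
  public static class NeverSsl {}
}
```

          Passing the test framework's seeded Random into `choose` would keep the decision reproducible for a given test seed.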

          ASF subversion and git services added a comment -

          Commit f1ed73de114991758b52a3b10df45ea451dc1c80 in lucene-solr's branch refs/heads/branch_6x from Chris Hostetter (Unused)
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f1ed73d ]

          SOLR-5776: javadoc typo

          (cherry picked from commit c0a287cb7601d691a33f9f0e155578e1575ab454)

          ASF subversion and git services added a comment -

          Commit c0a287cb7601d691a33f9f0e155578e1575ab454 in lucene-solr's branch refs/heads/master from Chris Hostetter (Unused)
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c0a287c ]

          SOLR-5776: javadoc typo

          Hoss Man added a comment -

          Note: I created a subtask to look into some (apparently Solaris-specific) failures since NullSecureRandom was committed: SOLR-9068

          ASF subversion and git services added a comment -

          Commit 7e2f9f506dd3a94c9df0514bf0e22624a8cb0f92 in lucene-solr's branch refs/heads/branch_6x from Chris Hostetter (Unused)
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7e2f9f5 ]

          SOLR-9068 / SOLR-5776: Alternate (psuedo random) NullSecureRandom for Constants.SUN_OS

          (cherry picked from commit a5586d29b23f7d032e6d8f0cf8758e56b09e0208)

          Conflicts:
          solr/test-framework/src/java/org/apache/solr/util/SSLTestConfig.java

          ASF subversion and git services added a comment -

          Commit a5586d29b23f7d032e6d8f0cf8758e56b09e0208 in lucene-solr's branch refs/heads/master from Chris Hostetter (Unused)
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a5586d2 ]

          SOLR-9068 / SOLR-5776: Alternate (psuedo random) NullSecureRandom for Constants.SUN_OS

          ASF subversion and git services added a comment -

          Commit 7144984e164e10a6ba2a7c89ffa748af1310cc50 in lucene-solr's branch refs/heads/branch_6x from Chris Hostetter (Unused)
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7144984 ]

          SOLR-9068 / SOLR-5776: replace NullSecureRandom w/ NotSecurePsuedoRandom

          (cherry picked from commit ac0e73a521a66fc37638e884ab386b0173f79b0f)

          ASF subversion and git services added a comment -

          Commit ac0e73a521a66fc37638e884ab386b0173f79b0f in lucene-solr's branch refs/heads/master from Chris Hostetter (Unused)
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ac0e73a ]

          SOLR-9068 / SOLR-5776: replace NullSecureRandom w/ NotSecurePsuedoRandom

          ASF subversion and git services added a comment -

          Commit a81e3cf04692bf372edf098bbead17c315a9a755 in lucene-solr's branch refs/heads/branch_6_0 from Chris Hostetter
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a81e3cf ]

          SOLR-5776: refactor SSLConfig so that SSLTestConfig can provide SSLContexts using a NullSecureRandom to prevent SSL tests from blocking on entropy starved machines

          (cherry picked from commit f45bd03ca2cc301dcec4e68c49d961c306d8f434)

          Conflicts:
          solr/test-framework/src/java/org/apache/solr/util/SSLTestConfig.java

          ASF subversion and git services added a comment -

          Commit afbb0f5d08ec998c18903f84ac297f2ed7fd561b in lucene-solr's branch refs/heads/branch_6_0 from Chris Hostetter
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=afbb0f5 ]

          SOLR-5776: javadoc typo

          (cherry picked from commit c0a287cb7601d691a33f9f0e155578e1575ab454)

          ASF subversion and git services added a comment -

          Commit d866ae79db42c28c99aa7efd58848418b9d2e6a6 in lucene-solr's branch refs/heads/branch_6_0 from Chris Hostetter
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d866ae7 ]

          SOLR-9068 / SOLR-5776: Alternate (psuedo random) NullSecureRandom for Constants.SUN_OS

          (cherry picked from commit a5586d29b23f7d032e6d8f0cf8758e56b09e0208)

          Conflicts:
          solr/test-framework/src/java/org/apache/solr/util/SSLTestConfig.java

          ASF subversion and git services added a comment -

          Commit fb9b7dcfbdb1ecf57cb0dfc3d2d722a96b471874 in lucene-solr's branch refs/heads/branch_6_0 from Chris Hostetter
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=fb9b7dc ]

          SOLR-9068 / SOLR-5776: replace NullSecureRandom w/ NotSecurePsuedoRandom

          (cherry picked from commit ac0e73a521a66fc37638e884ab386b0173f79b0f)

          Hoss Man added a comment -

          At some point in the future, after all this soaks, we should consider increasing the odds of using SSL, and perhaps even add a new annotation (or replace @SuppressSSL) with a param to help control the odds of using SSL / clientAuth on a per-class basis, i.e.:

          I guess I forgot to mention it here, but that idea was spun out into SOLR-9107, which has landed on master & branch_6x.


          FWIW: I'm not sure what else, if anything, should be done to consider this issue "resolved".

          There are a bunch of tests still annotated with @SuppressSSL pointing back at this issue from before we started using the new secure random instance, but I don't know if those annotations should all be removed, or if the tests may have other problems. I feel like that's a question really best left to the people who put those annotations on those tests.

          Personally, I'm setting this issue aside and not planning on working on any more SSL-related stuff anytime soon.


            People

            • Assignee: Mark Miller
            • Reporter: Mark Miller
            • Votes: 0
            • Watchers: 4

              Dates

              • Created:
                Updated:

                Development