Lucene - Core
  1. Lucene - Core
  2. LUCENE-2834

don't spawn thread statically in FSDirectory on Mac OS X

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      on the Mac, creating the digester starts up a PKCS11 thread.

      I do not think threads should be created statically (I have this same issue with TimeLimitedCollector and also FilterManager).

      Uwe discussed removing this md5 digester, I don't care if we remove it or not, just as long as it doesn't create a thread,
      and just as long as it doesn't use the system default locale.

      1. LUCENE-2834.patch
        9 kB
        Michael McCandless
      2. LUCENE-2834.patch
        25 kB
        Michael McCandless
      3. LUCENE-2834.patch
        5 kB
        Uwe Schindler
      4. LUCENE-2834.patch
        21 kB
        Robert Muir

        Activity

        Hide
        Robert Muir added a comment -

        Here's a patch.

        In general i fixed our tests too (to not use SecureRandom, this has been a problem for Mike before). The only hack is in QueryUtil where we test the serialization.

        But in my opinion its good to workaround this in the tests so that we know we aren't doing it in the code.

        Show
        Robert Muir added a comment - Here's a patch. In general i fixed our tests too (to not use SecureRandom, this has been a problem for Mike before). The only hack is in QueryUtil where we test the serialization. But in my opinion its good to workaround this in the tests so that we know we aren't doing it in the code.
        Hide
        Uwe Schindler added a comment -

        Hi Robert,

        patch looks theoretically fine, although I dislike the code duplication and the duplication of temp file code.

        I dont understand the whole problem why you care of this additional thread. Just put this thread name on the rogue list and we are fine. The badest thing is interrupting this thread for no reason. The PKCS thread is generated on MacOSX, because its underlying security system is different from Windows or Linux. On Windows and MacOSX the PKCS system uses the trusted root certificates from the operating system and it seems that on MacOSX this needs some thread to handle the OS' PKCS library. Its as stupid as killing processes on Android phones as recreating them uses more resources than keep them running. If you start to kill this thread you can also kill all GC threads just for fun

        The reason why serialization starts the thread: Serialization uses hashing, too - The PKCS thread is for me just a system thread with the difference that its started lazily when first accessed.

        One thing: I don't like the empty catch blocks /* cannot happen */. Even if this is the case, please throw at least a RuntimException. Some user may stiull use a broken JVM where the charsets.jar file is lost (ok, that should not happen, but I like it more to have that).

        +1 for the UTF-8 fix (this will not break most lock file names, as filenames and hopefully index dir pathes are mostly ASCII)
        -1 for MD5Digest code duplication
        -1 for the temp file code dumplication

        My opinion:
        Lazy init the static MessageDigest class in FSDirectory. Its almost never used.

        Will post an alternative patch soon (lazy init MessageDigest only when needed first, add fixed UTF-8 charset).

        Show
        Uwe Schindler added a comment - Hi Robert, patch looks theoretically fine, although I dislike the code duplication and the duplication of temp file code. I dont understand the whole problem why you care of this additional thread. Just put this thread name on the rogue list and we are fine. The badest thing is interrupting this thread for no reason. The PKCS thread is generated on MacOSX, because its underlying security system is different from Windows or Linux. On Windows and MacOSX the PKCS system uses the trusted root certificates from the operating system and it seems that on MacOSX this needs some thread to handle the OS' PKCS library. Its as stupid as killing processes on Android phones as recreating them uses more resources than keep them running. If you start to kill this thread you can also kill all GC threads just for fun The reason why serialization starts the thread: Serialization uses hashing, too - The PKCS thread is for me just a system thread with the difference that its started lazily when first accessed. One thing: I don't like the empty catch blocks /* cannot happen */. Even if this is the case, please throw at least a RuntimException. Some user may stiull use a broken JVM where the charsets.jar file is lost (ok, that should not happen, but I like it more to have that). +1 for the UTF-8 fix (this will not break most lock file names, as filenames and hopefully index dir pathes are mostly ASCII) -1 for MD5Digest code duplication -1 for the temp file code dumplication My opinion: Lazy init the static MessageDigest class in FSDirectory. Its almost never used. Will post an alternative patch soon (lazy init MessageDigest only when needed first, add fixed UTF-8 charset).
        Hide
        Uwe Schindler added a comment -

        Here my patch. With the System.out in FSDir.getLockID I checked when this method is called now. I changed the code in FSDir.setLockFactory() a little bit to not always call getLockId() and then lazy init the MessageDigest. Now its only called two times during the whole test run: once in TestIndexWriter (because there is some test with a lock file in foreign dir) and once in the LockFactory test (same reason). All other tests never call getLockID() useless.

        I don't like it to kill the PKCS thread, I just added it to the exclusion list in LuceneTestCase (using a regex on SunPKCS12). There may be other platform doing the same (as the SunPKCS provider uses for performance and consistency native algorithm routines from the underlying operating system, like the root certificates).

        Show
        Uwe Schindler added a comment - Here my patch. With the System.out in FSDir.getLockID I checked when this method is called now. I changed the code in FSDir.setLockFactory() a little bit to not always call getLockId() and then lazy init the MessageDigest. Now its only called two times during the whole test run: once in TestIndexWriter (because there is some test with a lock file in foreign dir) and once in the LockFactory test (same reason). All other tests never call getLockID() useless. I don't like it to kill the PKCS thread, I just added it to the exclusion list in LuceneTestCase (using a regex on SunPKCS12). There may be other platform doing the same (as the SunPKCS provider uses for performance and consistency native algorithm routines from the underlying operating system, like the root certificates).
        Hide
        Uwe Schindler added a comment -

        I forgot to mention: Patch not yet tested, as I first need to boot my Snow Leopard VirtualBOX and install SVN there. Maybe someone else can check this patch.

        Show
        Uwe Schindler added a comment - I forgot to mention: Patch not yet tested, as I first need to boot my Snow Leopard VirtualBOX and install SVN there. Maybe someone else can check this patch.
        Hide
        Uwe Schindler added a comment -

        Here the two tests that use the LockID:

        • TestLockFactory.testNativeFSLockFactoryPrefix(TestLockFactory.java:240) [this explicitely sets another dir as lock dir, so the hash is enforced]
        • TestIndexWriter.testEmptyFSDirWithNoLock(TestIndexWriter.java:2644) [this uses another lock factory than FSLockFactory, so the LockId is passed to the LockFactory]
        Show
        Uwe Schindler added a comment - Here the two tests that use the LockID: TestLockFactory.testNativeFSLockFactoryPrefix(TestLockFactory.java:240) [this explicitely sets another dir as lock dir, so the hash is enforced] TestIndexWriter.testEmptyFSDirWithNoLock(TestIndexWriter.java:2644) [this uses another lock factory than FSLockFactory, so the LockId is passed to the LockFactory]
        Hide
        Robert Muir added a comment -

        I don't like it to kill the PKCS thread, I just added it to the exclusion list in LuceneTestCase (using a regex on SunPKCS12). There may be other platform doing the same (as the SunPKCS provider uses for performance and consistency native algorithm routines from the underlying operating system, like the root certificates).

        please do not do this. i do not want the exclusion list growing.

        in fact, if we don't fix this TimeLimitingCollector soon, I'm gonna remove it too (which will cause tests to fail).
        When we hide things like this, then nobody will fix them.

        Show
        Robert Muir added a comment - I don't like it to kill the PKCS thread, I just added it to the exclusion list in LuceneTestCase (using a regex on SunPKCS12). There may be other platform doing the same (as the SunPKCS provider uses for performance and consistency native algorithm routines from the underlying operating system, like the root certificates). please do not do this. i do not want the exclusion list growing. in fact, if we don't fix this TimeLimitingCollector soon, I'm gonna remove it too (which will cause tests to fail). When we hide things like this, then nobody will fix them.
        Hide
        Uwe Schindler added a comment -

        This thread is a "system thread" and has nothing to do with Lucene. Killing it is something you should not do in a running JVM. To remove it from the exclusion list, just get a MessageDigest or somehow else access the PKCS library before you build the system thread map (the static while loop in LTC).

        Sorry, this time I don't agree with you and will revert all commits like that!

        Show
        Uwe Schindler added a comment - This thread is a "system thread" and has nothing to do with Lucene. Killing it is something you should not do in a running JVM. To remove it from the exclusion list, just get a MessageDigest or somehow else access the PKCS library before you build the system thread map (the static while loop in LTC). Sorry, this time I don't agree with you and will revert all commits like that!
        Hide
        Robert Muir added a comment -

        Sorry, I don't agree with you about the MessageDigest use.

        The fact MD5 exists is not guaranteed.

        So, you can say "-1 for the Md5 duplication" all you want but there is no duplication at all if the JVM does not support it!

        -1 for non-portable code.

        Show
        Robert Muir added a comment - Sorry, I don't agree with you about the MessageDigest use. The fact MD5 exists is not guaranteed . So, you can say "-1 for the Md5 duplication" all you want but there is no duplication at all if the JVM does not support it! -1 for non-portable code.
        Hide
        Robert Muir added a comment -

        setting this as a bug for 2.9.x and 3.0.x too, as its very serious.

        besides the fact we use non-portable code (the algorithm may not exist), we rely upon the default charset so the entire idea is completely broken.

        Show
        Robert Muir added a comment - setting this as a bug for 2.9.x and 3.0.x too, as its very serious. besides the fact we use non-portable code (the algorithm may not exist), we rely upon the default charset so the entire idea is completely broken.
        Hide
        Robert Muir added a comment -

        This thread is a "system thread" and has nothing to do with Lucene.

        Exactly, so we shouldn't be starting it in our code.

        Show
        Robert Muir added a comment - This thread is a "system thread" and has nothing to do with Lucene. Exactly, so we shouldn't be starting it in our code.
        Hide
        Robert Muir added a comment -

        One thing: I don't like the empty catch blocks /* cannot happen */. Even if this is the case, please throw at least a RuntimException. Some user may stiull use a broken JVM where the charsets.jar file is lost (ok, that should not happen, but I like it more to have that).

        Your comment makes no sense, unlike the MD5 case, UTF-8 support is guaranteed in all versions of java.

        "Every implementation of the Java platform is required to support the following standard charsets.
        ...
        UTF-8 Eight-bit UCS Transformation Format"

        I think you are somehow confused about things that are mandatory to be supported in java,
        and things that only happen to exist in all sun jvms.

        Show
        Robert Muir added a comment - One thing: I don't like the empty catch blocks /* cannot happen */. Even if this is the case, please throw at least a RuntimException. Some user may stiull use a broken JVM where the charsets.jar file is lost (ok, that should not happen, but I like it more to have that). Your comment makes no sense, unlike the MD5 case, UTF-8 support is guaranteed in all versions of java. "Every implementation of the Java platform is required to support the following standard charsets. ... UTF-8 Eight-bit UCS Transformation Format" I think you are somehow confused about things that are mandatory to be supported in java, and things that only happen to exist in all sun jvms .
        Hide
        Michael McCandless added a comment -

        I just hit this (got spooky leftover thread warning running test on OS
        X). I think we should fix it.

        I like the initial approach: let's not use MessageDigest at all
        (import our own MD5, that does not spawn threads!!). Sure it's code
        dup but it's tiny and it mitigates risk so I think it's well worth
        it.

        In general Lucene should not use "interesting" (risky) parts of the
        JVM/Java if we can avoid it w/o much cost, and this is a really silly
        reason to be using MessageDigest (similar to our now-gone crazy usage
        of ManagementFactory just to acquire a test lock). We are a search
        library! We must use the bare minimum of the OS/filesystem/JVM that
        we need.

        In fact in this case... can't we nuke DIGESTER altogether? Lucene now
        stores lock files in the index dir by default as "write.lock". We
        only need this digest if you change that dir. So, if your app somehow
        wants to put the lock file elsewhere (unusual), it should be up to you
        to name it "uniquely" relative to other IWs storing locks in the same
        dir (we can do this under a separate issue).

        And not using SecureRandom to create temp files is a no-brainer – the
        builtin File.createTempFile must be secure, by design, but we
        obviously don't need that here. I've had awful problems in the past w/
        SecureRandom (because my machine didn't have enough true
        randomness!). Again: Lucene should only use what's we really need.

        I think we can remove the controversial "interrupt the weird OS X
        PKCS11 thread" from the patch since serialization is now gone? I'd
        like to know if this thread suddenly pops up again in our tests... and
        I agree it's dangerous to interrupt this thread (it could then cause
        weird failures in subsequent tests, eg if the thread doesn't restart).

        One thing: I don't like the empty catch blocks /* cannot happen */. Even if this is the case, please throw at least a RuntimException

        +1 – I like this idea (I don't do it now but I'll try to going
        forward). Defensive programming...

        Show
        Michael McCandless added a comment - I just hit this (got spooky leftover thread warning running test on OS X). I think we should fix it. I like the initial approach: let's not use MessageDigest at all (import our own MD5, that does not spawn threads!!). Sure it's code dup but it's tiny and it mitigates risk so I think it's well worth it. In general Lucene should not use "interesting" (risky) parts of the JVM/Java if we can avoid it w/o much cost, and this is a really silly reason to be using MessageDigest (similar to our now-gone crazy usage of ManagementFactory just to acquire a test lock). We are a search library! We must use the bare minimum of the OS/filesystem/JVM that we need. In fact in this case... can't we nuke DIGESTER altogether? Lucene now stores lock files in the index dir by default as "write.lock". We only need this digest if you change that dir. So, if your app somehow wants to put the lock file elsewhere (unusual), it should be up to you to name it "uniquely" relative to other IWs storing locks in the same dir (we can do this under a separate issue). And not using SecureRandom to create temp files is a no-brainer – the builtin File.createTempFile must be secure, by design, but we obviously don't need that here. I've had awful problems in the past w/ SecureRandom (because my machine didn't have enough true randomness!). Again: Lucene should only use what's we really need. I think we can remove the controversial "interrupt the weird OS X PKCS11 thread" from the patch since serialization is now gone? I'd like to know if this thread suddenly pops up again in our tests... and I agree it's dangerous to interrupt this thread (it could then cause weird failures in subsequent tests, eg if the thread doesn't restart). One thing: I don't like the empty catch blocks /* cannot happen */. Even if this is the case, please throw at least a RuntimException +1 – I like this idea (I don't do it now but I'll try to going forward). Defensive programming...
        Hide
        Michael McCandless added a comment -

        New patch, just modernized to current trunk, removed special-case
        checking of OS X's PKCS11 thread (since we no longer serialize), and
        throw new RE() in those "impossible" catch clauses.

        I think it's important we poach our own MD5 impl because 1) Java's
        crypto lib is not guaranteed to impl MD5, 2) using MD5 has hidden
        costs (spawning this thread in OS X), and 3) no longer using
        SecureRandom just to create our temp test dirs makes great sense
        (we've had crazy problems w/ SecureRandom in the past).

        I think it's ready to commit. Uwe do you still have such strong
        objections?

        Show
        Michael McCandless added a comment - New patch, just modernized to current trunk, removed special-case checking of OS X's PKCS11 thread (since we no longer serialize), and throw new RE() in those "impossible" catch clauses. I think it's important we poach our own MD5 impl because 1) Java's crypto lib is not guaranteed to impl MD5, 2) using MD5 has hidden costs (spawning this thread in OS X), and 3) no longer using SecureRandom just to create our temp test dirs makes great sense (we've had crazy problems w/ SecureRandom in the past). I think it's ready to commit. Uwe do you still have such strong objections?
        Hide
        Michael McCandless added a comment -

        New patch, I think it's ready to commit!

        I dropped MD5 entirely and cutover to our own simple digest computation (same as String.hashCode).

        Show
        Michael McCandless added a comment - New patch, I think it's ready to commit! I dropped MD5 entirely and cutover to our own simple digest computation (same as String.hashCode).
        Hide
        Uwe Schindler added a comment -

        Hi Mike,
        from my point of view, this seems fine. We broke the MD5 already by forcing UTF-8 instead of the default character set, so we can even change the whole hashing

        Collisions are not really an issue, this lock file naming is only used for lock files outside index directory to make the chance of file name collisions less. Normal users will store lock files in index directory, so this is not really an issue (lock prefix is null for this case).

        Alternatively, to prevent collision, instead of hashing we could also collect all characters that are valid for file names (remove all non-ASCII chars and '/'). But this could produce a too-long filename.

        Show
        Uwe Schindler added a comment - Hi Mike, from my point of view, this seems fine. We broke the MD5 already by forcing UTF-8 instead of the default character set, so we can even change the whole hashing Collisions are not really an issue, this lock file naming is only used for lock files outside index directory to make the chance of file name collisions less. Normal users will store lock files in index directory, so this is not really an issue (lock prefix is null for this case). Alternatively, to prevent collision, instead of hashing we could also collect all characters that are valid for file names (remove all non-ASCII chars and '/'). But this could produce a too-long filename.
        Hide
        Robert Muir added a comment -

        bulk close for 3.3

        Show
        Robert Muir added a comment - bulk close for 3.3

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development