PyLucene
  1. PyLucene
  2. PYLUCENE-17

Possible race condition with pylucene attachCurrentThread

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Labels:
    • Environment:
      Linux 2.6.39
      Sun jdk 1.6.26

      Description

      It looks like there is a possible race that can cause null pointer exceptions in the JVM, making it crash

      Because its a race it is hard to reproduce, the best luck I have had so far is dropping my FS cache in the OS, which seems to slow down the initialisation of the JVM enough to make it easier to reproduce.

      Attached is my test case

      Test session follows
      ---------------------------------------------------------------
      greg@localhost ~/programming/python $ sudo bash -c 'echo 3 > /proc/sys/vm/drop_caches'
      greg@localhost ~/programming/python $ python ./lucene-threadtest.py
      #

      1. A fatal error has been detected by the Java Runtime Environment:
        #
      2. SIGSEGV (0xb) at pc=0x00007f79226b35c8, pid=26581, tid=140158003312384
        #
      3. JRE version: 6.0_26-b03
      4. Java VM: Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode linux-amd64 compressed oops)
      5. Problematic frame:
      6. V [libjvm.so+0x4b05c8] instanceKlass::cached_itable_index(unsigned long)+0x18
        #
      7. An error report file with more information is saved as:
      8. /home/greg/programming/python/hs_err_pid26581.log
        #
      9. If you would like to submit a bug report, please visit:
      10. http://java.sun.com/webapps/bugreport/crash.jsp
        #
        Aborted (core dumped)
        greg@localhost ~/programming/python $ python ./lucene-threadtest.py
        greg@localhost ~/programming/python $ python ./lucene-threadtest.py
        greg@localhost ~/programming/python $ python ./lucene-threadtest.py
        greg@localhost ~/programming/python $ rm -r /tmp/test-index/
        greg@localhost ~/programming/python $ sudo bash -c 'echo 3 > /proc/sys/vm/drop_caches'
        greg@localhost ~/programming/python $ python ./lucene-threadtest.py
        #
      11. A fatal error has been detected by the Java Runtime Environment:
        [thread 139988165344768 also had an error][thread 139988165344768 also had an error]#
      1. SIGSEGV (0xb)
        at pc=0x00007f5197550a29, pid=27657, tid=139988039468800
        #
      2. JRE version: 6.0_26-b03
      3. Java VM: Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode linux-amd64 compressed oops)
      4. Problematic frame:
      5. V [libjvm.so+0x4f2a29] unsigned+0x299
        #
      6. An error report file with more information is saved as:
      7. /home/greg/programming/python/hs_err_pid27657.log
        #
      8. If you would like to submit a bug report, please visit:
      9. http://java.sun.com/webapps/bugreport/crash.jsp
        #
        Aborted (core dumped)
        greg@localhost ~/programming/python $ python ./lucene-threadtest.py
        greg@localhost ~/programming/python $ sudo bash -c 'echo 3 > /proc/sys/vm/drop_caches'
        greg@localhost ~/programming/python $ python ./lucene-threadtest.py
        #
      10. A fatal error has been detected by the Java Runtime Environment:
        #
      11. SIGSEGV (0xb) at pc=0x00007f51bc2eaa1e, pid=28124, tid=139988377052928
        #
      12. JRE version: 6.0_26-b03
      13. Java VM: Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode linux-amd64 compressed oops)
      14. Problematic frame:
      15. V [libjvm.so+0x4f2a1e] unsigned+0x28e
        #
      16. An error report file with more information is saved as:
      17. /home/greg/programming/python/hs_err_pid28124.log
        #
      18. If you would like to submit a bug report, please visit:
      19. http://java.sun.com/webapps/bugreport/crash.jsp
        #
        Aborted (core dumped)
        greg@localhost ~/programming/python $
      1. PYLUCENE-17-4.patch
        53 kB
        Patrick J. McNerthney
      2. diff.17.txt
        49 kB
        Andi Vajda
      3. PYLUCENE-17-3.patch
        7 kB
        Patrick J. McNerthney
      4. backtrace
        113 kB
        Greg Bowyer
      5. lucene-threadtest.py
        1 kB
        Greg Bowyer

        Activity

        Hide
        Greg Bowyer added a comment - - edited

        I have on my development machine an automated corefile backtrace script, attached is the backtrace I get from a crashed version

        Show
        Greg Bowyer added a comment - - edited I have on my development machine an automated corefile backtrace script, attached is the backtrace I get from a crashed version
        Hide
        Andi Vajda added a comment -

        Yes, this makes sense. Looking at your backtrace a definite possibility is the class$ static variable on each of the C++ wrapper classes generated for a Java class. These are unprotected (oops) and could lead to such a race condition when multiple threads are all bootstrapping the code.
        I should at least mark it 'volatile' or whatever the construct is in C++ to ensure that the variable, when read, is reflecting the shared reality of all the threads.
        It is ok, but wasteful, to have multiple threads repeatedly/concurrently initialize the class$ and mids$ arrays for all the classes being loaded. It is not ok for a class$ to be reported not NULL yet being seen as NULL at other times. The class$ object, a jclass handle (the size of a pointer) must be read atomically and reliably. I believe the fix should be trivial if my theory is correct and the right C++ primitive exists (I believe so, I have to check). Add a mutex around class$ would be a big slowdown.
        If it turns out that the class$ variable should not be shared across threads (I could imagine Classloader tricks readily defeating this sharing), then it would have to be moved to thread-local storage, which would slow down thread initialization a bit since the class$ and mids$ caching would have to be reproduced for each thread. This may be in fact the safest fix.

        An incomplete workaround would be to have the main thread run the code being run by all the other threads first.
        A complete workaround would be to have the main thread force-initialize all the classes needed by the code being run in other threads.

        Show
        Andi Vajda added a comment - Yes, this makes sense. Looking at your backtrace a definite possibility is the class$ static variable on each of the C++ wrapper classes generated for a Java class. These are unprotected (oops) and could lead to such a race condition when multiple threads are all bootstrapping the code. I should at least mark it 'volatile' or whatever the construct is in C++ to ensure that the variable, when read, is reflecting the shared reality of all the threads. It is ok, but wasteful, to have multiple threads repeatedly/concurrently initialize the class$ and mids$ arrays for all the classes being loaded. It is not ok for a class$ to be reported not NULL yet being seen as NULL at other times. The class$ object, a jclass handle (the size of a pointer) must be read atomically and reliably. I believe the fix should be trivial if my theory is correct and the right C++ primitive exists (I believe so, I have to check). Add a mutex around class$ would be a big slowdown. If it turns out that the class$ variable should not be shared across threads (I could imagine Classloader tricks readily defeating this sharing), then it would have to be moved to thread-local storage, which would slow down thread initialization a bit since the class$ and mids$ caching would have to be reproduced for each thread. This may be in fact the safest fix. An incomplete workaround would be to have the main thread run the code being run by all the other threads first. A complete workaround would be to have the main thread force-initialize all the classes needed by the code being run in other threads.
        Hide
        Andi Vajda added a comment -

        Assuming this is the right C++ keyword, to test this theory (since you can reproduce the bug and I haven't tried yet) you should insert a 'volatile' keyword into the declaration of class$.
        This is done around line 892 in jcc/jcc/cpp.py.
        Then rebuild jcc and pylucene before re-running your test.

        Show
        Andi Vajda added a comment - Assuming this is the right C++ keyword, to test this theory (since you can reproduce the bug and I haven't tried yet) you should insert a 'volatile' keyword into the declaration of class$. This is done around line 892 in jcc/jcc/cpp.py. Then rebuild jcc and pylucene before re-running your test.
        Hide
        Andi Vajda added a comment -

        And I stand corrected, class$ is not a jclass handle but in fact a pointer to the java::lang::Class wrapper for the wrapped object's class.

        Show
        Andi Vajda added a comment - And I stand corrected, class$ is not a jclass handle but in fact a pointer to the java::lang::Class wrapper for the wrapped object's class.
        Hide
        Greg Bowyer added a comment -

        ... I was wondering if it is a JNI local reference, that prehaps it should be a global reference type instead, I dont know the code that well, but my take on it is if its a static shouldnt it be a jni global ?

        Show
        Greg Bowyer added a comment - ... I was wondering if it is a JNI local reference, that prehaps it should be a global reference type instead, I dont know the code that well, but my take on it is if its a static shouldnt it be a jni global ?
        Hide
        Andi Vajda added a comment -

        That is most likely not the problem, any jobject handle that escapes the VM and is wrapped is turned into a global ref.
        See the JObject() constructor in jcc/jcc/sources/JObject.h

        Show
        Andi Vajda added a comment - That is most likely not the problem, any jobject handle that escapes the VM and is wrapped is turned into a global ref. See the JObject() constructor in jcc/jcc/sources/JObject.h
        Hide
        Greg Bowyer added a comment -

        check that makes sense, again I am looking at this from not knowing the code :S

        Show
        Greg Bowyer added a comment - check that makes sense, again I am looking at this from not knowing the code :S
        Hide
        Patrick J. McNerthney added a comment -

        I just ran into this very problem with our use of JCC to integrate Python with the Eclipse BIRT Report Runtime. I have a test which is similar to Greg's which is wrapped in a shell script that repeatedly runs the test, since the error only occurs when the class is in need of being initialized. This test won't run for more then a minute before causing a seg fault. I initially tried adding the volatile keyword, but that had no effect.

        I believe I have figured out the exact cause of the seg fault. It is not the class$ pointer being null, but bad values in the mids$ array. The sequence of events is something like so:

        1. Thread A calls initializeClass, finds class$ to be null and starts to initialize it.
        2. Thread A sets mids$ to an empty array and begins populating it.
        3. Before Thread A sets class$ to a value, Thread B interrupts it starts executing.
        4. Thread B calls initializeClass, finds class$ to be null and starts to initialize it.
        5. Thread B sets mids$ to an empty array and begins populating it.
        6. Before Thread B finishes populating mids$, Thread A interrupts Thread B and starts executing, leaving empty mids$ entry values.
        7. Thread A continues executing, sets class$ to it's value and assumes that the class is now properly initialized.
        8. Thread A accesses one of the mids$ entry values that were left empty by Thread B because it was interrupted.
        9. Thread A causes a seg fault trying to use the empty mid value.

        I found that if I changed the initializeClass method to initially store the mids$ array in a locally scoped variable, populate that, and then only set mids$ with the filled in array (and to do the same with fids$) that my test would not fail.

        However, I do not care for that solution, because multiple threads still end up thinking they need to initialize the class, causing some objects to be allocated multiple times and are lost. My preferred solution is to use a mutex with a double check for class$ being null, the first check outside of the mutex and the second check inside the mutex. This avoids the mutex overhead once things are initialized, but still provides proper threading synchronization.

        Attached is a patch against the current pylucene trunk which uses the JCCEnv mutex to protect the initializeClass code. This has only been tested under Linux. I attempted to implement Windows support, but have no idea if it is correct.

        Show
        Patrick J. McNerthney added a comment - I just ran into this very problem with our use of JCC to integrate Python with the Eclipse BIRT Report Runtime. I have a test which is similar to Greg's which is wrapped in a shell script that repeatedly runs the test, since the error only occurs when the class is in need of being initialized. This test won't run for more then a minute before causing a seg fault. I initially tried adding the volatile keyword, but that had no effect. I believe I have figured out the exact cause of the seg fault. It is not the class$ pointer being null, but bad values in the mids$ array. The sequence of events is something like so: 1. Thread A calls initializeClass, finds class$ to be null and starts to initialize it. 2. Thread A sets mids$ to an empty array and begins populating it. 3. Before Thread A sets class$ to a value, Thread B interrupts it starts executing. 4. Thread B calls initializeClass, finds class$ to be null and starts to initialize it. 5. Thread B sets mids$ to an empty array and begins populating it. 6. Before Thread B finishes populating mids$, Thread A interrupts Thread B and starts executing, leaving empty mids$ entry values. 7. Thread A continues executing, sets class$ to it's value and assumes that the class is now properly initialized. 8. Thread A accesses one of the mids$ entry values that were left empty by Thread B because it was interrupted. 9. Thread A causes a seg fault trying to use the empty mid value. I found that if I changed the initializeClass method to initially store the mids$ array in a locally scoped variable, populate that, and then only set mids$ with the filled in array (and to do the same with fids$) that my test would not fail. However, I do not care for that solution, because multiple threads still end up thinking they need to initialize the class, causing some objects to be allocated multiple times and are lost. My preferred solution is to use a mutex with a double check for class$ being null, the first check outside of the mutex and the second check inside the mutex. This avoids the mutex overhead once things are initialized, but still provides proper threading synchronization. Attached is a patch against the current pylucene trunk which uses the JCCEnv mutex to protect the initializeClass code. This has only been tested under Linux. I attempted to implement Windows support, but have no idea if it is correct.
        Hide
        Andi Vajda added a comment -

        Thank you for the patch(es). I think that your explanation of the problem makes sense.
        I'm working on integrating your fix but via another route that doesn't involve changing the code generator but instead adding a getClass() method onto JCCEnv that invokes initializeClass inside a 'lock locked' block.
        I changed all places currently calling initializeClass() to call env-getClass(initializeClass) instead.
        I'm not sure I understand what you mean with the second patch you sent that renames the lock type to lock$.
        If it has to do with not introducing a new 'reserved' word because it is used in the generated code then my version of your first patch should not have that problem.
        Could you please attach the test your wrote so that I can verify that my version of your patch still fixes the problem ?
        Thanks !

        Show
        Andi Vajda added a comment - Thank you for the patch(es). I think that your explanation of the problem makes sense. I'm working on integrating your fix but via another route that doesn't involve changing the code generator but instead adding a getClass() method onto JCCEnv that invokes initializeClass inside a 'lock locked' block. I changed all places currently calling initializeClass() to call env-getClass(initializeClass) instead. I'm not sure I understand what you mean with the second patch you sent that renames the lock type to lock$. If it has to do with not introducing a new 'reserved' word because it is used in the generated code then my version of your first patch should not have that problem. Could you please attach the test your wrote so that I can verify that my version of your patch still fixes the problem ? Thanks !
        Hide
        Patrick J. McNerthney added a comment -

        Proposed patch against pylucene/trunk to fix jira issue PYLUCENE-17. (Version 3)

        Show
        Patrick J. McNerthney added a comment - Proposed patch against pylucene/trunk to fix jira issue PYLUCENE-17 . (Version 3)
        Hide
        Patrick J. McNerthney added a comment -

        "I'm working on integrating your fix but via another route that doesn't involve changing the code generator but instead adding a getClass() method onto JCCEnv that invokes initializeClass inside a 'lock locked' block."

        That will require the mutex to always be obtained, no? My patch only obtains the mutex if the class has never been initialized by generating code that looks like this:

        jclass AbstractMap::initializeClass()
        {
        if (!class$)
        {
        lock$ locked;
        if (!class$)
        {
        // We really do need to initialize it...

        "I'm not sure I understand what you mean with the second patch you sent that renames the lock type to lock$."

        I found a case where a class being wrapped contained a function called "lock". This caused the "lock" declaration in the classes .h file to override the lock declaration in JCCEnv.h, causing the "lock locked;" statement to not compile. I changed my patch to always post fix $ to every name used by the generated code.

        "Could you please attach the test your wrote so that I can verify that my version of your patch still fixes the problem ?"

        I am not using all of pylucene, but only using JCC to wrap the Eclipse BIRT Report Runtime. My test involves using this wrapped Java library, so I don't think it would be helpful to you. I am pretty sure if you put Greg's test in a shell script that repeatedly calls it, it will fail fairly reliably.

        Show
        Patrick J. McNerthney added a comment - "I'm working on integrating your fix but via another route that doesn't involve changing the code generator but instead adding a getClass() method onto JCCEnv that invokes initializeClass inside a 'lock locked' block." That will require the mutex to always be obtained, no? My patch only obtains the mutex if the class has never been initialized by generating code that looks like this: jclass AbstractMap::initializeClass() { if (!class$) { lock$ locked; if (!class$) { // We really do need to initialize it... "I'm not sure I understand what you mean with the second patch you sent that renames the lock type to lock$." I found a case where a class being wrapped contained a function called "lock". This caused the "lock" declaration in the classes .h file to override the lock declaration in JCCEnv.h, causing the "lock locked;" statement to not compile. I changed my patch to always post fix $ to every name used by the generated code. "Could you please attach the test your wrote so that I can verify that my version of your patch still fixes the problem ?" I am not using all of pylucene, but only using JCC to wrap the Eclipse BIRT Report Runtime. My test involves using this wrapped Java library, so I don't think it would be helpful to you. I am pretty sure if you put Greg's test in a shell script that repeatedly calls it, it will fail fairly reliably.
        Hide
        Andi Vajda added a comment -

        That is true, the lock would have to be always obtained. On the other hand, not having to export that lock variable would make
        a fix considerably easier on Windows. Let me see if I can think of a way to not use the lock if class$ is not NULL (although, something somewhere tells me that this double-check pattern is a bit flawed if the check on class$ is not atomic, is it ?).

        About the test, yes, Greg's test should do.
        I can also attach my version of the patch here, of course, for you to check it out against your test case.

        Show
        Andi Vajda added a comment - That is true, the lock would have to be always obtained. On the other hand, not having to export that lock variable would make a fix considerably easier on Windows. Let me see if I can think of a way to not use the lock if class$ is not NULL (although, something somewhere tells me that this double-check pattern is a bit flawed if the check on class$ is not atomic, is it ?). About the test, yes, Greg's test should do. I can also attach my version of the patch here, of course, for you to check it out against your test case.
        Hide
        Andi Vajda added a comment -

        Sadly, I can't reproduce the bug with Greg's test. Doesn't crash here. Not implying there isn't a bug, of course.

        Show
        Andi Vajda added a comment - Sadly, I can't reproduce the bug with Greg's test. Doesn't crash here. Not implying there isn't a bug, of course.
        Hide
        Andi Vajda added a comment -

        Correction, it does reproduce inside a while forever loop. My bad.

        Show
        Andi Vajda added a comment - Correction, it does reproduce inside a while forever loop. My bad.
        Hide
        Andi Vajda added a comment -

        With this patch, I can no longer reproduce this bug using Greg's test. Please try it with your use case and let me know if this resolves the issue for your use case too.
        In this patch, I only use the lock if the class$ variable was found to be NULL.

        Show
        Andi Vajda added a comment - With this patch, I can no longer reproduce this bug using Greg's test. Please try it with your use case and let me know if this resolves the issue for your use case too. In this patch, I only use the lock if the class$ variable was found to be NULL.
        Hide
        Patrick J. McNerthney added a comment -

        Your diff.17.txt patch works against my test also.

        There is still one small part of this initialization process that concerns me. The class static fields are initialized after the class$ value is set. This introduces the following possibility:

        1. Thread A calls initializeClass, finds class$ to be null and starts to initialize it.
        2. Thread A sets class$ to it's value.
        3. Before Thread A set the class static fields to their values, Thread B interrupts it starts executing.
        4. Thread B calls initializeClass, finds class$ have it's value and returns.
        5. Thread B then accesses one or more of that classes static fields, getting 0 or NULL values instead of their proper values.

        I initially attempted to instead set class$ after the initialization of the class static fields, but realized that there is a recursive issue that required establishing class$ before the class static fields are initialized. The Boolean class is a good example, where the TRUE and FALSE fields instantiate Boolean instances, causing the initializeClass function to be called again. I suppose it is also possible that a class static field of a different type might end up being initialized, and that class could then instantiate a static field memory of the original type.

        My latest attachment enhances your patch with what I believe is a way to allow for the recursion that occurs when the class static fields are initialized when the class$ has not been set.

        Let me know if this makes sense to you.

        Show
        Patrick J. McNerthney added a comment - Your diff.17.txt patch works against my test also. There is still one small part of this initialization process that concerns me. The class static fields are initialized after the class$ value is set. This introduces the following possibility: 1. Thread A calls initializeClass, finds class$ to be null and starts to initialize it. 2. Thread A sets class$ to it's value. 3. Before Thread A set the class static fields to their values, Thread B interrupts it starts executing. 4. Thread B calls initializeClass, finds class$ have it's value and returns. 5. Thread B then accesses one or more of that classes static fields, getting 0 or NULL values instead of their proper values. I initially attempted to instead set class$ after the initialization of the class static fields, but realized that there is a recursive issue that required establishing class$ before the class static fields are initialized. The Boolean class is a good example, where the TRUE and FALSE fields instantiate Boolean instances, causing the initializeClass function to be called again. I suppose it is also possible that a class static field of a different type might end up being initialized, and that class could then instantiate a static field memory of the original type. My latest attachment enhances your patch with what I believe is a way to allow for the recursion that occurs when the class static fields are initialized when the class$ has not been set. Let me know if this makes sense to you.
        Hide
        Patrick J. McNerthney added a comment -

        Sorry about the multiple uploads...hopefully this is the one.

        Show
        Patrick J. McNerthney added a comment - Sorry about the multiple uploads...hopefully this is the one.
        Hide
        Andi Vajda added a comment -

        While there could be something like your scenario going on, I'm not sure it's a problem (I'm a bit confused right now).
        The static fields are not initialized during initializeClass() and their initialization is not triggered by initializeClass() either (your step 1 there is incomplete). The static fields are setup during <class>::initialize() which is called by the _initialize() calls defined in __init.cpp which are called by the main __initialize_() function which is what is called when you call initVM() from Python for your module (thus why it is important to call the initVM() that comes with your module).

        If importing the module and calling initVM() is all done in one thread, the main one, others not running until initVM is complete, then I don't even understand right now (I'm confused, I said , how we can even get this bug. But I reproduced it, so I'm not arguing anything, just trying to wrap my head around this process again.

        Right now, I'm clearly missing something. I don't understand why this bug even happened in the first place anymore

        Show
        Andi Vajda added a comment - While there could be something like your scenario going on, I'm not sure it's a problem (I'm a bit confused right now). The static fields are not initialized during initializeClass() and their initialization is not triggered by initializeClass() either (your step 1 there is incomplete). The static fields are setup during <class>::initialize() which is called by the _ initialize () calls defined in __init .cpp which are called by the main __initialize _() function which is what is called when you call initVM() from Python for your module (thus why it is important to call the initVM() that comes with your module). If importing the module and calling initVM() is all done in one thread, the main one, others not running until initVM is complete, then I don't even understand right now (I'm confused, I said , how we can even get this bug. But I reproduced it, so I'm not arguing anything, just trying to wrap my head around this process again. Right now, I'm clearly missing something. I don't understand why this bug even happened in the first place anymore
        Hide
        Greg Bowyer added a comment -

        I am going to ask something really dumb that I will probably regret,
        couldn't we just lock the python GIL inside the initializeClass method ?

        Show
        Greg Bowyer added a comment - I am going to ask something really dumb that I will probably regret, couldn't we just lock the python GIL inside the initializeClass method ?
        Hide
        Andi Vajda added a comment -

        The bug with initializeClass() is already fixed with a lock. (fix is not yet checked in).
        The possible bug with initializing statics in <class>::initialize() remains to be explained/fixed.

        Show
        Andi Vajda added a comment - The bug with initializeClass() is already fixed with a lock. (fix is not yet checked in). The possible bug with initializing statics in <class>::initialize() remains to be explained/fixed.
        Hide
        Patrick J. McNerthney added a comment -

        What I mean by "class static fields" are those fields of the Java class that are declared as static. java.lang.Boolean is a good example, with it's TRUE and FALSE static fields. If you look at the Boolean::initializeClass generated code, it looks like this:

        jclass Boolean::initializeClass(bool getOnly)
        {
        if (getOnly)
        return (jclass) (class$ == NULL ? NULL : class$->this$);
        if (!class$)

        { jclass cls = (jclass) env->findClass("java/lang/Boolean"); mids$ = new jmethodID[max_mid]; mids$[mid_init$_bb0c767f] = env->getMethodID(cls, "<init>", "(Z)V"); mids$[mid_init$_5fdc3f48] = env->getMethodID(cls, "<init>", "(Ljava/lang/String;)V"); mids$[mid_booleanValue_54c6a16a] = env->getMethodID(cls, "booleanValue", "()Z"); mids$[mid_compareTo_d07f0c91] = env->getMethodID(cls, "compareTo", "(Ljava/lang/Boolean;)I"); mids$[mid_compareTo_290588f1] = env->getMethodID(cls, "compareTo", "(Ljava/lang/Object;)I"); mids$[mid_equals_290588e2] = env->getMethodID(cls, "equals", "(Ljava/lang/Object;)Z"); mids$[mid_getBoolean_5fdc3f44] = env->getStaticMethodID(cls, "getBoolean", "(Ljava/lang/String;)Z"); mids$[mid_hashCode_54c6a179] = env->getMethodID(cls, "hashCode", "()I"); mids$[mid_parseBoolean_5fdc3f44] = env->getStaticMethodID(cls, "parseBoolean", "(Ljava/lang/String;)Z"); mids$[mid_toString_14c7b5c5] = env->getMethodID(cls, "toString", "()Ljava/lang/String;"); mids$[mid_toString_445a175e] = env->getStaticMethodID(cls, "toString", "(Z)Ljava/lang/String;"); mids$[mid_valueOf_a98d5bba] = env->getStaticMethodID(cls, "valueOf", "(Z)Ljava/lang/Boolean;"); mids$[mid_valueOf_9d4a8ff9] = env->getStaticMethodID(cls, "valueOf", "(Ljava/lang/String;)Ljava/lang/Boolean;"); class$ = (::java::lang::Class *) new JObject(cls); FALSE = new Boolean(env->getStaticObjectField(cls, "FALSE", "Ljava/lang/Boolean;")); TRUE = new Boolean(env->getStaticObjectField(cls, "TRUE", "Ljava/lang/Boolean;")); TYPE = new ::java::lang::Class(env->getStaticObjectField(cls, "TYPE", "Ljava/lang/Class;")); }

        return (jclass) class$->this$;
        }

        In the code above the class$ variable is set to a non NULL value before the FALSE, TRUE, and TYPE class static fields are in fact properly initialized. So calling Boolean::initializeClass does not 100% ensure that those static fields are for sure, for sure, correctly initialized.

        Show
        Patrick J. McNerthney added a comment - What I mean by "class static fields" are those fields of the Java class that are declared as static. java.lang.Boolean is a good example, with it's TRUE and FALSE static fields. If you look at the Boolean::initializeClass generated code, it looks like this: jclass Boolean::initializeClass(bool getOnly) { if (getOnly) return (jclass) (class$ == NULL ? NULL : class$->this$); if (!class$) { jclass cls = (jclass) env->findClass("java/lang/Boolean"); mids$ = new jmethodID[max_mid]; mids$[mid_init$_bb0c767f] = env->getMethodID(cls, "<init>", "(Z)V"); mids$[mid_init$_5fdc3f48] = env->getMethodID(cls, "<init>", "(Ljava/lang/String;)V"); mids$[mid_booleanValue_54c6a16a] = env->getMethodID(cls, "booleanValue", "()Z"); mids$[mid_compareTo_d07f0c91] = env->getMethodID(cls, "compareTo", "(Ljava/lang/Boolean;)I"); mids$[mid_compareTo_290588f1] = env->getMethodID(cls, "compareTo", "(Ljava/lang/Object;)I"); mids$[mid_equals_290588e2] = env->getMethodID(cls, "equals", "(Ljava/lang/Object;)Z"); mids$[mid_getBoolean_5fdc3f44] = env->getStaticMethodID(cls, "getBoolean", "(Ljava/lang/String;)Z"); mids$[mid_hashCode_54c6a179] = env->getMethodID(cls, "hashCode", "()I"); mids$[mid_parseBoolean_5fdc3f44] = env->getStaticMethodID(cls, "parseBoolean", "(Ljava/lang/String;)Z"); mids$[mid_toString_14c7b5c5] = env->getMethodID(cls, "toString", "()Ljava/lang/String;"); mids$[mid_toString_445a175e] = env->getStaticMethodID(cls, "toString", "(Z)Ljava/lang/String;"); mids$[mid_valueOf_a98d5bba] = env->getStaticMethodID(cls, "valueOf", "(Z)Ljava/lang/Boolean;"); mids$[mid_valueOf_9d4a8ff9] = env->getStaticMethodID(cls, "valueOf", "(Ljava/lang/String;)Ljava/lang/Boolean;"); class$ = (::java::lang::Class *) new JObject(cls); FALSE = new Boolean(env->getStaticObjectField(cls, "FALSE", "Ljava/lang/Boolean;")); TRUE = new Boolean(env->getStaticObjectField(cls, "TRUE", "Ljava/lang/Boolean;")); TYPE = new ::java::lang::Class(env->getStaticObjectField(cls, "TYPE", "Ljava/lang/Class;")); } return (jclass) class$->this$; } In the code above the class$ variable is set to a non NULL value before the FALSE, TRUE, and TYPE class static fields are in fact properly initialized. So calling Boolean::initializeClass does not 100% ensure that those static fields are for sure, for sure , correctly initialized.
        Hide
        Andi Vajda added a comment -

        Ok, I missed that the problem is with the C++ side of things, not the Python layer.
        And with your initializing$ change, you make getClass(true) return NULL so that the locked path is taken. Got it.
        Thank you for the explanation.

        Show
        Andi Vajda added a comment - Ok, I missed that the problem is with the C++ side of things, not the Python layer. And with your initializing$ change, you make getClass(true) return NULL so that the locked path is taken. Got it. Thank you for the explanation.
        Hide
        Andi Vajda added a comment -

        fixed in rev 1353792.
        I ended up implementing something a bit simpler than PYLUCENE-17-4.
        I added a static bool live$; variable that causes the initializeClass(true) case to return NULL when !$live.
        This causes the slower path to be taken, via the lock.

        for a in

        {1..100}

        ; do _install/bin/python lucene-threadtest.py ; echo $a; done
        seems to succeed reliably.

        Show
        Andi Vajda added a comment - fixed in rev 1353792. I ended up implementing something a bit simpler than PYLUCENE-17 -4. I added a static bool live$; variable that causes the initializeClass(true) case to return NULL when !$live. This causes the slower path to be taken, via the lock. for a in {1..100} ; do _install/bin/python lucene-threadtest.py ; echo $a; done seems to succeed reliably.
        Hide
        Patrick J. McNerthney added a comment -

        (I was in the middle of composing this comment when your comment showed up. I haven't had a chance to look at it, but once this comment has been submitted.)

        "And with your initializing$ change, you make getClass(true) return NULL so that the locked path is taken." - Actually...not quite right.

        Here is the code generated for Boolean::initializeClass:

        jclass Boolean::initializeClass(bool getOnly)
        {
        if (getOnly)
        return (jclass) (class$ == NULL ? NULL : class$->this$);
        if (!class$)

        { if (initializing$) return NULL; initializing$ = true; jclass cls = (jclass) env->findClass("java/lang/Boolean"); mids$ = new jmethodID[max_mid]; mids$[mid_init$_bb0c767f] = env->getMethodID(cls, "<init>", "(Z)V"); mids$[mid_init$_5fdc3f48] = env->getMethodID(cls, "<init>", "(Ljava/lang/String;)V"); mids$[mid_booleanValue_54c6a16a] = env->getMethodID(cls, "booleanValue", "()Z"); mids$[mid_compareTo_d07f0c91] = env->getMethodID(cls, "compareTo", "(Ljava/lang/Boolean;)I"); mids$[mid_compareTo_290588f1] = env->getMethodID(cls, "compareTo", "(Ljava/lang/Object;)I"); mids$[mid_equals_290588e2] = env->getMethodID(cls, "equals", "(Ljava/lang/Object;)Z"); mids$[mid_getBoolean_5fdc3f44] = env->getStaticMethodID(cls, "getBoolean", "(Ljava/lang/String;)Z"); mids$[mid_hashCode_54c6a179] = env->getMethodID(cls, "hashCode", "()I"); mids$[mid_parseBoolean_5fdc3f44] = env->getStaticMethodID(cls, "parseBoolean", "(Ljava/lang/String;)Z"); mids$[mid_toString_14c7b5c5] = env->getMethodID(cls, "toString", "()Ljava/lang/String;"); mids$[mid_toString_445a175e] = env->getStaticMethodID(cls, "toString", "(Z)Ljava/lang/String;"); mids$[mid_valueOf_a98d5bba] = env->getStaticMethodID(cls, "valueOf", "(Z)Ljava/lang/Boolean;"); mids$[mid_valueOf_9d4a8ff9] = env->getStaticMethodID(cls, "valueOf", "(Ljava/lang/String;)Ljava/lang/Boolean;"); FALSE = new Boolean(env->getStaticObjectField(cls, "FALSE", "Ljava/lang/Boolean;")); TRUE = new Boolean(env->getStaticObjectField(cls, "TRUE", "Ljava/lang/Boolean;")); TYPE = new ::java::lang::Class(env->getStaticObjectField(cls, "TYPE", "Ljava/lang/Class;")); initializing$ = false; class$ = (::java::lang::Class *) new JObject(cls); }

        return (jclass) class$->this$;
        }

        This part:

        if (getOnly)
        return (jclass) (class$ == NULL ? NULL : class$->this$);

        always handles all cases of being called without the mutex lock in place and works exactly like your version does. Remember, all the rest of the code will be executing under the mutex lock, so it is not an accurate statement to say that "you make getClass(true) return NULL so that the locked path is taken.". Following along in the rest of the code, this part:

        if (!class$)
        {

        also works exactly the same as before. If another thread managed to initialize this class before we got the mutex, then no need for us to initialize it. Next is this new code:

        if (initializing$)
        return NULL;

        This first checks to see if this class is currently being initialized. Since there is only one intializeClass mutex, there can only be one thread at a time performing initialization and since the current thread has the mutex, the current thread is the only one who could be recursively initializing this class. So, if we are in the middle of initializing this class, just return null, because the return value is not needed. If we did not return now, we would end up in an infinite loop.

        initializing$ = true;

        This flags this class as being actively initialized. Then, further down in the code are these two statements:

        FALSE = new Boolean(env->getStaticObjectField(cls, "FALSE", "Ljava/lang/Boolean;"));
        TRUE = new Boolean(env->getStaticObjectField(cls, "TRUE", "Ljava/lang/Boolean;"));

        This is was is doing to cause the recursive call back into the initializeClass method. The Boolean constructor does the call to "env->getClass(initializeClass)", but the return value is ignored. This is why it is okay for the above returning of NULL if initializing$ is true. This code:

        initializing$ = false;
        class$ = (::java::lang::Class *) new JObject(cls);

        Flags this class as no longer being actively initialized and then sets the class$ field. Note that this is occurring after the above initialization of FALSE and TRUE now.

        Show
        Patrick J. McNerthney added a comment - (I was in the middle of composing this comment when your comment showed up. I haven't had a chance to look at it, but once this comment has been submitted.) "And with your initializing$ change, you make getClass(true) return NULL so that the locked path is taken." - Actually...not quite right. Here is the code generated for Boolean::initializeClass: jclass Boolean::initializeClass(bool getOnly) { if (getOnly) return (jclass) (class$ == NULL ? NULL : class$->this$); if (!class$) { if (initializing$) return NULL; initializing$ = true; jclass cls = (jclass) env->findClass("java/lang/Boolean"); mids$ = new jmethodID[max_mid]; mids$[mid_init$_bb0c767f] = env->getMethodID(cls, "<init>", "(Z)V"); mids$[mid_init$_5fdc3f48] = env->getMethodID(cls, "<init>", "(Ljava/lang/String;)V"); mids$[mid_booleanValue_54c6a16a] = env->getMethodID(cls, "booleanValue", "()Z"); mids$[mid_compareTo_d07f0c91] = env->getMethodID(cls, "compareTo", "(Ljava/lang/Boolean;)I"); mids$[mid_compareTo_290588f1] = env->getMethodID(cls, "compareTo", "(Ljava/lang/Object;)I"); mids$[mid_equals_290588e2] = env->getMethodID(cls, "equals", "(Ljava/lang/Object;)Z"); mids$[mid_getBoolean_5fdc3f44] = env->getStaticMethodID(cls, "getBoolean", "(Ljava/lang/String;)Z"); mids$[mid_hashCode_54c6a179] = env->getMethodID(cls, "hashCode", "()I"); mids$[mid_parseBoolean_5fdc3f44] = env->getStaticMethodID(cls, "parseBoolean", "(Ljava/lang/String;)Z"); mids$[mid_toString_14c7b5c5] = env->getMethodID(cls, "toString", "()Ljava/lang/String;"); mids$[mid_toString_445a175e] = env->getStaticMethodID(cls, "toString", "(Z)Ljava/lang/String;"); mids$[mid_valueOf_a98d5bba] = env->getStaticMethodID(cls, "valueOf", "(Z)Ljava/lang/Boolean;"); mids$[mid_valueOf_9d4a8ff9] = env->getStaticMethodID(cls, "valueOf", "(Ljava/lang/String;)Ljava/lang/Boolean;"); FALSE = new Boolean(env->getStaticObjectField(cls, "FALSE", "Ljava/lang/Boolean;")); TRUE = new Boolean(env->getStaticObjectField(cls, "TRUE", "Ljava/lang/Boolean;")); TYPE = new ::java::lang::Class(env->getStaticObjectField(cls, "TYPE", "Ljava/lang/Class;")); initializing$ = false; class$ = (::java::lang::Class *) new JObject(cls); } return (jclass) class$->this$; } This part: if (getOnly) return (jclass) (class$ == NULL ? NULL : class$->this$); always handles all cases of being called without the mutex lock in place and works exactly like your version does. Remember, all the rest of the code will be executing under the mutex lock, so it is not an accurate statement to say that "you make getClass(true) return NULL so that the locked path is taken.". Following along in the rest of the code, this part: if (!class$) { also works exactly the same as before. If another thread managed to initialize this class before we got the mutex, then no need for us to initialize it. Next is this new code: if (initializing$) return NULL; This first checks to see if this class is currently being initialized. Since there is only one intializeClass mutex, there can only be one thread at a time performing initialization and since the current thread has the mutex, the current thread is the only one who could be recursively initializing this class. So, if we are in the middle of initializing this class, just return null, because the return value is not needed. If we did not return now, we would end up in an infinite loop. initializing$ = true; This flags this class as being actively initialized. Then, further down in the code are these two statements: FALSE = new Boolean(env->getStaticObjectField(cls, "FALSE", "Ljava/lang/Boolean;")); TRUE = new Boolean(env->getStaticObjectField(cls, "TRUE", "Ljava/lang/Boolean;")); This is was is doing to cause the recursive call back into the initializeClass method. The Boolean constructor does the call to "env->getClass(initializeClass)", but the return value is ignored. This is why it is okay for the above returning of NULL if initializing$ is true. This code: initializing$ = false; class$ = (::java::lang::Class *) new JObject(cls); Flags this class as no longer being actively initialized and then sets the class$ field. Note that this is occurring after the above initialization of FALSE and TRUE now.
        Hide
        Patrick J. McNerthney added a comment -

        I like rev 1353792. I dare say I think I even like it better then my proposal! I'll run my tests overnight, but so far, so good.

        Show
        Patrick J. McNerthney added a comment - I like rev 1353792. I dare say I think I even like it better then my proposal! I'll run my tests overnight, but so far, so good.

          People

          • Assignee:
            Unassigned
            Reporter:
            Greg Bowyer
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development