Lucene - Core / LUCENE-3452

The native FS lock used in test-framework's o.a.l.util.LuceneJUnitResultFormatter prohibits testing on a multi-user system

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.4, 4.0-ALPHA
    • Fix Version/s: 3.5, 4.0-ALPHA
    • Component/s: general/test
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      LuceneJUnitResultFormatter uses a lock to buffer test suites' output, so that when run in parallel, they don't interrupt each other when they are displayed on the console.

      The current implementation uses a fixed directory (lucene_junit_lock/ in java.io.tmpdir, which defaults to /tmp/ on Unix/Linux systems) as the location of this lock. This functionality was introduced in SOLR-1835.

      As Shawn Heisey reported on SOLR-2739, some tests fail when run as root, but succeed when run as a non-root user.

      On #lucene IRC today, Shawn wrote:

      (2:06:07 PM) elyograg: Now that I know I can't run the tests as root, I have discovered /tmp/lucene_junit_lock. Once you run the tests as user A, you cannot run them again as user B until that directory is deleted, and only root or the original user can do so.
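
      To make the failure mode concrete, here is a minimal sketch of the pattern described above: a suite's output is buffered, then printed while holding a native file lock in a shared directory. This is not the actual LuceneJUnitResultFormatter code; the class name, method name and lock file name are hypothetical. If the lock directory was created by another user with restrictive permissions, creating or opening the lock file fails, which matches the behaviour Shawn reported.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

// Hypothetical sketch, not the real formatter.
final class LockedOutputSketch {

  /** Prints the buffered suite output while holding an exclusive native file lock. */
  static void flushUnderLock(File lockDir, String bufferedOutput) throws IOException {
    lockDir.mkdirs();
    if (!lockDir.isDirectory()) {
      throw new IOException("cannot create lock dir: " + lockDir);
    }
    // Hypothetical file name. Opening it fails if another user owns lockDir
    // and the directory is not writable by the current user.
    File lockFile = new File(lockDir, "junit.lock");
    try (RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
         FileChannel channel = raf.getChannel();
         FileLock lock = channel.lock()) { // blocks until other JVMs release the lock
      System.out.print(bufferedOutput);
      System.out.flush();
    }
  }

  public static void main(String[] args) throws IOException {
    // Fixed location under java.io.tmpdir, as in the description above
    // (directory name changed so the sketch does not touch a real lock dir).
    File lockDir = new File(System.getProperty("java.io.tmpdir"), "lucene_junit_lock_sketch");
    flushUnderLock(lockDir, "Testsuite: ExampleTest\n");
  }
}
```

      Because the directory path is fixed and shared by every user on the machine, whoever runs the tests first decides who can run them afterwards.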

        Activity

        Transition              Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Open -> Resolved        32d 4h 19m              1                 Robert Muir     26/Oct/11 00:51
        Closed -> Reopened      4d 13h 13m              1                 Robert Muir     02/Dec/11 01:42
        Reopened -> Resolved    1h 32m                  1                 Robert Muir     02/Dec/11 03:15
        Resolved -> Closed      557d 20h 6m             2                 Uwe Schindler   10/May/13 11:44
        Uwe Schindler made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Robert Muir made changes -
        Status Reopened [ 4 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Robert Muir added a comment -

        Fortunately I only screwed this up on trunk... it's been fine in 3.x all along.

        Robert Muir made changes -
        Resolution Fixed [ 1 ]
        Status Closed [ 6 ] Reopened [ 4 ]
        Assignee Robert Muir [ rcmuir ]
        Robert Muir added a comment -

        Somehow my commit got 'lost' e.g. during merging.

        Grant Ingersoll added a comment -

        I think I am still seeing this between my local user on my machine and Jenkins on the same machine. Jenkins checks out clean every time.

        When I run with ant -v I do see the lockdir set to the build dir, but I also see 'Override ignored for property "tests.lockdir"'
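
        For context, Ant properties are immutable once set, so the first definition of tests.lockdir wins and later ones are reported as "Override ignored" in verbose output. The sketch below shows one way a formatter could consume such a property, falling back to java.io.tmpdir when it is unset. Only the property name tests.lockdir and the directory name lucene_junit_lock come from this issue; the class and method names are hypothetical, and the fallback behaviour is an assumption rather than what the committed patch does.

```java
import java.io.File;

// Hypothetical sketch of resolving the lock directory from a system property.
final class LockDirResolver {

  /** Returns <tests.lockdir>/lucene_junit_lock, or <java.io.tmpdir>/lucene_junit_lock if unset. */
  static File resolveLockDir() {
    String base = System.getProperty("tests.lockdir");
    if (base == null || base.isEmpty()) {
      // Assumed fallback: the old shared location.
      base = System.getProperty("java.io.tmpdir");
    }
    return new File(base, "lucene_junit_lock");
  }

  public static void main(String[] args) {
    System.out.println("lock dir: " + resolveLockDir());
  }
}
```

        With this shape, the test JVM only sees the build-dir location if the build actually passes the property through to the forked JVM, which is worth checking when the old /tmp location still appears to be in use.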

        Uwe Schindler made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Uwe Schindler added a comment -

        Bulk close after release of 3.5

        Robert Muir made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 3.5 [ 12317877 ]
        Fix Version/s 4.0 [ 12314025 ]
        Resolution Fixed [ 1 ]
        Steve Rowe added a comment -

        I just successfully ran all trunk Lucene/Solr tests with this patch, and everything passed.

        +1 to commit.

        Uwe Schindler added a comment -

        > On the first pass, the build hung in the middle of the lucene core tests - I killed the process after half an hour with no output. I restarted the tests, and the build made it through the Lucene tests, but then at least one Solr core test failed.

        We had this several times on Jenkins, too. I killed the JVM approx. 4 times in the last 2 weeks.

        Steve Rowe added a comment -

        > maybe someone can test this patch?

        I ran Lucene & Solr trunk tests with this patch on a 4-cpu-core Windows 7 box.

        On the first pass, the build hung in the middle of the lucene core tests - I killed the process after half an hour with no output. I restarted the tests, and the build made it through the Lucene tests, but then at least one Solr core test failed.

        I restarted the Solr tests, and they all succeeded. But after completing ("BUILD SUCCESSFUL, Total time: ..."), the Ant build hung. I killed it after half an hour.

        The changes in the patch are so simple, I really doubt that the hangs I experienced had anything to do with them.

        Robert Muir made changes -
        Attachment LUCENE-3452.patch [ 12496324 ]
        Robert Muir added a comment -

        maybe someone can test this patch?

        Robert Muir added a comment -

        > However, I think an even quick(er) fix would be to just use lucene/build/ as the location of the lock, instead of the value of the ${java.io.tmpdir} system property.

        There was some reason I didn't do this, but I think it might be obsolete.

        Dawid Weiss added a comment -

        Yep, I was seconds ahead of your post, so I didn't see it

        Steve Rowe added a comment -

        > Would a quickfix of prepending or postfixing with the user name work here?

        Dawid, as Hoss mentioned (quoted above), it would be even better to additionally include (a hash of a filename from) the source directory in which the tests are being run, to handle the case of simultaneous test runs by the same user.

        However, I think an even quick(er) fix would be to just use lucene/build/ as the location of the lock, instead of the value of the ${java.io.tmpdir} system property.

        Steve Rowe added a comment -

        From #lucene IRC:

        sarowe: line #69 of lucene/test-framework/o/a/l/util/LuceneJUnitResultFormatter/:
                    File lockDir = new File(System.getProperty("java.io.tmpdir"),
                        "lucene_junit_lock");
          hoss: right ... i saw that ... i suspect it's for using random factories and stuff
                not clear if the goal is to have a dir that all the parallel JVMs agree on or what
                even if it is: should be able to put it in the build dir ... right?
        sarowe: I think so - the only question in my mind is whether it's used in both lucene and solr
                if so, then "the build dir" is not fixed
          hoss: hmmm... good question
                in which case something like: "dir = tmp + user.name + md5(path to common-build.xml)" would be better, and probably solve the same problem
        sarowe: I don't understand the last component
                why md5(c-b.xml path)?
                I mean, why anything at all other than tmp + user.name?
          hoss: think rmuir and his beast box ... multiple copies of lucene checked out into diff dirs where he's testing diff patches
        sarowe: a (maybe simpler) option would be to always use lucene/build/?
                even when running solr tests, I mean
          hoss: sure .. maybe ... can you open an issue and we'll figure it out there?
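
        Hoss's suggestion in the log above ("dir = tmp + user.name + md5(path to common-build.xml)") can be sketched as follows. This is only an illustration of the naming scheme discussed on IRC, not the fix that was committed; the class name, method names, separator characters and the example paths are hypothetical.

```java
import java.io.File;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of a per-user, per-checkout lock directory name.
final class PerCheckoutLockDir {

  /** Returns the MD5 digest of the string as 32 lowercase hex characters. */
  static String md5Hex(String s) {
    try {
      MessageDigest md = MessageDigest.getInstance("MD5");
      byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
      return String.format("%032x", new BigInteger(1, digest));
    } catch (NoSuchAlgorithmException e) {
      // MD5 is one of the algorithms every Java platform must provide.
      throw new IllegalStateException(e);
    }
  }

  /** Builds tmp + user.name + md5(build file path) as a single directory under tmpDir. */
  static File lockDir(String tmpDir, String userName, String buildFilePath) {
    return new File(tmpDir, "lucene_junit_lock-" + userName + "-" + md5Hex(buildFilePath));
  }

  public static void main(String[] args) {
    String tmp = System.getProperty("java.io.tmpdir");
    String user = System.getProperty("user.name");
    // Two checkouts by the same user map to two different lock directories.
    System.out.println(lockDir(tmp, user, "/checkouts/a/lucene/common-build.xml"));
    System.out.println(lockDir(tmp, user, "/checkouts/b/lucene/common-build.xml"));
  }
}
```

        The user name keeps different users out of each other's directories, and the path hash separates multiple checkouts owned by the same user, which is the "beast box" case Hoss describes.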
        
        Dawid Weiss added a comment -

        Would a quickfix of prepending or postfixing with the user name work here?

        Steve Rowe made changes -
        Field          Original Value                       New Value
        Description    IRC excerpt wrapped in {noformat}    IRC excerpt wrapped in {quote}
                       (description text otherwise unchanged)
        Steve Rowe created issue -

          People

          • Assignee: Robert Muir
          • Reporter: Steve Rowe
          • Votes: 0
          • Watchers: 0