Lucene - Core / LUCENE-3452

The native FS lock used in test-framework's o.a.l.util.LuceneJUnitResultFormatter prohibits testing on a multi-user system

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.4, 4.0-ALPHA
    • Fix Version/s: 3.5, 4.0-ALPHA
    • Component/s: general/test
    • Labels: None
    • Lucene Fields: New

      Description

      LuceneJUnitResultFormatter uses a lock to buffer test suites' output, so that when run in parallel, they don't interrupt each other when they are displayed on the console.

      The current implementation uses a fixed directory, lucene_junit_lock/ in java.io.tmpdir (by default /tmp/ on Unix/Linux systems), as the location of this lock. This functionality was introduced in SOLR-1835.

      As Shawn Heisey reported on SOLR-2739, some tests fail when run as root, but succeed when run as a non-root user.

      On #lucene IRC today, Shawn wrote:

      (2:06:07 PM) elyograg: Now that I know I can't run the tests as root, I have discovered /tmp/lucene_junit_lock. Once you run the tests as user A, you cannot run them again as user B until that directory is deleted, and only root or the original user can do so.
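
      To make the failure mode concrete, here is a minimal sketch of a cross-JVM console lock kept in a fixed directory. It is not the formatter's actual code: it uses java.nio's FileChannel.lock() rather than Lucene's native FS lock, and the class name, method name, and the console.lock file name are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

// Hypothetical sketch: serialize console output across JVMs with a file lock.
final class ConsoleLockSketch {

    /** Prints one suite's buffered output while holding an exclusive file lock. */
    static void printExclusively(File lockDir, String bufferedOutput) throws IOException {
        // Create the shared directory if it does not exist yet. The first user
        // to run the tests becomes its owner.
        if (!lockDir.mkdirs() && !lockDir.isDirectory()) {
            throw new IOException("Cannot create lock directory: " + lockDir);
        }
        File lockFile = new File(lockDir, "console.lock"); // hypothetical file name

        // If the directory is owned by another user and is not writable,
        // opening the lock file for writing fails here with an IOException.
        try (RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
             FileChannel channel = raf.getChannel();
             FileLock lock = channel.lock()) { // blocks until other JVMs release the lock
            System.out.print(bufferedOutput);
            System.out.flush();
        } // lock, channel, and file are released in reverse order
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "lucene_junit_lock");
        printExclusively(dir, "Testsuite: example (buffered output)\n");
    }
}
```

      With a fixed path under /tmp, the directory created by user A is typically not writable by user B, which matches the behavior Shawn describes. Note that FileChannel.lock() coordinates separate JVMs only; threads inside one JVM would need additional synchronization.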

        Activity

        Robert Muir added a comment -

        Fortunately I only screwed this up on trunk... it's been fine in 3.x all along.

        Robert Muir added a comment -

        Somehow my commit got 'lost' e.g. during merging.

        Grant Ingersoll added a comment -

        I think I am still seeing this between my local user on my machine and Jenkins on the same machine. Jenkins checks out clean every time.

        When I run with ant -v I do see the lockdir set to the build dir, but I also see 'Override ignored for property "tests.lockdir"'
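
        For readers following along, this is a hedged sketch of how a formatter could pick up such a property. Only the property name tests.lockdir and the java.io.tmpdir default come from this issue; the class, the method, and the fallback behavior are assumptions, not the committed patch.

```java
import java.io.File;

// Hypothetical sketch: choose the lock directory from a system property.
final class LockDirResolver {

    /**
     * Returns the lock directory under configuredDir when it is set,
     * otherwise under tmpDir (the old fixed location).
     */
    static File resolve(String configuredDir, String tmpDir) {
        boolean unset = configuredDir == null || configuredDir.trim().isEmpty();
        String base = unset ? tmpDir : configuredDir;
        return new File(base, "lucene_junit_lock");
    }

    public static void main(String[] args) {
        // The build would have to pass -Dtests.lockdir=... to each forked test JVM.
        File dir = resolve(System.getProperty("tests.lockdir"),
                           System.getProperty("java.io.tmpdir"));
        System.out.println("Lock directory: " + dir);
    }
}
```

        As for the Ant message: Ant properties are immutable once set, so 'Override ignored' means tests.lockdir already had a value before the build file tried to assign it. Which value each test JVM actually receives is worth checking in the verbose output.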

        Uwe Schindler added a comment -

        Bulk close after release of 3.5

        Steve Rowe added a comment -

        I just successfully ran all trunk Lucene/Solr tests with this patch, and everything passed.

        +1 to commit.

        Uwe Schindler added a comment -

        On the first pass, the build hung in the middle of the lucene core tests - I killed the process after half an hour with no output. I restarted the tests, and the build made it through the Lucene tests, but then at least one Solr core test failed.

        We had this several times on Jenkins, too. I killed the JVM approx. 4 times in the last 2 weeks.

        Steve Rowe added a comment -

        maybe someone can test this patch?

        I ran Lucene & Solr trunk tests with this patch on a 4-cpu-core Windows 7 box.

        On the first pass, the build hung in the middle of the lucene core tests - I killed the process after half an hour with no output. I restarted the tests, and the build made it through the Lucene tests, but then at least one Solr core test failed.

        I restarted the Solr tests, and they all succeeded. But after completing ("BUILD SUCCESSFUL, Total time: ..."), the Ant build hung. I killed it after half an hour.

        The changes in the patch are so simple, I really doubt that the hangs I experienced had anything to do with them.

        Robert Muir added a comment -

        maybe someone can test this patch?

        Robert Muir added a comment -

        However, I think an even quick(er) fix would be to just use lucene/build/ as the location of the lock, instead of the value of the ${java.io.tmpdir} system property.

        There was some reason I didn't do this, but I think it might be obsolete.

        Dawid Weiss added a comment -

        Yep, I was seconds ahead of your post, so I didn't see it

        Steve Rowe added a comment -

        Would a quickfix of prepending or postfixing with the user name work here?

        Dawid, as Hoss mentioned (quoted above), it would be even better to additionally include (a hash of a filename from) the source directory in which the tests are being run, to handle the case of simultaneous test runs by the same user.

        However, I think an even quick(er) fix would be to just use lucene/build/ as the location of the lock, instead of the value of the ${java.io.tmpdir} system property.

        Steve Rowe added a comment -

        From #lucene IRC:

        sarowe: line #69 of lucene/test-framework/o/a/l/util/LuceneJUnitResultFormatter/:
                    File lockDir = new File(System.getProperty("java.io.tmpdir"),
                        "lucene_junit_lock");
          hoss: right ... i saw that ... i suspect it's for using random factories and stuff
                not clear if the goal is to have a dir that all the parallel JVMs agree on or what
                even if it is: should be able to put it in the build dir ... right?
        sarowe: I think so - the only question in my mind is whether it's used in both lucene and solr
                if so, then "the build dir" is not fixed
          hoss: hmmm... good question
                in which case something like: "dir = tmp + user.name + md5(path to common-build.xml)" would be better, and probably solve the same problem
        sarowe: I don't understand the last component
                why md5(c-b.xml path)?
                I mean, why anything at all other than tmp + user.name?
          hoss: think rmuir and his beast box ... multiple copies of lucene checked out into diff dirs where he's testing diff patches
        sarowe: a (maybe simpler) option would be to always use lucene/build/?
                even when running solr tests, I mean
          hoss: sure .. maybe ... can you open an issue and we'll figure it out there?
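
        Hoss's "tmp + user.name + md5(path to common-build.xml)" idea could look roughly like the sketch below. This is an illustration only, not code from any patch on this issue: the class name, method names, and the underscore-separated directory name are assumptions.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: a lock directory that is unique per user and per checkout.
final class LockDirSketch {

    /** Hex-encoded MD5 of the text; used only to derive a short, stable name. */
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Builds tmp/lucene_junit_lock_<user>_<md5 of checkout path>. */
    static File lockDir(String tmpDir, String userName, String checkoutPath) {
        return new File(tmpDir, "lucene_junit_lock_" + userName + "_" + md5Hex(checkoutPath));
    }

    public static void main(String[] args) {
        // Assumes the current directory is the one that holds common-build.xml.
        File dir = lockDir(System.getProperty("java.io.tmpdir"),
                           System.getProperty("user.name"),
                           new File("common-build.xml").getAbsolutePath());
        System.out.println(dir);
    }
}
```

        The user name keeps two users from colliding on one directory, and the hash separates several checkouts tested at once by the same user. Using lucene/build/ instead, as sarowe suggests, sidesteps both concerns because every checkout has its own build directory.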
        
        Dawid Weiss added a comment -

        Would a quickfix of prepending or postfixing with the user name work here?


          People

          • Assignee: Robert Muir
          • Reporter: Steve Rowe
          • Votes: 0
          • Watchers: 1
