Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA, 4.1
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Hudson has a special huge linedocs file that it sets via a -D parameter,
      but this means that any test using LineDocs won't reproduce on our own
      computers if it fails on Hudson.

      I think we should disable this.

      1. LUCENE-3910.patch
        2 kB
        Michael McCandless
      2. LUCENE-3910.patch
        3 kB
        Michael McCandless

        Activity

        Robert Muir added a comment -

        This one is controversial (at least Mike and I don't agree):

        it's a reproducibility-versus-coverage trade-off.

        I'm gonna unset 3.6 because the problem already exists in other
        3.x releases, and it only affects nightly builds: for end users
        there is no concern.

        Dawid Weiss added a comment -

        I agree with you both. No, it's not a paradox. On one hand, I agree that having larger test files is good; on the other, I agree with Robert that not being able to reproduce locally because of different (or inconsistent) data is a pain.

        At Carrot Search we have put all the "big data" into a separate git repository, which is simply mirrored across build servers and our local machines. Granted, the first clone takes a while, but subsequent pulls of additional data are much faster, and (a big plus) git identifies each revision by a hash, so that hash can be logged upon failure (we don't do this because we're pretty sure the checkouts are consistent, but it could be done to guarantee testing against exactly the same test files).

        Just thoughts to consider.
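        A rough Java sketch of a related way to pin down the data on failure: instead of the git revision hash Dawid mentions, it hashes the line docs file itself (SHA-256 here) so a local run can confirm it is using byte-identical data. The class name, method name and NOTE wording are hypothetical; only the default file name comes from the URL mentioned in this issue.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Lucene's test framework.
public class LineDocsDigest {

  // Streams the file through SHA-256 and returns the digest as lowercase hex.
  static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("SHA-256");
    try (InputStream in = Files.newInputStream(file)) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        md.update(buf, 0, n);
      }
    }
    StringBuilder sb = new StringBuilder();
    for (byte b : md.digest()) {
      sb.append(String.format("%02x", b)); // two hex chars per byte
    }
    return sb.toString();
  }

  public static void main(String[] args) throws Exception {
    // Assumed default: a local copy of the nightly file in the working directory.
    Path file = Paths.get(args.length > 0 ? args[0] : "enwiki.random.lines.txt");
    // A failure report could print this so others can verify they have the same data.
    System.out.println("NOTE: line docs file " + file + " sha256=" + sha256Hex(file));
  }
}
```

        Comparing the printed digest with one computed locally would show immediately whether a non-reproducing failure is down to different data.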

        Robert Muir added a comment -

        If we are going to keep this large file, local reproducibility needs to be made easier.

        Currently, if something fails in the nightly build, nobody fixes it because of this problem:
        I know I won't waste my time on tests that don't reproduce!

        If we aren't going to do this, we should disable the linedocs (I will do this in 72 hours
        if the situation isn't improved).

        There is absolutely no point in finding test failures that no one will debug because it's too hard.

        Michael McCandless added a comment -

        > If we are going to keep this large file, local reproducibility needs to be made easier.

        +1

        I think, first, we should add -Dtests.linedocsfile=XXX to the "reproduce line" whenever it was passed to 'ant test'.

        Second, I think we should put the nightly line file somewhere "accessible". It's currently at http://people.apache.org/~mikemccand/enwiki.random.lines.txt ...

        Maybe we can have an ant target, or ivy, to pull down a copy to your local area? I also like Dawid's idea to have a separate "big test data" repository somewhere...

        Robert Muir added a comment -

        > I also like Dawid's idea to have a separate "big test data" repository somewhere...

        +1 for better separation here. Source releases and svn checkouts are bloated because of
        all this test data,
        e.g. the linefile data is over 5MB, the snowball test data is over 3MB, ...

        Maybe we should separate this out in svn? Its "artifact" could be a .jar file
        with all these huge files in the appropriate places or something?

        Then we could pull this thing down with ivy and put it in the classpath like
        any other dependency.

        The problems that make this hard, though, are versioning and "releasing" this thing:

        • If it's outside of dev/ in SVN, versioning the test data with respect to different releases/branches is hard.
          This could easily get annoying and complicated.
        • Where would we put the resulting "jar" to download via ivy? I don't think we should be downloading
          it from SVN, since our source releases would actually have this as a test dependency.
        Michael McCandless added a comment -

        Trivial patch that adds -Dtests.linedocsfile=XXX to the "reproduce with" line.
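        A minimal Java sketch of the idea behind this patch: append -Dtests.linedocsfile only when a line docs file was actually passed in. The class and method names are hypothetical, and the -Dtestcase and -Dtests.seed parts are illustrative assumptions; only -Dtests.linedocsfile comes from this issue. Lucene's real test framework builds its "reproduce with" line differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Lucene's actual failure reporter.
public class ReproduceLine {

  // Builds a "reproduce with" hint; lineDocsFile may be null when no -D was given.
  static String reproduceWith(String testClass, String seed, String lineDocsFile) {
    List<String> parts = new ArrayList<>();
    parts.add("ant test");
    parts.add("-Dtestcase=" + testClass);   // assumed property name
    parts.add("-Dtests.seed=" + seed);      // assumed property name
    if (lineDocsFile != null) {
      // Only echo the big file when it was used, so local runs pick up the same data.
      parts.add("-Dtests.linedocsfile=" + lineDocsFile);
    }
    return String.join(" ", parts);
  }

  public static void main(String[] args) {
    String file = System.getProperty("tests.linedocsfile"); // null if not set
    System.out.println("NOTE: reproduce with: " + reproduceWith("TestFoo", "ABCDEF", file));
  }
}
```

        Leaving the flag out when the property is unset keeps the reproduce line unchanged for ordinary local runs.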

        Michael McCandless added a comment -

        Maybe as a baby step (before we figure out how to hold/release test data in a separate repository), we can add ant/ivy sugar to pull down the nightly line docs file from people.apache.org?

        We can then, e.g., improve the patch I put up to give instructions to run that...

        Michael McCandless added a comment -

        Improved patch adding the baby step: a new ant target "get-jenkins-line-docs", plus an additional NOTE on test failure saying that you can use this target to download the large line docs file.

        I think it's ready...

        Uwe Schindler added a comment -

        Closed after release.


          People

          • Assignee: Unassigned
          • Reporter: Robert Muir
          • Votes: 0
          • Watchers: 1
