Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I think this is a blocker for LUCENE-6825, because the block KD-tree makes heavy use of OfflineSorter and we don't want to fill up tmp space ...

      This should be a straightforward cutover, but there are some challenges, e.g. the test was failing because virus checker blocked deleting of files.

      1. LUCENE-6829.patch
        231 kB
        Michael McCandless
      2. LUCENE-6829.patch
        233 kB
        Michael McCandless
      3. LUCENE-6829.patch
        209 kB
        Michael McCandless
      4. LUCENE-6829.patch
        27 kB
        Michael McCandless

        Activity

        Hide
        mikemccand Michael McCandless added a comment -

        Initial very, very rough patch ... only lucene core compiles/tests, I still need to cutover all places that use OfflineSorter.

        TestOfflineSorter seems to pass, but I sidestepped the virus checker issue.

        It becomes the callers job to pass in a "temp file prefix" from which the OfflineSorter will generate its own temp file names.

        Show
        mikemccand Michael McCandless added a comment - Initial very, very rough patch ... only lucene core compiles/tests, I still need to cutover all places that use OfflineSorter. TestOfflineSorter seems to pass, but I sidestepped the virus checker issue. It becomes the callers job to pass in a "temp file prefix" from which the OfflineSorter will generate its own temp file names.
        Hide
        mikemccand Michael McCandless added a comment -

        New patch. Tests pass, but I don't like the hacks I had to resort to,
        to have things carefully compute "safe" (write once) temp file names.
        There are bugs in those parts (I put nocommits).

        I think to fix this more safely we need to add a new required
        (abstract) method to Directory:

        public abstract IndexOutput createTempOutput(String prefix, String suffix);
        

        This way it's not the caller's (difficult) job of properly recursively
        carving up the namespace.

        Second, this is an API break for things that secretly used
        OfflineSorter (suggesters, Hunspell Dictionary, etc.), but I think
        this is good because it makes it clear to the caller that disk space
        is going to be used by this method and lets the call control where, vs
        running the risk of e.g. filling up your tmp partition.

        Some things still wanted access to "the default temp dir", so I moved
        it from OfflineSorter to IOUtils, and kept the test infra initializing
        that to the mock fs wrapped version. Hmm I think maybe it's just
        Hunspell, maybe I can remove that from IOUtils and just put it
        (privately) in Hunspell. I think ideally nothing in Lucene should be
        secretly using /tmp anymore.

        I also think somehow we should extend the "retry the file delete" that
        IW/IFD provides "down" to things like OfflineSorter so that virus
        checkers won't cause exceptions (this was Rob's idea!) but I think we
        should postpone until later, so I just sidestep the issue by having
        tests disable the virus checker. This change is already big enough.

        Show
        mikemccand Michael McCandless added a comment - New patch. Tests pass, but I don't like the hacks I had to resort to, to have things carefully compute "safe" (write once) temp file names. There are bugs in those parts (I put nocommits). I think to fix this more safely we need to add a new required (abstract) method to Directory: public abstract IndexOutput createTempOutput(String prefix, String suffix); This way it's not the caller's (difficult) job of properly recursively carving up the namespace. Second, this is an API break for things that secretly used OfflineSorter (suggesters, Hunspell Dictionary, etc.), but I think this is good because it makes it clear to the caller that disk space is going to be used by this method and lets the call control where, vs running the risk of e.g. filling up your tmp partition. Some things still wanted access to "the default temp dir", so I moved it from OfflineSorter to IOUtils, and kept the test infra initializing that to the mock fs wrapped version. Hmm I think maybe it's just Hunspell, maybe I can remove that from IOUtils and just put it (privately) in Hunspell. I think ideally nothing in Lucene should be secretly using /tmp anymore. I also think somehow we should extend the "retry the file delete" that IW/IFD provides "down" to things like OfflineSorter so that virus checkers won't cause exceptions (this was Rob's idea!) but I think we should postpone until later, so I just sidestep the issue by having tests disable the virus checker. This change is already big enough.
        Hide
        rcmuir Robert Muir added a comment -

        Second, this is an API break for things that secretly used
        OfflineSorter (suggesters, Hunspell Dictionary, etc.), but I think
        this is good because it makes it clear to the caller that disk space
        is going to be used by this method and lets the call control where, vs
        running the risk of e.g. filling up your tmp partition.

        Some things still wanted access to "the default temp dir", so I moved
        it from OfflineSorter to IOUtils, and kept the test infra initializing
        that to the mock fs wrapped version. Hmm I think maybe it's just
        Hunspell, maybe I can remove that from IOUtils and just put it
        (privately) in Hunspell. I think ideally nothing in Lucene should be
        secretly using /tmp anymore.

        Yeah, +1 for fixing that, I think the break is ok. It was a trap before.

        I also think somehow we should extend the "retry the file delete" that
        IW/IFD provides "down" to things like OfflineSorter

        Its more than that right? (with the current patch) Its also inflateGens too?

        Looks like something went horribly wrong with the formatting of TestRandomChains?

        I think to fix this more safely we need to add a new required
        (abstract) method to Directory:

        So would FSDirectory implement this with the obvious Files.createTempFile() ? That would be my vote. I guess some could argue its no longer technically write-once, since methods like that rely on the atomicity of creating a (zero-byte) file, but we are just going to append to that file after, once. And its a separate method, solely as a temporary file facility, so I think this is how it should be implemented? Deserves some thought I guess either way.

        Show
        rcmuir Robert Muir added a comment - Second, this is an API break for things that secretly used OfflineSorter (suggesters, Hunspell Dictionary, etc.), but I think this is good because it makes it clear to the caller that disk space is going to be used by this method and lets the call control where, vs running the risk of e.g. filling up your tmp partition. Some things still wanted access to "the default temp dir", so I moved it from OfflineSorter to IOUtils, and kept the test infra initializing that to the mock fs wrapped version. Hmm I think maybe it's just Hunspell, maybe I can remove that from IOUtils and just put it (privately) in Hunspell. I think ideally nothing in Lucene should be secretly using /tmp anymore. Yeah, +1 for fixing that, I think the break is ok. It was a trap before. I also think somehow we should extend the "retry the file delete" that IW/IFD provides "down" to things like OfflineSorter Its more than that right? (with the current patch) Its also inflateGens too? Looks like something went horribly wrong with the formatting of TestRandomChains? I think to fix this more safely we need to add a new required (abstract) method to Directory: So would FSDirectory implement this with the obvious Files.createTempFile() ? That would be my vote. I guess some could argue its no longer technically write-once, since methods like that rely on the atomicity of creating a (zero-byte) file, but we are just going to append to that file after, once. And its a separate method, solely as a temporary file facility, so I think this is how it should be implemented? Deserves some thought I guess either way.
        Hide
        rcmuir Robert Muir added a comment -

        and we always have the option of our own simple "temp file logic" that just uses StandardOpenOption.CREATE_NEW to ensure its really unique. I think i like this better...I don't think we have to worry about crazy security concerns like someone guessing what the filename will be. And really we should be opening all files with that flag anyway, its just that we don't yet today because files aren't always write-once in exceptional cases.

        Show
        rcmuir Robert Muir added a comment - and we always have the option of our own simple "temp file logic" that just uses StandardOpenOption.CREATE_NEW to ensure its really unique. I think i like this better...I don't think we have to worry about crazy security concerns like someone guessing what the filename will be. And really we should be opening all files with that flag anyway, its just that we don't yet today because files aren't always write-once in exceptional cases.
        Hide
        dweiss Dawid Weiss added a comment -

        StandardOpenOption.CREATE_NEW

        Oh, I like this!

        Show
        dweiss Dawid Weiss added a comment - StandardOpenOption.CREATE_NEW Oh, I like this!
        Hide
        mikemccand Michael McCandless added a comment -

        New patch, I think it's closer:

        • The default temp dir logic is now private to Hunspell
        • I added Directory.createTempOutput, and also added IndexOutput.getName (so
          you can ask what temp name was picked). I use a seed'd random
          instance to generate the name candidates, retrying until I get one
          that didn't already exist.
        • Simplified the OfflineSorter API: the sort method now owns
          creating a temp file (sorted), and then returns its name
        • Fixed the formatting disaster from TestRandomChains (I blame emacs)
        • I cutover to TrackingDirectory in OfflineSorter to manage
          "deleting temp files on exception", and simply the
          try/finally/success horror show
        • I changed TrackingDiretoryWrapper.getCreatedFiles to make a clone
          first (it had a TODO about it, and I also hit a cryptic
          ConcurrentModificationExc because it didn't clone), and I added an
          explicit clearCreatedFiles, used by IW
        Show
        mikemccand Michael McCandless added a comment - New patch, I think it's closer: The default temp dir logic is now private to Hunspell I added Directory.createTempOutput, and also added IndexOutput.getName (so you can ask what temp name was picked). I use a seed'd random instance to generate the name candidates, retrying until I get one that didn't already exist. Simplified the OfflineSorter API: the sort method now owns creating a temp file (sorted), and then returns its name Fixed the formatting disaster from TestRandomChains (I blame emacs) I cutover to TrackingDirectory in OfflineSorter to manage "deleting temp files on exception", and simply the try/finally/success horror show I changed TrackingDiretoryWrapper.getCreatedFiles to make a clone first (it had a TODO about it, and I also hit a cryptic ConcurrentModificationExc because it didn't clone), and I added an explicit clearCreatedFiles, used by IW
        Hide
        mikemccand Michael McCandless added a comment -

        New patch, I think it's close.

        I made a few simplifications, removed the StandardOpenOption.TRUNCATE_EXISTING (that seems silly to use also with CREATE_NEW), and got precommit passing. Tests seem to pass (at least once).

        I'm hoping to commit this soon: LUCENE-6825 is blocked on it ...

        Show
        mikemccand Michael McCandless added a comment - New patch, I think it's close. I made a few simplifications, removed the StandardOpenOption.TRUNCATE_EXISTING (that seems silly to use also with CREATE_NEW), and got precommit passing. Tests seem to pass (at least once). I'm hoping to commit this soon: LUCENE-6825 is blocked on it ...
        Hide
        dweiss Dawid Weiss added a comment - - edited

        This is a large patch and I only scanned it briefly. It looks good to me. I don't know how to avoid the virus checker special case (requiring odd hacks in the code to disable it).

        Also, blocks like this one:

        +      Path tempPath = Files.createTempDirectory(Dictionary.getDefaultTempDir(), "Hunspell");
        +      boolean success = false;
        +      try (Directory tempDir = FSDirectory.open(tempPath)) {
        +        this.dictionary = new Dictionary(tempDir, "hunspell", affix, dictionaries, ignoreCase);
        +        success = true;
        +      } finally {
        +        // tempPath (directory) should be empty at this point:
        +        if (success) {
        +          Files.delete(tempPath);
        +        } else {
        +          IOUtils.deleteFilesIgnoringExceptions(tempPath);
        +        }
        +      }
        

        Is there any reason why we shouldn't just let the regular exception suppression be used here? I know it'd reverse the precedence, but at least you'd get the full picture (temp. file couldn't be deleted too). Isn't this a leftover pattern from before 1.7 days?

        So, to be clear, why isn't the above just:

        +      Path tempPath = Files.createTempDirectory(Dictionary.getDefaultTempDir(), "Hunspell");
        +      try (Directory tempDir = FSDirectory.open(tempPath)) {
        +        this.dictionary = new Dictionary(tempDir, "hunspell", affix, dictionaries, ignoreCase);
        +      } finally {
        +         Files.delete(tempPath); // will fail if tempPath has any files in it, suppressing any exception.
        +      }
        
        Show
        dweiss Dawid Weiss added a comment - - edited This is a large patch and I only scanned it briefly. It looks good to me. I don't know how to avoid the virus checker special case (requiring odd hacks in the code to disable it). Also, blocks like this one: + Path tempPath = Files.createTempDirectory(Dictionary.getDefaultTempDir(), "Hunspell" ); + boolean success = false ; + try (Directory tempDir = FSDirectory.open(tempPath)) { + this .dictionary = new Dictionary(tempDir, "hunspell" , affix, dictionaries, ignoreCase); + success = true ; + } finally { + // tempPath (directory) should be empty at this point: + if (success) { + Files.delete(tempPath); + } else { + IOUtils.deleteFilesIgnoringExceptions(tempPath); + } + } Is there any reason why we shouldn't just let the regular exception suppression be used here? I know it'd reverse the precedence, but at least you'd get the full picture (temp. file couldn't be deleted too). Isn't this a leftover pattern from before 1.7 days? So, to be clear, why isn't the above just: + Path tempPath = Files.createTempDirectory(Dictionary.getDefaultTempDir(), "Hunspell" ); + try (Directory tempDir = FSDirectory.open(tempPath)) { + this .dictionary = new Dictionary(tempDir, "hunspell" , affix, dictionaries, ignoreCase); + } finally { + Files.delete(tempPath); // will fail if tempPath has any files in it, suppressing any exception. + }
        Hide
        mikemccand Michael McCandless added a comment -

        So, to be clear, why isn't the above just:

        Woops: on exception I had wanted that code block to recursively remove the temp dir, and any files under it, because the temp dir is likely not empty on exception.

        But the patch does not do that, because IOUtils.deleteFilesIgnoringExceptions will not recursively delete. I think I need to change it to your new version, but use IOUtils.rm instead?

        Show
        mikemccand Michael McCandless added a comment - So, to be clear, why isn't the above just: Woops: on exception I had wanted that code block to recursively remove the temp dir, and any files under it, because the temp dir is likely not empty on exception. But the patch does not do that, because IOUtils.deleteFilesIgnoringExceptions will not recursively delete. I think I need to change it to your new version, but use IOUtils.rm instead?
        Hide
        dweiss Dawid Weiss added a comment -

        There was something odd about it. I think IOUtils.rm sounds good – we always attempt to remove the temp. stuff. If we can't do it for some reason, we should fail explicitly. Otherwise you'd have temp files piling up without any visible reason?

        Show
        dweiss Dawid Weiss added a comment - There was something odd about it. I think IOUtils.rm sounds good – we always attempt to remove the temp. stuff. If we can't do it for some reason, we should fail explicitly. Otherwise you'd have temp files piling up without any visible reason?
        Hide
        mikemccand Michael McCandless added a comment -

        Otherwise you'd have temp files piling up without any visible reason?

        Exactly!

        Show
        mikemccand Michael McCandless added a comment - Otherwise you'd have temp files piling up without any visible reason? Exactly!
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1708760 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1708760 ]

        LUCENE-6829: OfflineSorter now uses Directory API; add Directory.createTempOutput and IndexOutput.getName

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1708760 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1708760 ] LUCENE-6829 : OfflineSorter now uses Directory API; add Directory.createTempOutput and IndexOutput.getName
        Hide
        steve_rowe Steve Rowe added a comment -

        My Jenkins found a nightly seed that causes TestStressIndexing..testStressIndexAndSearching() to fail nearly 100% of the time:

        [junit4] Suite: org.apache.lucene.index.TestStressIndexing
           [junit4]   1> Thread[Thread-66,5,TGRP-TestStressIndexing]: exc
           [junit4]   1> java.lang.NullPointerException
           [junit4]   1> 	at java.util.ComparableTimSort.countRunAndMakeAscending(ComparableTimSort.java:317)
           [junit4]   1> 	at java.util.ComparableTimSort.sort(ComparableTimSort.java:198)
           [junit4]   1> 	at java.util.Arrays.sort(Arrays.java:1246)
           [junit4]   1> 	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:653)
           [junit4]   1> 	at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:73)
           [junit4]   1> 	at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
           [junit4]   1> 	at org.apache.lucene.index.TestStressIndexing$SearcherThread.doWork(TestStressIndexing.java:106)
           [junit4]   1> 	at org.apache.lucene.index.TestStressIndexing$TimedThread.run(TestStressIndexing.java:48)
           [junit4]   1> Thread[Thread-65,5,TGRP-TestStressIndexing]: exc
           [junit4]   1> java.lang.NullPointerException
           [junit4]   1> 	at java.util.ComparableTimSort.binarySort(ComparableTimSort.java:258)
           [junit4]   1> 	at java.util.ComparableTimSort.sort(ComparableTimSort.java:203)
           [junit4]   1> 	at java.util.Arrays.sort(Arrays.java:1246)
           [junit4]   1> 	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:653)
           [junit4]   1> 	at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:73)
           [junit4]   1> 	at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
           [junit4]   1> 	at org.apache.lucene.index.TestStressIndexing$SearcherThread.doWork(TestStressIndexing.java:106)
           [junit4]   1> 	at org.apache.lucene.index.TestStressIndexing$TimedThread.run(TestStressIndexing.java:48)
           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestStressIndexing -Dtests.method=testStressIndexAndSearching -Dtests.seed=98FCDFF6002C97E1 -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=tr -Dtests.timezone=Indian/Comoro -Dtests.asserts=true -Dtests.file.encoding=UTF-8
           [junit4] FAILURE 0.12s J3 | TestStressIndexing.testStressIndexAndSearching <<<
           [junit4]    > Throwable #1: java.lang.AssertionError
           [junit4]    > 	at __randomizedtesting.SeedInfo.seed([98FCDFF6002C97E1:7F95878F4DC6F61D]:0)
           [junit4]    > 	at org.apache.lucene.index.TestStressIndexing.runStressTest(TestStressIndexing.java:155)
           [junit4]    > 	at org.apache.lucene.index.TestStressIndexing.testStressIndexAndSearching(TestStressIndexing.java:172)
           [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
           [junit4]   2> NOTE: test params are: codec=Asserting(Lucene53): {contents=PostingsFormat(name=Memory doPackFST= false), id=TestBloomFilteredLucenePostings(BloomFilteringPostingsFormat(Lucene50(blocksize=128)))}, docValues:{}, sim=ClassicSimilarity, locale=tr, timezone=Indian/Comoro
           [junit4]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=16,threads=1,free=573503112,total=1469054976
           [junit4]   2> NOTE: All tests run in this JVM: [TestNumericRangeQuery32, TestMmapDirectory, TestSpans, TestIndexWriterDelete, TestIndexReaderClose, TestSearchForDuplicates, TestStringHelper, TestPackedInts, TestControlledRealTimeReopenThread, TestDirectory, TestMinimize, TestNamedSPILoader, TestLucene50SegmentInfoFormat, TestStressIndexing]
        

        I put a debug println in SegmentInfos.FindSegmentsFile.run() and found that RAMDirectory.listAll() can sometimes produce null entries. Michael McCandless, looks like your commit to RAMDirector.listAll() on this issue is the problem:

        @@ -111,10 +111,7 @@
             // and do not synchronize or anything stronger. it's great for testing!
             // NOTE: fileMap.keySet().toArray(new String[0]) is broken in non Sun JDKs,
             // and the code below is resilient to map changes during the array population.
        -    Set<String> fileNames = fileMap.keySet();
        -    List<String> names = new ArrayList<>(fileNames.size());
        -    for (String name : fileNames) names.add(name);
        -    return names.toArray(new String[names.size()]);
        +    return fileMap.keySet().toArray(new String[fileMap.size()]);
           }
        

        I'm guessing that an inconsistency between fileMap.size() and fileMap.keySet() will cause this issue - if the number of entries goes down inbetween these calls then trailing entries in the resulting array will be null.

        Show
        steve_rowe Steve Rowe added a comment - My Jenkins found a nightly seed that causes TestStressIndexing..testStressIndexAndSearching() to fail nearly 100% of the time: [junit4] Suite: org.apache.lucene.index.TestStressIndexing [junit4] 1> Thread[Thread-66,5,TGRP-TestStressIndexing]: exc [junit4] 1> java.lang.NullPointerException [junit4] 1> at java.util.ComparableTimSort.countRunAndMakeAscending(ComparableTimSort.java:317) [junit4] 1> at java.util.ComparableTimSort.sort(ComparableTimSort.java:198) [junit4] 1> at java.util.Arrays.sort(Arrays.java:1246) [junit4] 1> at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:653) [junit4] 1> at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:73) [junit4] 1> at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63) [junit4] 1> at org.apache.lucene.index.TestStressIndexing$SearcherThread.doWork(TestStressIndexing.java:106) [junit4] 1> at org.apache.lucene.index.TestStressIndexing$TimedThread.run(TestStressIndexing.java:48) [junit4] 1> Thread[Thread-65,5,TGRP-TestStressIndexing]: exc [junit4] 1> java.lang.NullPointerException [junit4] 1> at java.util.ComparableTimSort.binarySort(ComparableTimSort.java:258) [junit4] 1> at java.util.ComparableTimSort.sort(ComparableTimSort.java:203) [junit4] 1> at java.util.Arrays.sort(Arrays.java:1246) [junit4] 1> at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:653) [junit4] 1> at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:73) [junit4] 1> at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63) [junit4] 1> at org.apache.lucene.index.TestStressIndexing$SearcherThread.doWork(TestStressIndexing.java:106) [junit4] 1> at org.apache.lucene.index.TestStressIndexing$TimedThread.run(TestStressIndexing.java:48) [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing -Dtests.method=testStressIndexAndSearching -Dtests.seed=98FCDFF6002C97E1 -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=tr -Dtests.timezone=Indian/Comoro -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 0.12s J3 | TestStressIndexing.testStressIndexAndSearching <<< [junit4] > Throwable #1: java.lang.AssertionError [junit4] > at __randomizedtesting.SeedInfo.seed([98FCDFF6002C97E1:7F95878F4DC6F61D]:0) [junit4] > at org.apache.lucene.index.TestStressIndexing.runStressTest(TestStressIndexing.java:155) [junit4] > at org.apache.lucene.index.TestStressIndexing.testStressIndexAndSearching(TestStressIndexing.java:172) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: test params are: codec=Asserting(Lucene53): {contents=PostingsFormat(name=Memory doPackFST= false), id=TestBloomFilteredLucenePostings(BloomFilteringPostingsFormat(Lucene50(blocksize=128)))}, docValues:{}, sim=ClassicSimilarity, locale=tr, timezone=Indian/Comoro [junit4] 2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=16,threads=1,free=573503112,total=1469054976 [junit4] 2> NOTE: All tests run in this JVM: [TestNumericRangeQuery32, TestMmapDirectory, TestSpans, TestIndexWriterDelete, TestIndexReaderClose, TestSearchForDuplicates, TestStringHelper, TestPackedInts, TestControlledRealTimeReopenThread, TestDirectory, TestMinimize, TestNamedSPILoader, TestLucene50SegmentInfoFormat, TestStressIndexing] I put a debug println in SegmentInfos.FindSegmentsFile.run() and found that RAMDirectory.listAll() can sometimes produce null entries. Michael McCandless , looks like your commit to RAMDirector.listAll() on this issue is the problem: @@ -111,10 +111,7 @@ // and do not synchronize or anything stronger. it's great for testing! // NOTE: fileMap.keySet().toArray( new String [0]) is broken in non Sun JDKs, // and the code below is resilient to map changes during the array population. - Set< String > fileNames = fileMap.keySet(); - List< String > names = new ArrayList<>(fileNames.size()); - for ( String name : fileNames) names.add(name); - return names.toArray( new String [names.size()]); + return fileMap.keySet().toArray( new String [fileMap.size()]); } I'm guessing that an inconsistency between fileMap.size() and fileMap.keySet() will cause this issue - if the number of entries goes down inbetween these calls then trailing entries in the resulting array will be null.
        Hide
        steve_rowe Steve Rowe added a comment - - edited

        Another test failure my Jenkins found with the same root cause, only reproduces 8/100 iters for me:

           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestIndexWriterCommit -Dtests.method=testPrepareCommitRollback -Dtests.seed=98FCDFF6002C97E1 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/home/jenkins/lucene-data/e
        nwiki.random.lines.txt -Dtests.locale=is -Dtests.timezone=Australia/Sydney -Dtests.asserts=true -Dtests.file.encoding=UTF-8
           [junit4] ERROR   0.32s | TestIndexWriterCommit.testPrepareCommitRollback <<<
           [junit4]    > Throwable #1: java.lang.NullPointerException
           [junit4]    >        at __randomizedtesting.SeedInfo.seed([98FCDFF6002C97E1:94F93159522F894E]:0)
           [junit4]    >        at java.util.ComparableTimSort.binarySort(ComparableTimSort.java:258)
           [junit4]    >        at java.util.ComparableTimSort.sort(ComparableTimSort.java:203)
           [junit4]    >        at java.util.Arrays.sort(Arrays.java:1246)
           [junit4]    >        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:660)
           [junit4]    >        at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:73)
           [junit4]    >        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
           [junit4]    >        at org.apache.lucene.index.TestIndexWriterCommit.testPrepareCommitRollback(TestIndexWriterCommit.java:599)
           [junit4]    >        at java.lang.Thread.run(Thread.java:745)
           [junit4]   2> NOTE: test params are: codec=Asserting(Lucene53): {content=Lucene50(blocksize=128)}, docValues:{}, sim=RandomSimilarityProvider(queryNorm=true,coord=crazy): {content=DFR I(ne)3(800.0)}, locale=is, timezone=Australia/Sydney
           [junit4]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=16,threads=1,free=401791056,total=514850816
           [junit4]   2> NOTE: All tests run in this JVM: [TestIndexWriterCommit]
        
        Show
        steve_rowe Steve Rowe added a comment - - edited Another test failure my Jenkins found with the same root cause, only reproduces 8/100 iters for me: [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterCommit -Dtests.method=testPrepareCommitRollback -Dtests.seed=98FCDFF6002C97E1 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/home/jenkins/lucene-data/e nwiki.random.lines.txt -Dtests.locale=is -Dtests.timezone=Australia/Sydney -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 0.32s | TestIndexWriterCommit.testPrepareCommitRollback <<< [junit4] > Throwable #1: java.lang.NullPointerException [junit4] > at __randomizedtesting.SeedInfo.seed([98FCDFF6002C97E1:94F93159522F894E]:0) [junit4] > at java.util.ComparableTimSort.binarySort(ComparableTimSort.java:258) [junit4] > at java.util.ComparableTimSort.sort(ComparableTimSort.java:203) [junit4] > at java.util.Arrays.sort(Arrays.java:1246) [junit4] > at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:660) [junit4] > at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:73) [junit4] > at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63) [junit4] > at org.apache.lucene.index.TestIndexWriterCommit.testPrepareCommitRollback(TestIndexWriterCommit.java:599) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: test params are: codec=Asserting(Lucene53): {content=Lucene50(blocksize=128)}, docValues:{}, sim=RandomSimilarityProvider(queryNorm=true,coord=crazy): {content=DFR I(ne)3(800.0)}, locale=is, timezone=Australia/Sydney [junit4] 2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=16,threads=1,free=401791056,total=514850816 [junit4] 2> NOTE: All tests run in this JVM: [TestIndexWriterCommit]
        Hide
        mikemccand Michael McCandless added a comment -

        Michael McCandless, looks like your commit to RAMDirector.listAll() on this issue is the problem:

        Sigh, sorry Steve Rowe and thanks for digging/fixing. Tricky concurrency...

        Show
        mikemccand Michael McCandless added a comment - Michael McCandless, looks like your commit to RAMDirector.listAll() on this issue is the problem: Sigh, sorry Steve Rowe and thanks for digging/fixing. Tricky concurrency...

          People

          • Assignee:
            mikemccand Michael McCandless
            Reporter:
            mikemccand Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development