Lucene - Core / LUCENE-6886

IndexWriter gets angry at leftover temp files (e.g. from BKD)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I was trying to run a performance test for the new dimensional values and hit this crazy exception:

      Exception in thread "main" java.lang.NumberFormatException: For input string: "5976285795"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:583)
        at org.apache.lucene.index.IndexFileDeleter.inflateGens(IndexFileDeleter.java:287)
        at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:217)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:935)
        at perf.IndexAndSearchOpenStreetMaps.createIndex(IndexAndSearchOpenStreetMaps.java:64)
        at perf.IndexAndSearchOpenStreetMaps.main(IndexAndSearchOpenStreetMaps.java:162)
      

      It happened because I killed my indexing process while BKD was writing temp files. On starting up again, IW would have removed these unreferenced files, except inflateGens got confused by their names.
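
The number in the exception message hints at why the parse blew up: 5976285795 is larger than Integer.MAX_VALUE (2147483647), so it cannot be read as an int. Below is a minimal standalone sketch of that failure. It assumes decimal parsing purely for illustration (the radix that inflateGens actually uses is not visible in the trace), and the class and method names are hypothetical, not Lucene code.

```java
class GenParseSketch {
    // Returns true if the text can be parsed as a decimal int.
    static boolean fitsInInt(String text) {
        try {
            Integer.parseInt(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Value copied from the exception message in the description.
        String suffix = "5976285795";
        System.out.println("fits in int: " + fitsInInt(suffix));
        System.out.println("exceeds Integer.MAX_VALUE: "
            + (Long.parseLong(suffix) > Integer.MAX_VALUE));
    }
}
```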

      This bug only happens on trunk.

      1. LUCENE-6886.patch
        12 kB
        Michael McCandless
      2. LUCENE-6886.patch
        10 kB
        Michael McCandless
      3. LUCENE-6886.patch
        5 kB
        Michael McCandless

        Activity

        mikemccand Michael McCandless added a comment -

        Patch w/ test and fix ... I fixed the createTempOutput impls to be more careful in how they name the temp files, stuffing the "randomness" into the file extension instead of the "gen". I added a protected utility method to BaseDirectory to do this (maybe it should be in Directory instead?).

        mikemccand Michael McCandless added a comment -

        Here's another possible approach, after talking to Robert Muir.

        I reserved the .tmp extension for use by createTempOutput, and don't allow codecs to use it.

        I also switched to a simple long counter, instead of randomness, to generate file name candidates, and removed the new method from BaseDirectory.
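
A rough sketch of what counter-based candidate naming could look like, written as a standalone class rather than as the patch itself. The name layout (prefix, suffix, counter in base 36, then .tmp), the in-memory set standing in for the directory listing, and all class and method names are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class TempNameSketch {
    // Stands in for the files already present in the directory (assumption).
    private final Set<String> existing = new HashSet<>();
    private long nextTempCounter = 0;

    // Simulates a file left behind by an earlier, killed process.
    synchronized void addExisting(String name) {
        existing.add(name);
    }

    // Tries counter values in order until a candidate name is unused.
    // Every candidate ends with the reserved ".tmp" extension.
    synchronized String createTempName(String prefix, String suffix) {
        while (true) {
            String candidate = prefix + "_" + suffix + "_"
                + Long.toString(nextTempCounter++, Character.MAX_RADIX) + ".tmp";
            if (existing.add(candidate)) {
                return candidate;
            }
        }
    }

    public static void main(String[] args) {
        TempNameSketch dir = new TempNameSketch();
        dir.addExisting("_0_bkd_0.tmp"); // leftover from a previous run
        System.out.println(dir.createTempName("_0", "bkd"));
    }
}
```

Because the varying part no longer sits where a generation is expected and the extension is always .tmp, a leftover file is simply skipped by the retry loop instead of confusing later parsing.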

        mikemccand Michael McCandless added a comment -

        New patch: duh, my last patch actually failed to enforce the "no codec
        can make a file ending with .tmp". So I fixed that, then hit exciting
        test failures because we do in fact have a codec component
        (FSTTermsWriter) attempting to do this! So I changed that place
        to use a different extension (it's experimental!), and tests pass.
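
The "no codec can make a file ending with .tmp" rule could be enforced with a name check along these lines. This is only a sketch: where the patch actually performs the check is not stated in this comment, and the class name, method name, and exception message are hypothetical.

```java
class TmpExtensionCheckSketch {
    // Rejects codec-supplied file names that use the reserved ".tmp" extension;
    // returns the name unchanged when it is acceptable.
    static String checkCodecFileName(String name) {
        if (name.endsWith(".tmp")) {
            throw new IllegalArgumentException(
                "codecs may not create files ending with .tmp: " + name);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(checkCodecFileName("_0.fdt"));
        try {
            checkCodecFileName("_0_terms.tmp");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```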

        rcmuir Robert Muir added a comment -

        I like it. I think the counters can be atomics in this case and it's still easy to understand. Thanks for adding the check!
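
For illustration, a counter kept in an AtomicLong might look roughly like this. It is a sketch under assumptions, not the committed code: the class name, method name, and name layout are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

class AtomicTempCounterSketch {
    // getAndIncrement hands each caller a distinct value without a lock.
    private final AtomicLong nextTempCounter = new AtomicLong();

    // Builds one candidate name; the caller would still need to retry
    // if a file with that name already exists.
    String nextCandidate(String prefix, String suffix) {
        long n = nextTempCounter.getAndIncrement();
        return prefix + "_" + suffix + "_"
            + Long.toString(n, Character.MAX_RADIX) + ".tmp";
    }

    public static void main(String[] args) {
        AtomicTempCounterSketch names = new AtomicTempCounterSketch();
        System.out.println(names.nextCandidate("_0", "sort"));
        System.out.println(names.nextCandidate("_0", "sort"));
    }
}
```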

        jira-bot ASF subversion and git services added a comment -

        Commit 1713103 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1713103 ]

        LUCENE-6886: Directory.createTempOutput always uses .tmp extension, and codecs are not allowed to


          People

          • Assignee:
            mikemccand Michael McCandless
            Reporter:
            mikemccand Michael McCandless
          • Votes:
            0
            Watchers:
            2