Lucene - Core
  1. Lucene - Core
  2. LUCENE-6658

IndexUpgrader doesn't upgrade an index if it has zero segments

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.10.4, 5.2.1
    • Fix Version/s: 4.10.5, 5.3, 6.0
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      IndexUpgrader uses merges to do its job. Therefore, if you use it to upgrade an index with no segments, it will do nothing - it won't even update the version numbers in the segments file, meaning that later versions of Lucene will fail to open the index, despite the fact that you "upgraded" it.

      The suggested workaround when this was raised on the mailing list in January seems to be to use filesystem magic to look at the files, figure out whether there are any segments, and write a new empty index if there are none.

      This sounds easy, but there are probably traps. For instance, there might be files in the directory which don't really belong to the index. Earlier versions of Lucene used to have a FilenameFilter which was usable to distinguish one from the other, but that seems to have disappeared, making it less obvious how to do this.

      This issue is presumed to exist in 3.x as well, I just haven't encountered it yet because the only empty indices I have hit have been later versions.

      1. empty.4.10.4.zip
        0.4 kB
        Uwe Schindler
      2. LUCENE-6658.patch
        4 kB
        Uwe Schindler
      3. LUCENE-6658.patch
        4 kB
        Uwe Schindler
      4. LUCENE-6658.patch
        3 kB
        Uwe Schindler
      5. LUCENE-6658.patch
        2 kB
        Uwe Schindler

        Activity

        Hide
        Uwe Schindler added a comment -

        This is well known problem. I thin there is already an issue open!

        Show
        Uwe Schindler added a comment - This is well known problem. I thin there is already an issue open!
        Hide
        Uwe Schindler added a comment -

        This sounds easy, but there are probably traps. For instance, there might be files in the directory which don't really belong to the index. Earlier versions of Lucene used to have a FilenameFilter which was usable to distinguish one from the other, but that seems to have disappeared, making it less obvious how to do this.

        Because you want to upgrade the index, you can also open it with a DirectoryReader and check maxDoc() for 0. Opening a DirectoryReader should be easy if you have the JAR files at hand - you need them anyways for upgrading.

        We can add similar behaviour to the IndexUpgrader in 5.x, but not in older versions!

        Show
        Uwe Schindler added a comment - This sounds easy, but there are probably traps. For instance, there might be files in the directory which don't really belong to the index. Earlier versions of Lucene used to have a FilenameFilter which was usable to distinguish one from the other, but that seems to have disappeared, making it less obvious how to do this. Because you want to upgrade the index, you can also open it with a DirectoryReader and check maxDoc() for 0. Opening a DirectoryReader should be easy if you have the JAR files at hand - you need them anyways for upgrading. We can add similar behaviour to the IndexUpgrader in 5.x, but not in older versions!
        Hide
        Uwe Schindler added a comment -

        The suggested workaround when this was raised on the mailing list in January seems to be to use filesystem magic to look at the files, figure out whether there are any segments, and write a new empty index if there are none.

        I think IndexUpgrader should simply force a commit (maybe also add some commit metadata that this is an "upgraded" index). A forced commit for an empty index will definitely write the new files. The problem with IndexUpgrader is that it does not see any segement, so it will force no merge and finally does not commit on closing IndexWriter.

        Show
        Uwe Schindler added a comment - The suggested workaround when this was raised on the mailing list in January seems to be to use filesystem magic to look at the files, figure out whether there are any segments, and write a new empty index if there are none. I think IndexUpgrader should simply force a commit (maybe also add some commit metadata that this is an "upgraded" index). A forced commit for an empty index will definitely write the new files. The problem with IndexUpgrader is that it does not see any segement, so it will force no merge and finally does not commit on closing IndexWriter.
        Hide
        Uwe Schindler added a comment -

        Here is a patch to make IndexUpgrader force a commit (I had to use a trick: supplying empty commit metadata to provoke a change).

        I have not yet added a test (I would need to create some older "empty" index...).

        This patch could also be applied to 4.10.5, but I don't think we get a 3.6 anymore. Trejkaz - can you try this out and see if it helps. Maybe apply the patch also to older versions.

        Michael McCandless: Do you think this is good? Or is there a better solution to enforce a commit, so all metadata of empty index is rewritten?

        Show
        Uwe Schindler added a comment - Here is a patch to make IndexUpgrader force a commit (I had to use a trick: supplying empty commit metadata to provoke a change). I have not yet added a test (I would need to create some older "empty" index...). This patch could also be applied to 4.10.5, but I don't think we get a 3.6 anymore. Trejkaz - can you try this out and see if it helps. Maybe apply the patch also to older versions. Michael McCandless : Do you think this is good? Or is there a better solution to enforce a commit, so all metadata of empty index is rewritten?
        Hide
        Uwe Schindler added a comment -

        Just as unrelated recommandation: We should move the IndexUpgrader and its merge policy to backward-codecs.jar. Makes no sense to have this in core. Should I open new issue about that?

        Show
        Uwe Schindler added a comment - Just as unrelated recommandation: We should move the IndexUpgrader and its merge policy to backward-codecs.jar. Makes no sense to have this in core. Should I open new issue about that?
        Hide
        Uwe Schindler added a comment -

        I added an additional check to TestBackwardsCompatibility that our index also has latest version in commit point after upgrade.

        I will now create an empty 4.10 index and place it there to verify the update (like the single segment one). I will not create one for all old versions (I don't think this is needed).

        Show
        Uwe Schindler added a comment - I added an additional check to TestBackwardsCompatibility that our index also has latest version in commit point after upgrade. I will now create an empty 4.10 index and place it there to verify the update (like the single segment one). I will not create one for all old versions (I don't think this is needed).
        Hide
        Uwe Schindler added a comment -

        I added a test to verify that it works. This patch applies to 5.x.

        Show
        Uwe Schindler added a comment - I added a test to verify that it works. This patch applies to 5.x.
        Hide
        Michael McCandless added a comment -

        Michael McCandless: Do you think this is good?

        I think calling IW.setCommitData is a good trick to force a commit, except we should be careful not to overwrite any prior commit data in this index?

        Show
        Michael McCandless added a comment - Michael McCandless: Do you think this is good? I think calling IW.setCommitData is a good trick to force a commit, except we should be careful not to overwrite any prior commit data in this index?
        Hide
        Uwe Schindler added a comment - - edited

        Previous commit data will be removed in any case (see code that removes old commit points). Or is commit data preserved while opening IndexWriter and reused when committing to a new commit point? If this is the case, I can use get/set maybe?

        Show
        Uwe Schindler added a comment - - edited Previous commit data will be removed in any case (see code that removes old commit points). Or is commit data preserved while opening IndexWriter and reused when committing to a new commit point? If this is the case, I can use get/set maybe?
        Hide
        Uwe Schindler added a comment -

        I changed the fake userdata call to:

        w.setCommitData(w.getCommitData()); // fake change to enforce a commit (e.g. if index has no segments)
        
        Show
        Uwe Schindler added a comment - I changed the fake userdata call to: w.setCommitData(w.getCommitData()); // fake change to enforce a commit (e.g. if index has no segments)
        Hide
        Michael McCandless added a comment -

        Or is commit data preserved while opening IndexWriter and reused when committing to a new commit point? If this is the case, I can use get/set maybe?

        Yeah, it will be preserved, carried over from the commit point that IW had opened (the latest commit point in this case).

        +1 to use get/set.

        Show
        Michael McCandless added a comment - Or is commit data preserved while opening IndexWriter and reused when committing to a new commit point? If this is the case, I can use get/set maybe? Yeah, it will be preserved, carried over from the commit point that IW had opened (the latest commit point in this case). +1 to use get/set.
        Hide
        Michael McCandless added a comment -

        I changed the fake userdata call to:

        +1, thanks Uwe Schindler

        Show
        Michael McCandless added a comment - I changed the fake userdata call to: +1, thanks Uwe Schindler
        Hide
        ASF subversion and git services added a comment -

        Commit 1689411 from Uwe Schindler in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1689411 ]

        LUCENE-6658: Fix IndexUpgrader to also upgrade indexes without any segments

        Show
        ASF subversion and git services added a comment - Commit 1689411 from Uwe Schindler in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1689411 ] LUCENE-6658 : Fix IndexUpgrader to also upgrade indexes without any segments
        Hide
        ASF subversion and git services added a comment -

        Commit 1689420 from Uwe Schindler in branch 'dev/trunk'
        [ https://svn.apache.org/r1689420 ]

        Merged revision(s) 1689411 from lucene/dev/branches/branch_5x:
        LUCENE-6658: Fix IndexUpgrader to also upgrade indexes without any segments

        Show
        ASF subversion and git services added a comment - Commit 1689420 from Uwe Schindler in branch 'dev/trunk' [ https://svn.apache.org/r1689420 ] Merged revision(s) 1689411 from lucene/dev/branches/branch_5x: LUCENE-6658 : Fix IndexUpgrader to also upgrade indexes without any segments
        Hide
        ASF subversion and git services added a comment -

        Commit 1689424 from Uwe Schindler in branch 'dev/branches/lucene_solr_4_10'
        [ https://svn.apache.org/r1689424 ]

        Merged revision(s) 1689411 from lucene/dev/branches/branch_5x:
        LUCENE-6658: Fix IndexUpgrader to also upgrade indexes without any segments

        Show
        ASF subversion and git services added a comment - Commit 1689424 from Uwe Schindler in branch 'dev/branches/lucene_solr_4_10' [ https://svn.apache.org/r1689424 ] Merged revision(s) 1689411 from lucene/dev/branches/branch_5x: LUCENE-6658 : Fix IndexUpgrader to also upgrade indexes without any segments
        Hide
        Uwe Schindler added a comment -

        I also backported to 4.10.5 (was a bit harder, because we had no Version in the SegmentInfos). To check that the commit was actually applied, I check the generation there.

        Show
        Uwe Schindler added a comment - I also backported to 4.10.5 (was a bit harder, because we had no Version in the SegmentInfos). To check that the commit was actually applied, I check the generation there.
        Hide
        Uwe Schindler added a comment - - edited

        Trejkaz: This will not be backported to 3.6 (and also release of 4.10.5 is very unlikely). If you want to "ensure" than index was upgraded, I'd suggest to use the UpgradeIndexMergePolicy directly and then more or less copy the code from IndexUpgrader (so it opens IndexWriter, sets the special merge policy, forceMerge(1), setCommitUserData(getCommitUserData()), and finally commit()). Alternatively patch 3.6.x.

        Show
        Uwe Schindler added a comment - - edited Trejkaz : This will not be backported to 3.6 (and also release of 4.10.5 is very unlikely). If you want to "ensure" than index was upgraded, I'd suggest to use the UpgradeIndexMergePolicy directly and then more or less copy the code from IndexUpgrader (so it opens IndexWriter, sets the special merge policy, forceMerge(1), setCommitUserData(getCommitUserData()), and finally commit()). Alternatively patch 3.6.x.
        Hide
        Trejkaz added a comment -

        This works when applied to my v4 and v5 here as well. (At least for my inadequate collection of test indices... since I started generating them today.)

        I tried to backport into my copy of v3.6, but IndexWriter.setCommitUserData doesn't exist and commit(Map) doesn't force a commit, even with a non-empty map. Or if it does, it doesn't update the index format.

        Show
        Trejkaz added a comment - This works when applied to my v4 and v5 here as well. (At least for my inadequate collection of test indices... since I started generating them today.) I tried to backport into my copy of v3.6, but IndexWriter.setCommitUserData doesn't exist and commit(Map) doesn't force a commit, even with a non-empty map. Or if it does, it doesn't update the index format.
        Hide
        Trejkaz added a comment -

        What does work: making a forceCommit() method which does nothing other than increment changeCount and call commit().

        Show
        Trejkaz added a comment - What does work: making a forceCommit() method which does nothing other than increment changeCount and call commit().
        Hide
        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release

        Show
        Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release

          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Trejkaz
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development