Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      IndexCommit.equals() checks for equality of Directories and versions, but it doesn't check IMHO the more important generation numbers. It looks like commits are really identified by a combination of directory and segments_XXX, which means the generation number, because that's what the DirectoryReader.open() checks for.

      This bug leads to an unexpected behavior when the only change to be committed is in userData - we get two commits then that are declared equal, they have the same version but they have different generation numbers. I have no idea how this situation is treated in a few dozen references to IndexCommit.equals() across Lucene...

      On the surface the fix is trivial - either add the gen number to equals(), or use gen number instead of version. However, it's puzzling why these two would ever get out of sync??? and if they are always supposed to be in sync then maybe we don't need both of them at all, maybe just generation or version is sufficient?

      1. LUCENE-3672.patch
        56 kB
        Michael McCandless

        Activity

        Hide
        Robert Muir added a comment -

        and its even crazier because compareTo() uses generation.

        if we move hashCode() and equals() to generation too, and completely remove IndexCommit.getVersion(), all tests pass in Lucene,
        but Solr replication is using IndexCommit.getVersion() for some reason (is it a good reason?)

        Show
        Robert Muir added a comment - and its even crazier because compareTo() uses generation. if we move hashCode() and equals() to generation too, and completely remove IndexCommit.getVersion(), all tests pass in Lucene, but Solr replication is using IndexCommit.getVersion() for some reason (is it a good reason?)
        Hide
        Yonik Seeley added a comment -

        This bug leads to an unexpected behavior when the only change to be committed is in userData - we get two commits then that are declared equal, they have the same version but they have different generation numbers.

        Hmmm, why is the version not bumped? That seems like the bug here (the index changed but the version didn't).

        Show
        Yonik Seeley added a comment - This bug leads to an unexpected behavior when the only change to be committed is in userData - we get two commits then that are declared equal, they have the same version but they have different generation numbers. Hmmm, why is the version not bumped? That seems like the bug here (the index changed but the version didn't).
        Hide
        Robert Muir added a comment -

        its also not ideal and that equals() is inconsistent with compareTo() even if we fixed it this way, because compareTo() doesn't consider Directory.

        Show
        Robert Muir added a comment - its also not ideal and that equals() is inconsistent with compareTo() even if we fixed it this way, because compareTo() doesn't consider Directory.
        Hide
        Robert Muir added a comment -

        Hmmm, why is the version not bumped? That seems like the bug here (the index changed but the version didn't).

        Seems like version is pretty redundant here though, why not just get rid of it. the only thing using it is Solr, if it wants an additional integer here it could do this in the commit user data instead.

        Show
        Robert Muir added a comment - Hmmm, why is the version not bumped? That seems like the bug here (the index changed but the version didn't). Seems like version is pretty redundant here though, why not just get rid of it. the only thing using it is Solr, if it wants an additional integer here it could do this in the commit user data instead.
        Hide
        Michael McCandless added a comment -

        +1 to nuking version, and fixing equals/compareTo/hashCode to only use generation; I think version is redundant w/ the generation.

        Show
        Michael McCandless added a comment - +1 to nuking version, and fixing equals/compareTo/hashCode to only use generation; I think version is redundant w/ the generation.
        Hide
        Yonik Seeley added a comment -

        Generation isn't as good as version for telling if something has changed - it's much easier (probably trivial) to get a false match under real usage scenarios.

        Show
        Yonik Seeley added a comment - Generation isn't as good as version for telling if something has changed - it's much easier (probably trivial) to get a false match under real usage scenarios.
        Hide
        Yonik Seeley added a comment -

        Since bumping version and generation on changes is redundant, what if we replaced version with a creation timestamp?

        Show
        Yonik Seeley added a comment - Since bumping version and generation on changes is redundant, what if we replaced version with a creation timestamp?
        Hide
        Robert Muir added a comment -

        i don't think we need any more timestamps.

        Also the existing IndexCommit getTimeStamp should be removed.

        We currently don't fsync the directory file itself and this is totally wrong on e.g. NFS

        Show
        Robert Muir added a comment - i don't think we need any more timestamps. Also the existing IndexCommit getTimeStamp should be removed. We currently don't fsync the directory file itself and this is totally wrong on e.g. NFS
        Hide
        Michael McCandless added a comment -

        Patch.

        I removed Directory.fileModified, and
        IndexCommit.getVersion/getTimestamp. I changed Solr to take its own
        timestamp (System.currentTimeMillis) and store into the commit's
        userData, and fixed the places that needed it to look it up from
        there. I also throw an exception if ever IndexCommit.compareTo is
        passed an IndexCommit with a different Directory.

        Show
        Michael McCandless added a comment - Patch. I removed Directory.fileModified, and IndexCommit.getVersion/getTimestamp. I changed Solr to take its own timestamp (System.currentTimeMillis) and store into the commit's userData, and fixed the places that needed it to look it up from there. I also throw an exception if ever IndexCommit.compareTo is passed an IndexCommit with a different Directory.

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Andrzej Bialecki
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development